Introduction to Grid Computing
Learn grid computing basics
Understand architectural considerations
Create and demonstrate a grid environment
Bart Jacob Michael Brown Kentaro Fukui Nihar Trivedi
ibm.com/redbooks
International Technical Support Organization
December 2005
SG24-6778-00
Note: Before using this information, read the information in "Notices".
First Edition (December 2005)
Copyright International Business Machines Corporation 2005. All rights reserved.
Note to U.S. Government Users Restricted Rights: Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Notices
Trademarks
Preface
  The team that wrote this redbook
  Become a published author
  Comments welcome
Part 1. Grid fundamentals
Chapter 1. What grid computing is
Chapter 2. Benefits of grid computing
  2.1 Exploiting underutilized resources
  2.2 Parallel CPU capacity
  2.3 Virtual resources and virtual organizations for collaboration
  2.4 Access to additional resources
  2.5 Resource balancing
  2.6 Reliability
  2.7 Management
  2.8 Summary
Chapter 3. Grid terms and concepts
  3.1 Types of resources
    3.1.1 Computation
    3.1.2 Storage
    3.1.3 Communications
    3.1.4 Software and licenses
    3.1.5 Special equipment, capacities, architectures, and policies
  3.2 Jobs and applications
  3.3 Scheduling, reservation, and scavenging
  3.4 Grid software components
    3.4.1 Management components
    3.4.2 Distributed grid management
    3.4.3 Donor software
    3.4.4 Submission software
    3.4.5 Schedulers
    3.4.6 Communications
    3.4.7 Observation and measurement
  3.5 Intragrid and intergrid
  3.6 Summary
Chapter 4. Grid user roles
  4.1 Using a grid: A user's perspective
    4.1.1 Enrolling and installing grid software
    4.1.2 Logging onto the grid
    4.1.3 Queries and submitting jobs
    4.1.4 Data configuration
    4.1.5 Monitoring progress and recovery
    4.1.6 Reserving resources
  4.2 Using a grid: An administrator's perspective
    4.2.1 Planning
    4.2.2 Installation
    4.2.3 Managing enrollment of donors and users
    4.2.4 Certificate authority
    4.2.5 Resource management
    4.2.6 Data sharing
  4.3 Summary
Part 2. Grid architecture considerations
Chapter 5. Standards for grid environments
  5.1 Overview
    5.1.1 OGSA
    5.1.2 OGSI
    5.1.3 OGSA-DAI
    5.1.4 GridFTP
    5.1.5 WSRF
    5.1.6 Web services related standards
Chapter 6. Application considerations
  6.1 General application considerations
  6.2 CPU-intensive application considerations
  6.3 Data considerations
  6.4 Summary
Chapter 7. Security
  7.1 Introduction to grid security
    7.1.1 Grid security requirements
    7.1.2 Security fundamentals
    7.1.3 Important grid security terms
    7.1.4 Symmetric key encryption
    7.1.5 Asymmetric key encryption
    7.1.6 The Certificate Authority
    7.1.7 Digital certificates
  7.2 Grid security infrastructure
    7.2.1 Getting access to the grid
    7.2.2 Grid secure communication
    7.2.3 Grid security step-by-step
  7.3 Grid infrastructure security
    7.3.1 Physical security
    7.3.2 Operating system security
    7.3.3 Grid and firewalls
    7.3.4 Host intrusion detection
  7.4 PKI security policies and procedures
    7.4.1 Certificate Authority
    7.4.2 Security controls review
  7.5 Summary
Chapter 8. Design
  8.1 Building a grid architecture
    8.1.1 Solution objectives
  8.2 Grid architecture models
    8.2.1 Computational grid
    8.2.2 Data grid
  8.3 Grid topologies
    8.3.1 Intragrid
    8.3.2 Extragrid
    8.3.3 Intergrid
    8.3.4 e-Utilities
  8.4 Phases and activities
    8.4.1 Basic methodology
    8.4.2 Recommended steps
  8.5 A conceptual architecture
    8.5.1 Infrastructure
  8.6 Summary
Chapter 9. Web services resource framework
  9.1 Resource state management using Grid services
    9.1.1 What a Grid service is
    9.1.2 Grid services vs. Web services
    9.1.3 OGSA Grid service requirements
    9.1.4 Open Grid Services Interface (OGSI) Grid services
    9.1.5 OGSI to WSRF refactoring
  9.2 WSRF fundamentals
    9.2.1 What a WS-Resource is
    9.2.2 Implied resource pattern for stateful resources
  9.3 WS-Resource Framework specifications
    9.3.1 WS-Resource Framework and Globus Toolkit 4
  9.4 WSRF references
  9.5 Summary
Part 3. Creating a grid environment with the Globus Toolkit 4
Chapter 10. Globus Toolkit 4 components
  10.1 Overview of Globus Toolkit 4
  10.2 Common runtime components
    10.2.1 Java WS Core
    10.2.2 C WS Core
    10.2.3 Python WS Core
  10.3 Security components
    10.3.1 WS authentication and authorization
    10.3.2 Pre-WS authentication and authorization
    10.3.3 Community Authorization Service (CAS)
    10.3.4 Delegation service
    10.3.5 SimpleCA
    10.3.6 MyProxy
    10.3.7 GSI-OpenSSH
  10.4 Data management components
    10.4.1 GridFTP
    10.4.2 Reliable File Transfer (RFT)
    10.4.3 Replica Location Service (RLS)
    10.4.4 OGSA-DAI
    10.4.5 Data Replication Service (DRS)
  10.5 Monitoring and Discovery Services
    10.5.1 Index service
    10.5.2 Trigger service
    10.5.3 Aggregator Framework
    10.5.4 WebMDS
  10.6 Execution management
    10.6.1 WS GRAM
    10.6.2 Community Scheduler Framework 4 (CSF4)
    10.6.3 Globus Teleoperations Control Protocol (GTCP)
    10.6.4 Workspace Management Service (WMS)
  10.7 Summary
Chapter 11. Globus Toolkit 4 installation and configuration
  11.1 How to obtain Globus Toolkit 4
  11.2 Packages of Globus Toolkit 4
    11.2.1 Binary packages
    11.2.2 Source packages
  11.3 Grid environment
  11.4 Installation
    11.4.1 Installing required software for Globus Toolkit 4 installation
    11.4.2 Preparing the OS for Globus Toolkit 4 installation
    11.4.3 Installing Globus Toolkit 4
  11.5 Configuration and testing of grid environment
    11.5.1 Configuring environmental variables
    11.5.2 Security setup
    11.5.3 Configuration of Java WS Core
    11.5.4 Configuration and testing of GridFTP
    11.5.5 Configuration and testing of RFT
    11.5.6 Configuration and testing of WS GRAM
    11.5.7 Testing of MDS4
  11.6 Uninstallation
  11.7 Summary
Part 4. Grid demonstration application
Chapter 12. Demonstration application
  12.1 RenderClient
    12.1.1 The Graphical User Interface (GUI)
    12.1.2 RenderClient source code
  12.2 RenderWorker
  12.3 RenderSourceService
    12.3.1 Alternative architecture
  12.4 Directory Tree of important files in demo
Part 5. Appendixes
Appendix A. IBM software portfolio for grid computing
  IBM Application Workload Modeler
  IBM Cloudscape/Apache Derby
  DB2 Connect Family
  DB2 Everyplace Family
  DB2 Universal Database Family
  Mathematical Acceleration Subsystem (MASS)
  Rational Application Developer for WebSphere Software
  IBM Tivoli Access Manager Family
  IBM Tivoli Configuration Manager
  IBM Tivoli Enterprise Console
  IBM Tivoli Intelligent Orchestrator
  IBM Tivoli License Manager
  The IBM Tivoli Management Framework
  IBM Tivoli Monitoring for Virtual Servers
  IBM Tivoli OMEGAMON XE Family
  IBM Tivoli Provisioning Manager
  IBM Tivoli System Automation for Multiplatforms
  IBM Tivoli Universal Agent
  WebSphere Application Server
  WebSphere Application Server Network Deployment
  WebSphere Extended Deployment
  IBM WebSphere MQ
  WebSphere Studio Application Monitor
  IBM Director
  IBM Remote Deployment Manager
  IBM ServerGuide
  IBM Virtual Machine Manager
  Cluster Systems Management
  Parallel ESSL
  LoadLeveler
  General Parallel File System
Appendix B. Additional material
  Locating the Web material
  Using the Web material
    System requirements for downloading the Web material
    How to use the Web material
Related publications
  IBM Redbooks
  Other publications
  Online resources
  How to get IBM Redbooks
  Help from IBM
Index
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the products and/or the programs described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.
Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
AFS
AIX 5L™
AIX
Cloudscape™
DB2 Connect™
DB2 Universal Database™
DB2
developerWorks
DFS™
Domino
Eserver
eServer™
Everyplace
ibm.com
IBM
iSeries™
LoadLeveler
Lotus
OMEGAMON
Perform™
pSeries
Rational
Redbooks (logo)
Redbooks™
RS/6000
ServerGuide™
Summit
Tivoli Enterprise Console
Tivoli Enterprise™
Tivoli
WebSphere
World Community Grid™
xSeries
zSeries

The following terms are trademarks of other companies:
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel Inside logos, MMX, and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, and service names may be trademarks or service marks of others.
Preface
In the past several years, grid computing has emerged as a way to harness and take advantage of computing resources across geographies and organizations. In this IBM Redbook, we describe a generalized view of grid computing including concepts, standards, and ways in which grid computing can provide business value to your organization. In a nutshell, grid computing is all about virtualization that enables businesses to take advantage of a variety of IT resources in order to be more responsive to demands of the business and increase availability of applications while reducing both infrastructure and management costs.
There are many products and toolkits available from IBM and other companies that enable different aspects of grid computing. One of the best-known toolkits is the Globus Toolkit. Globus Toolkit 4 provides components and services conforming to existing and evolving standards that can be used as the basis for a grid computing solution. In the second half of this book we provide instructions for installing and configuring a simple Globus environment that can be used to demonstrate various aspects of grid computing and to build a proof-of-concept environment. We also describe, and provide as additional material, a sample grid application that can be used to demonstrate, test, and teach the grid computing concepts introduced in this book.
The team that wrote this redbook
This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, Austin Center.
Bart Jacob is a Senior Consulting IT Specialist at IBM Corp., International Technical Support Organization, Austin Center. He has over 25 years of experience providing technical support across a variety of IBM products and technologies, including communications, object-oriented software development, and systems management. He has over 14 years of experience at the ITSO, where he has been writing IBM Redbooks™ and creating and teaching workshops around the world on a variety of topics. He holds a Master's degree in Numerical Analysis from Syracuse University.
Michael Brown is the Technical Project Leader for the Americas and Asia Pacific sites of IBM's Linux Integration Center, headquartered in Austin, Texas. He leads teams that perform technical support for customers who are evaluating IBM software running on Linux platforms. He is a certified Java™ Programmer, Developer, and Architect and has worked on several previous IBM grid redbooks and presented on grid computing at the Colorado Software Summit. He holds HBSc and MSc degrees in Computer Science from the University of Western Ontario, Canada.
Kentaro Fukui is an IT Specialist for IBM and a Red Hat Certified Engineer working in IBM Global Services, Japan. He has more than two years of experience with grid technologies as well as more than eight years of experience with UNIX-like operating systems, Windows servers, and Lotus Domino servers. He holds an MSc degree in Information and Computer Science from Keio University, Japan. Currently, he is also a PhD candidate at Keio University. He received the IEEE Computer Society Best Paper Award in 2004.
Nihar Trivedi is a Consultant and an IBM Grid Technical Sales certified professional working for IBM Business Consulting Services in Australia. Nihar has more than eight years of experience in delivering complex e-business applications in Financial Services, Utility, Government, and Telecommunication industries. Nihar is a PhD student affiliated with the University of Sydney and National ICT Australia. Nihar's main research interests include self-adaptive middleware systems and grid computing.
Thanks to the following people for their contributions to this project:

Sean Slevin

Suguru Hamazaki
System Design Center West, Business Infrastructure Solution, IBM Japan Systems Engineering

Julie Czubik
International Technical Support Organization, Poughkeepsie Center

The team that created a predecessor redbook, Introduction to Grid Computing with Globus, SG24-6895, from which we have reused a wide range of material:

Luis Ferreira
Viktors Berstis
Jonathan Armstrong
Mike Kendzierski
Andreas Neukoetter
Masanobu Takagi
Richard Bing-Wo
Adeeb Amir
Ryo Murakawa
Olegario Hernandez
James Magowan
Norbert Bieberstein
Become a published author
Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners and/or customers.
Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:
Use the online "Contact us" review redbook form found at: ibm.com/redbooks
Send your comments in an email to: redbook@us.ibm.com
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. JN9B Building 905
11501 Burnet Road
Austin, Texas 78758-3493
Part 1. Grid fundamentals

Chapter 1. What grid computing is
Grid computing can mean different things to different individuals. The grand vision is often presented as an analogy to power grids where users (or electrical appliances) get access to electricity through wall sockets with no care or consideration for where or how the electricity is actually generated. In this view of grid computing, computing becomes pervasive and individual users (or client applications) gain access to computing resources (processors, storage, data, applications, and so on) as needed with little or no knowledge of where those resources are located or what the underlying technologies, hardware, operating system, and so on are.
Though this vision of grid computing can capture one's imagination and may indeed someday become a reality, there are many technical, business, political, and social issues that need to be addressed. If we consider this vision as an ultimate goal, there are many smaller steps that need to be taken to achieve it. These smaller steps each have benefits of their own.
Therefore, grid computing can be seen as a journey along a path of integrating various technologies and solutions that move us closer to the final goal. Its key values are in the underlying distributed computing infrastructure technologies that are evolving in support of cross-organizational application and resource sharing; in a word, virtualization: virtualization across technologies, platforms, and organizations. This kind of virtualization is only achievable through the use of open standards. Open standards help ensure that applications can transparently take advantage of whatever appropriate resources can be made available to them. An environment that provides the ability to share and transparently access resources across a distributed and heterogeneous environment not only requires the technology to virtualize certain resources, but also technologies and standards in the areas of scheduling, security, accounting, systems management, and so on.
Grid computing could be defined as any of a variety of levels of virtualization along a continuum. Exactly where along that continuum one might say that a particular solution is an implementation of grid computing versus a relatively simple implementation using virtual resources is a matter of opinion. But even at the simplest levels of virtualization, one could say that grid-enabling technologies are being utilized.
This continuum is illustrated in Figure 1-1. Starting in the lower left you see single system partitioning. Virtualization starts with being able to carve up a machine into virtual machines. As you move up this spectrum you start to be able to virtualize similar or homogeneous resources. Virtualization applies not only to servers and CPUs, but to storage, networks, and even applications. Moving further up, you start to virtualize unlike resources. The next step is virtualizing the enterprise, not just in a data center or within a department but across a distributed organization, and then, finally, virtualizing outside the enterprise, across the Internet, where you might actually access resources from a set of OEMs and their suppliers or you might integrate information across a network of collaborators.
Figure 1-1 Virtualization continuum
Early implementations of grid computing have tended to be internal to a particular company or organization. However, cross-organizational grids are also being implemented and will be an important part of computing and business optimization in the future.
The distinctions between intra-organizational grids and inter-organizational grids are not based in technological differences. Instead, they are based on configuration choices driven by security domains, the degree of isolation desired, the types of policies and their scope, and contractual obligations between users and providers of the infrastructures. These issues are not fundamentally architectural in nature. It is in the industry's best interest to ensure that there is not an artificial split of distributed computing paradigms and models across organizational boundaries and internal IT infrastructures.
Grid computing involves an evolving set of open standards for Web services and interfaces that make services, or computing resources, available over the Internet.
Very often grid technologies are used on homogeneous clusters, and they can add value on those clusters by assisting, for example, with scheduling or provisioning of the resources in the cluster. The term grid, and its related technologies, applies across this entire spectrum.
If we focus our attention on distributed computing solutions, then we could consider one definition of grid computing to be distributed computing across virtualized resources. The goal is to create the illusion of a simple yet large and powerful virtual computer out of a collection of connected and possibly heterogeneous systems sharing various combinations of resources.
Chapter 2. Benefits of grid computing
When you deploy a grid, it will be to meet a set of business requirements. To better match grid computing capabilities to those requirements, it is useful to keep in mind some common motivations for using grid computing.
2.1 Exploiting underutilized resources
One of the basic uses of grid computing is to run an existing application on a different machine. The machine on which the application is normally run might be unusually busy due to a peak in activity. The job in question could be run on an idle machine elsewhere on the grid.
There are at least two prerequisites for this scenario. First, the application must be executable remotely and without undue overhead. Second, the remote machine must meet any special hardware, software, or resource requirements imposed by the application. For example, a batch job that spends a significant amount of time processing a set of input data to produce an output data set is perhaps the simplest and most natural use case for a grid. If the quantities of input and output are large, more thought and planning might be required to use the grid efficiently for such a job. By contrast, it would usually not make sense to use a word processor remotely on a grid because there would probably be greater delays and more potential points of failure.
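The second prerequisite can be pictured as a simple matching step. The following Python sketch is illustrative only: the `Machine` and `Job` records, their fields, and the `eligible_machines` function are hypothetical names chosen for this example and are not part of any grid toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    # Hypothetical description of a grid machine; field names are assumptions.
    name: str
    os: str
    free_memory_mb: int
    software: set = field(default_factory=set)
    busy: bool = False


@dataclass
class Job:
    # Hypothetical description of what a batch job needs in order to run.
    os: str
    min_memory_mb: int
    required_software: set = field(default_factory=set)


def eligible_machines(job, machines):
    """Return the idle machines that satisfy every requirement of the job."""
    return [
        m for m in machines
        if not m.busy
        and m.os == job.os
        and m.free_memory_mb >= job.min_memory_mb
        and job.required_software <= m.software  # subset test on installed software
    ]


machines = [
    Machine("desk-01", "linux", 2048, {"python"}),
    Machine("desk-02", "linux", 512, {"python"}),
    Machine("srv-01", "linux", 8192, {"python", "renderer"}, busy=True),
    Machine("srv-02", "linux", 8192, {"python", "renderer"}),
]
job = Job(os="linux", min_memory_mb=1024, required_software={"renderer"})
print([m.name for m in eligible_machines(job, machines)])
```

In this made-up inventory only `srv-02` qualifies: the other machines are busy, short of memory, or lack the required software.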
In most organizations, there are large amounts of underutilized computing resources. Most desktop machines are busy less than 5 percent of the time over a business day. In some organizations, even the server machines can often be relatively idle. Grid computing provides a framework for exploiting these underutilized resources and thus has the possibility of substantially increasing the efficiency of resource usage.
The processing resources are not the only ones that may be underutilized. Often, machines may have enormous unused disk drive capacity. Grid computing (more specifically, a data grid) can be used to aggregate this unused storage into a much larger virtual data store, possibly configured to achieve improved performance and reliability over that of any single machine.
If a batch job needs to read a large amount of data, this data could be automatically replicated at various strategic points in the grid. Thus, if the job must be executed on a remote machine in the grid, the data is already there and does not need to be moved to that remote point. This offers clear performance benefits. Also, such copies of data can be used as backups when the primary copies are damaged or unavailable.
Another benefit of a grid is to better balance resource utilization. An organization may have occasional unexpected peaks of activity that demand more resources. If the applications are grid-enabled, they can be moved to underutilized machines during such peaks. In fact, some grid implementations can migrate partially completed jobs. In general, a grid can provide a consistent way to balance the loads on a wider federation of resources. This applies to CPU, storage, and any other types of resources that may be available on a grid.
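As a rough illustration of load balancing, the sketch below assigns each job to whichever machine currently has the least work. This greedy rule is one simple policy chosen for the example, not the method of any particular grid scheduler, and the `balance` function, job names, and cost figures are hypothetical.

```python
import heapq


def balance(job_costs, machine_names):
    """Greedily assign each job to the machine with the least load so far."""
    # Heap of (current_load, machine_name); the least-loaded machine is on top.
    heap = [(0, name) for name in machine_names]
    heapq.heapify(heap)
    assignment = {name: [] for name in machine_names}
    # Placing the largest jobs first tends to give a more even result.
    for job, cost in sorted(job_costs.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)
        assignment[name].append(job)
        heapq.heappush(heap, (load + cost, name))
    return assignment


# Estimated cost of each job in arbitrary units (made-up numbers).
job_costs = {"a": 8, "b": 5, "c": 4, "d": 3, "e": 2}
plan = balance(job_costs, ["m1", "m2"])
for machine, jobs in plan.items():
    print(machine, jobs, sum(job_costs[j] for j in jobs))
```

A real scheduler would also have to account for job requirements, data location, and jobs that arrive while others are running.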
2.2 Parallel CPU capacity
The potential for massive parallel CPU capacity is one of the most common visions and attractive features of a grid. In addition to pure scientific needs, such computing power is driving a new evolution in industries such as the biomedical field, financial modeling, oil exploration, motion picture animation, and many others.
The common attribute among such uses is that the applications have been written to use algorithms that can be partitioned into independently running parts. A CPU-intensive grid application can be thought of as many smaller subjobs, each executing on a different machine in the grid. The less these subjobs need to communicate with each other, the more scalable the application becomes. A perfectly scalable application will, for example, finish in one tenth of the time if it uses ten times the number of processors.
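The idea of independent subjobs can be sketched in a few lines of Python. In this illustration, threads on one machine stand in for machines in a grid, and the function names (`split`, `subjob`, `run_partitioned`, `ideal_time`) are hypothetical; a real grid application would submit its subjobs through grid middleware instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Split a list into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def subjob(chunk):
    """One independent subjob: it needs no data from any other subjob."""
    return sum(x * x for x in chunk)


def run_partitioned(items, workers):
    # Threads stand in for grid machines here; this is an assumption of the sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(subjob, split(items, workers)))
    # Combining the partial results is the only step that touches all subjobs.
    return sum(partial_results)


def ideal_time(serial_time, processors):
    """Run time of a perfectly scalable application on `processors` CPUs."""
    return serial_time / processors


print(run_partitioned(list(range(1000)), 10))
print(ideal_time(100.0, 10))
```

Because each chunk is processed without reference to any other, the subjobs could run anywhere; only the final combination step needs all of the partial results.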
Barriers often exist to perfect scalability. The first barrier depends on the algorithms used for splitting the application among many CPUs. If the algorithm can only be split into a limited number of independently running parts, then that forms a scalability barrier. The second barrier appears if the parts are not completely independent; this can cause contention, which can limit scalability. For example, if all of the subjobs need to read and write from one common file or database, the access limits of that file or database will become the limiting factor in the application's scalability. Other sources of inter-job contention in a parallel grid application include message communications latencies among the jobs, network communication capacities, synchronization protocols, input/output bandwidth to storage or other devices, and other delays interfering with real-time requirements.
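One common way to reason about such barriers is Amdahl's law. The text above does not name it; it is brought in here only as an illustration. It treats the part of the work that cannot be split, or that is serialized by contention on a shared file or database, as a serial fraction that limits the achievable speedup. The function name and the sample fractions below are assumptions made for the sketch.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Estimated speedup when only `parallel_fraction` of the work can be split.

    Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction of
    the work that runs in parallel and n is the number of processors.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# A fully parallel job scales with the processor count; a job that is
# 10 percent serial can never run more than 10 times faster.
for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(1.0, n), 2), round(amdahl_speedup(0.9, n), 2))
```

The model ignores communication latency and other overheads listed above, so real applications usually fall short of even this estimate.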
There are many factors to consider in grid-enabling an application. One must understand that not all applications can be transformed to run in parallel on a grid and achieve scalability. Furthermore, there are no practical tools for automatically transforming arbitrary applications to exploit the parallel capabilities of a grid; automatic transformation of applications is a science in its infancy. There are some practical tools that skilled application designers can use to write a parallel grid application, but this can be a difficult job that often requires mathematics and programming talents, if it is even possible in a given situation. New computation-intensive applications written today are being designed for parallel execution, and these will be easily grid-enabled, if they do not already follow emerging grid protocols and standards.
2.3 Virtual resources and virtual organizations for collaboration
Another capability enabled by grid computing is to provide an environment for collaboration among a wider audience. In the past, distributed computing promised this collaboration and achieved it to some extent. Grid computing can take these capabilities to an even wider audience, while offering important standards that enable very heterogeneous systems to work together to form the image of a large virtual computing system offering a variety of resources, as illustrated in Figure 2-1. The users of the grid can be organized dynamically into a number of virtual organizations, each with different policy requirements. These virtual organizations can share their resources collectively as a larger grid.
Sharing starts with data in the form of files or databases. A data grid can expand data capabilities in several ways. First, files or databases can span many systems and thus have larger capacities than on any single system. Such spanning can improve data transfer rates through the use of striping techniques. Data can be duplicated throughout the grid to serve as a backup and can be hosted on or near the machines most likely to need the data, in conjunction with advanced scheduling techniques.
Sharing is not limited to files, but also includes other resources, such as specialized devices, software, services, licenses, and so on. These resources are virtualized to give them a more uniform interoperability among heterogeneous grid participants.
The participants and users of the grid can be members of several real and virtual organizations. The grid can help in enforcing security rules among them and implement policies, which can resolve priorities for both resources and users.
Figure 2-1 The grid virtualizes heterogeneous, geographically dispersed resources
2.4 Access to additional resources
As already stated, in addition to CPU and storage resources, a grid can provide access to other resources as well. The additional resources can be provided in additional numbers and/or capacity. For example, if a user needs to increase their total bandwidth to the Internet to implement a data mining search engine, the work can be split among grid machines that have independent connections to the Internet. In this way, total searching capability is multiplied, since each machine has a separate connection to the Internet. If the machines had shared the connection to the Internet, there would not have been an effective increase in bandwidth.
Some machines may have expensive licensed software installed that users require. Users' jobs can be sent to such machines, more fully exploiting the software licenses.
Some machines on the grid may have special devices. Most of us have used remote printers, perhaps with advanced color capabilities or faster speeds. Similarly, a grid can be used to make use of other special equipment. For example, a machine may have a high-speed, self-feeding DVD writer that could be used to publish a quantity of data faster. Some machines on the grid may be connected to scanning electron microscopes that can be operated remotely. In this case, scheduling and reservation are important. A specimen could be sent in advance to the facility hosting the microscope. Then the user can remotely operate the machine, changing perspective views until the desired image is captured.
The grid can enable more elaborate access, potentially to remote medical diagnostic and robotic surgery tools with two-way interaction from a distance. The variations are limited only by one's imagination. Today, we have remote device drivers for printers. Eventually, we will see standards for grid-enabled device drivers to many unusual devices and resources. All of these will make the grid look like a large system with a collection of resources beyond what would be available on just one conventional machine.
2.5 Resource balancing
A grid federates a large number of resources contributed by individual machines into a large single-system image. For applications that are grid-enabled, the grid can offer a resource balancing effect by scheduling grid jobs on machines with low utilization, as illustrated in Figure 2-2. This feature can prove invaluable for handling occasional peak loads of activity in parts of a larger organization. This can happen in two ways:
An unexpected peak can be routed to relatively idle machines in the grid.
If the grid is already fully utilized, the lowest priority work being performed on the grid can be temporarily suspended or even cancelled and performed again later to make room for the higher priority work.
Without a grid infrastructure, such balancing decisions are difficult to prioritize and execute.
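The two balancing cases above can be sketched in a few lines. This is an illustrative model only: the Machine, Job, and Grid classes are hypothetical names, each machine runs at most one job, and a larger priority number is assumed to mean more important work.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    name: str
    priority: int  # larger number = more important (an assumption of this sketch)


@dataclass
class Machine:
    name: str
    running: Optional[Job] = None  # None means the machine is idle


@dataclass
class Grid:
    machines: list
    suspended: list = field(default_factory=list)

    def place(self, job: Job) -> Optional[str]:
        """Return the name of the machine given to the job, or None if it must wait."""
        # Case 1: route the work to a relatively idle machine.
        for machine in self.machines:
            if machine.running is None:
                machine.running = job
                return machine.name
        # Case 2: the grid is full, so find the lowest-priority running job.
        victim = min(self.machines, key=lambda m: m.running.priority)
        if victim.running.priority < job.priority:
            self.suspended.append(victim.running)  # to be performed again later
            victim.running = job
            return victim.name
        return None
```

A real scheduler would also resume the suspended work when capacity returns; that step is left out to keep the sketch short.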
Occasionally, a project may suddenly rise in importance with a specific deadline. A grid cannot perform a miracle and meet a deadline that is already too close. However, if the size of the job is known, if it is a kind of job that can be sufficiently split into subjobs, and if enough resources are available after preempting lower priority work, a grid can bring a very large amount of processing power to solve the problem.
Figure 2-2 Jobs are migrated to less busy parts of the grid to balance loads
Other, more subtle benefits can occur when using a grid for load balancing. When jobs communicate with each other, with the Internet, or with storage resources, an advanced scheduler could schedule them to minimize communications traffic or minimize the distance of the communications. This can potentially reduce communication and other forms of contention in the grid.
Finally, a grid provides excellent infrastructure for brokering resources. Individual resources can be profiled to determine their availability and their capacity, and this can be factored into scheduling on the grid. Depending on the accounting facilities in place, different organizations participating in the grid can build up grid credits and use them at times when they need additional resources. This can form the basis for grid accounting and the ability to more fairly distribute work and cost on the grid.
2.6 Reliability
High-end conventional computing systems use expensive hardware to increase reliability. They are built using chips with redundant circuits that vote on results, and contain logic to achieve graceful recovery from an assortment of hardware failures. The machines also use duplicate processors with hot pluggability so that when they fail, one can be replaced without turning the other off. Power supplies and cooling systems are duplicated. The systems are operated on special power sources that can start generators if utility power is interrupted. All of this builds a reliable system, but at a great cost, due to the duplication of expensive components.
In the future, we will see a complementary approach to reliability that relies on software and hardware. A grid is just the beginning of such technology. The systems in a grid can be relatively inexpensive and geographically dispersed. Thus, if there is a power or other kind of failure at one location, the other parts of the grid are not likely to be affected. Grid management software can automatically resubmit jobs to other machines on the grid when a failure is detected. In critical, real-time situations, multiple copies of important jobs can be run on different machines throughout the grid, as illustrated in Figure 2-3. Their results can be checked for any kind of inconsistency, such as those caused by computer failures, data corruption, or tampering.
Figure 2-3 Redundant grid configuration (copies of Job x run on several machines)
Such grid systems will utilize autonomic computing. This is a type of software that automatically heals problems in the grid, perhaps even before an operator or manager is aware of them. In principle, most of the reliability attributes achieved using hardware in today's high-availability systems can be achieved using software in a grid setting in the future.
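The redundant execution described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any grid product: the function names are hypothetical, a job is modeled as a Python callable that receives a machine identifier, and results must be hashable so they can be counted.

```python
from collections import Counter


def vote(results):
    """Return the result a strict majority agrees on, or raise if there is none."""
    if not results:
        raise ValueError("no results to compare")
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority: results are inconsistent")
    return value


def run_redundantly(job, machines):
    """Run job(machine) once per machine and accept the majority answer."""
    results = []
    for machine in machines:
        try:
            results.append(job(machine))
        except Exception:
            # A failed copy is ignored; the remaining copies still vote.
            continue
    return vote(results)
```

Requiring a strict majority means that a single corrupted or tampered result among three copies is outvoted, while a two-way disagreement is reported rather than silently resolved.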
2.7 Management
The goal of virtualizing the resources on the grid and handling heterogeneous systems more uniformly will create new opportunities to better manage a larger, more distributed IT infrastructure. It will be easier to visualize capacity and utilization, making it easier for IT departments to control expenditures for computing resources over a larger organization.
The grid offers management of priorities among different projects. In the past, each project may have been responsible for its own IT resources and the associated expenses. Often these resources might be underutilized while another project finds itself in trouble, needing more resources due to unexpected events. With the larger view a grid can offer, it becomes easier to control and manage such situations. As illustrated in Figure 2-4, administrators can change any number of policies that affect how the different organizations might share or compete for resources.
Aggregating utilization data over a larger set of projects can enhance an organization's ability to project future upgrade needs. When maintenance is required, grid work can be rerouted to other machines without crippling the projects involved.
Autonomic computing can come into play here too. Various tools may be able to identify important trends throughout the grid, informing management of those that require attention.
Figure 2-4 Administrators can adjust policies to better allocate resources
2.8 Summary
Grid computing enables organizations (real and virtual) to take advantage of various computing resources in ways not previously possible. They can take advantage of underutilized resources to meet business requirements while minimizing additional costs. The nature of a computing grid allows organizations to take advantage of parallel processing, making many applications financially feasible as well as allowing them to complete sooner.
Grid computing makes more resources available to more people and organizations while allowing those responsible for the IT infrastructure to enhance resource balancing, reliability, and manageability.
Chapter 3. Grid terms and concepts
In this chapter we introduce a few key grid terms and concepts that we use throughout this book.
3.1 Types of resources
A grid is a collection of machines, sometimes referred to as nodes, resources, members, donors, clients, hosts, engines, and many other such terms. They all contribute any combination of resources to the grid as a whole. Some resources may be used by all users of the grid, while others may have specific restrictions.
3.1.1 Computation
The most common resource is computing cycles provided by the processors of the machines on the grid. The processors can vary in speed, architecture, software platform, and other associated factors, such as memory, storage, and connectivity. There are three primary ways to exploit the computation resources of a grid.
The first and simplest is to use it to run an existing application on an available machine on the grid rather than locally.
The second is to use an application designed to split its work in such a way that the separate parts can execute in parallel on different processors.
The third is to run an application that needs to be executed many times on many different machines in the grid.
Scalability is a measure of how efficiently the multiple processors on a grid are used. If twice as many processors make an application complete in one half the time, then it is said to be perfectly scalable. However, there may be limits to scalability when applications can only be split into a limited number of separately running parts or if those parts experience some other interdependencies such as contention for resources of some kind.
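The second approach, splitting work into parts that execute in parallel, can be sketched with the Python standard library. This is only an illustration: threads stand in for grid machines, the sum-of-squares subjob is a placeholder for real work, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Divide items into at most `parts` contiguous chunks of similar size."""
    size = max(1, -(-len(items) // parts))  # ceiling division, never zero
    return [items[i:i + size] for i in range(0, len(items), size)]


def subjob(chunk):
    """The independent unit of work; here, a sum of squares."""
    return sum(x * x for x in chunk)


def run_parallel(items, workers=4):
    """Run one subjob per chunk and assemble the partial results."""
    # Threads stand in for grid machines in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(subjob, split(items, workers)))
    return sum(partial_results)
```

Because the chunks share no state, the subjobs need no communication with each other, which is the property that makes an application scale well on a grid.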
3.1.2 Storage
The second most common resource used in a grid is data storage. A grid providing an integrated view of data storage is sometimes called a data grid. Each machine on the grid usually provides some quantity of storage for grid use, even if temporary. Storage can be memory attached to the processor or it can be secondary storage, using hard disk drives or other permanent storage media. Memory attached to a processor usually has very fast access but is volatile. It would best be used to cache data or to serve as temporary storage for running applications.
Secondary storage in a grid can be used in interesting ways to increase capacity, performance, sharing, and reliability of data. Many grid systems use mountable networked file systems, such as Andrew File System (AFS), Network File System (NFS), Distributed File System (DFS), or General Parallel File System (GPFS). These offer varying degrees of performance, security features, and reliability features.
Capacity can be increased by using the storage on multiple machines with a unifying file system. Any individual file or database can span several storage devices and machines, eliminating maximum size restrictions often imposed by file systems shipped with operating systems. A unifying file system can also provide a single uniform name space for grid storage. This makes it easier for users to reference data residing in the grid, without regard for its exact location. In a similar way, special database software can federate an assortment of individual databases and files to form a larger, more comprehensive database, accessible using database query functions.
Figure 3-1 Data striping (labels in the figure: high speed data; striped virtual file system; virtualization: capacity; sharing: availability; striping: speed; mirrors: reliability; replicas: remote; journals: transactions)
More advanced file systems on a grid can automatically duplicate sets of data, to provide redundancy for increased reliability and increased performance. An intelligent grid scheduler can help select the appropriate storage devices to hold data, based on usage patterns. Then jobs can be scheduled closer to the data, preferably on the machines directly connected to the storage devices holding the required data.
Data striping can also be implemented by grid file systems, as illustrated in Figure 3-1. When there are sequential or predictable access patterns to data, this technique can create the virtual effect of having storage devices that can transfer data at a faster rate than any individual disk drive. This can be important for multimedia data streams or when collecting large quantities of data at extremely high rates from CAT scans or particle physics experiments, for example.
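The idea behind striping can be shown with a small in-memory sketch. It assumes a simple round-robin layout with a fixed block size, which is one common way to stripe data and not necessarily how any particular grid file system does it; the function names are illustrative.

```python
def stripe(data: bytes, devices: int, block_size: int = 4) -> list:
    """Distribute fixed-size blocks of data round-robin across devices."""
    if devices < 1 or block_size < 1:
        raise ValueError("devices and block_size must be positive")
    stripes = [[] for _ in range(devices)]
    for index in range(0, len(data), block_size):
        block_number = index // block_size
        stripes[block_number % devices].append(data[index:index + block_size])
    return stripes


def unstripe(stripes: list) -> bytes:
    """Reassemble the original data by reading blocks in round-robin order."""
    blocks = []
    longest = max((len(s) for s in stripes), default=0)
    for row in range(longest):
        for device_blocks in stripes:
            if row < len(device_blocks):
                blocks.append(device_blocks[row])
    return b"".join(blocks)
```

In a real system each list would live on a separate storage device, so consecutive blocks could be read or written at the same time. That concurrency is where the higher transfer rate comes from.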
A grid file system can also implement journaling so that data can be recovered more reliably after certain kinds of failures. In addition, some file systems implement advanced synchronization mechanisms to reduce contention when data is shared and updated by many users.
3.1.3 Communications
The rapid growth in communication capacity among machines today makes grid computing practical, compared to the limited bandwidth available when distributed computing was first emerging. Therefore, it should not be a surprise that another important resource of a grid is data communication capacity. This includes communications within the grid and external to the grid. Communications within the grid are important for sending jobs and their required data to points within the grid. Some jobs require a large amount of data to be processed, and it may not always reside on the machine running the job. The bandwidth available for such communications can often be a critical resource that can limit utilization of the grid.
External communication access to the Internet, for example, can be valuable when building search engines. Machines on the grid may have connections to the external Internet in addition to the connectivity among the grid machines. When these connections do not share the same communication path, then they add to the total available bandwidth for accessing the Internet.
Redundant communication paths are sometimes needed to better handle potential network failures and excessive data traffic. In some cases, higher speed networks must be provided to meet the demands of jobs transferring larger amounts of data. A grid management system can better show the topology of the grid and highlight the communication bottlenecks. This information can in turn be used to plan for hardware upgrades.
3.1.4 Software and licenses
The grid may have software installed that may be too expensive to install on every grid machine. Using a grid, the jobs requiring this software are sent to the particular machines on which this software happens to be installed. When the licensing fees are significant, this approach can save significant expenses for an organization.
Some software licensing arrangements permit the software to be installed on all of the machines of a grid but may limit the number of installations that can be simultaneously used at any given instant. License management software keeps track of how many concurrent copies of the software are being used and prevents more than that number from executing at any given time. The grid job schedulers can be configured to take software licenses into account, optionally balancing them against other priorities or policies.
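The bookkeeping done by license management software can be sketched as a counter of concurrent users. The LicenseManager class below is hypothetical and handles a single product; real license managers add features such as time-outs and network protocols that are not modeled here.

```python
class LicenseManager:
    """Tracks concurrent uses of one licensed product (illustrative only)."""

    def __init__(self, max_concurrent: int):
        self.max_concurrent = max_concurrent
        self.in_use = set()  # identifiers of jobs currently holding a license

    def acquire(self, job_id: str) -> bool:
        """Grant a license if one is free; otherwise refuse so the job waits."""
        if job_id in self.in_use:
            return True  # the job already holds a license
        if len(self.in_use) >= self.max_concurrent:
            return False
        self.in_use.add(job_id)
        return True

    def release(self, job_id: str) -> None:
        """Return the license when the job finishes."""
        self.in_use.discard(job_id)
```

A scheduler that takes licenses into account would call acquire before dispatching a job and keep the job queued whenever the call returns False.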
3.1.5 Special equipment, capacities, architectures, and policies
Platforms on the grid will often have different architectures, operating systems, devices, capacities, and equipment. Each of these items represents a different kind of resource that the grid can use as criteria for assigning jobs to machines. While some software may be available on several architectures, for example, PowerPC and x86, such software is often designed to run only on a particular type of hardware and operating system. Such attributes must be considered when assigning jobs to resources in the grid.
In some cases, the administrator of a grid may create a new artificial resource type that is used by schedulers to assign work according to policy rules or other constraints. For example, some machines may be designated to only be used for medical research. These would be identified as having a medical research attribute and the scheduler could be configured to only assign jobs that require machines of the medical research resource. Others may participate in the grid only if they are not used for military purposes. In this situation, jobs requiring a military resource would not be assigned to such machines. Of course, the administrators would need to impose a classification on each kind of job through some certification procedure to use this kind of approach.
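Matching jobs to machines by attribute, including an artificial policy attribute, might look like the following sketch. The dictionary layout and the keys requires, classification, and only_for are assumptions made for illustration. Only the first policy example above (machines reserved for one kind of work) is modeled.

```python
def eligible_machines(job, machines):
    """Return the names of machines that may run the job."""
    matches = []
    for machine in machines:
        # Every attribute the job requires must be present with the same value.
        has_attributes = all(
            machine["attributes"].get(key) == value
            for key, value in job["requires"].items()
        )
        # A machine with an "only_for" policy accepts only jobs carrying
        # that classification; machines without the policy accept any job.
        allowed = machine.get("only_for") in (None, job.get("classification"))
        if has_attributes and allowed:
            matches.append(machine["name"])
    return matches
```

The classification on the job is assumed to have been assigned through the kind of certification procedure the text describes; the sketch simply trusts it.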
3.2 Jobs and applications
Although various kinds of resources on the grid may be shared and used, they are usually accessed via an executing application or job. Usually we use the term application as the highest level of a piece of work on the grid. However, sometimes the term job is used equivalently. Applications may be broken down into any number of individual jobs, as illustrated in Figure 3-2. Those, in turn, can be further broken down into subjobs. The grid industry uses other terms, such as transaction, work unit, or submission, to mean the same thing as a job.
Jobs are programs that are executed at an appropriate point on the grid. They may compute something, execute one or more system commands, move or collect data, or operate machinery. A grid application that is organized as a collection of jobs is usually designed to have these jobs execute in parallel on different machines in the grid.
Figure 3-2 An application is one or more jobs that are scheduled to run on the grid
The jobs may have specific dependencies that may prevent them from executing in parallel in all cases. For example, they may require some specific input data that must be copied to the machine on which the job is to run. Some jobs may require the output produced by certain other jobs and cannot be executed until those prerequisite jobs have completed executing. Jobs may spawn additional subjobs, depending on the data they process. This work flow can create a hierarchy of jobs and subjobs. Finally, the results of all of the jobs must be collected and appropriately assembled to produce the ultimate output/result for the application.
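Prerequisite relationships like these form a dependency graph, and a topological ordering tells a scheduler which jobs may run at the same time. The sketch below uses graphlib from the Python standard library (Python 3.9 or later). The function name and the job names in the usage note are made up for illustration.

```python
from graphlib import TopologicalSorter  # available in Python 3.9+


def execution_waves(dependencies):
    """Group jobs into waves; jobs in the same wave can run in parallel.

    dependencies maps each job name to the set of jobs it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For example, if a hypothetical job assemble waits for sim_a and sim_b, and both of those wait for stage_input, the function returns three waves: the staging job, then the two simulations together, then the assembly step.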
3.3 Scheduling, reservation, and scavenging
The grid system is responsible for sending a job to a given machine to be executed. In the simplest of grid systems, the user may select a machine suitable for running the job and then execute a grid command that sends the job to the selected machine. More advanced grid systems would include a job scheduler of some kind that automatically finds the most appropriate machine on which to run any given job that is waiting to be executed. Schedulers react to current availability of resources on the grid. The term scheduling is not to be confused with reservation of resources in advance to improve the quality of service. Sometimes the term resource broker is used in place of scheduler, but this term implies that some sort of bartering capability is factored into scheduling.
In a scavenging grid system, any machine that becomes idle would typically report its idle status to the grid management node. This management node would assign to this idle machine the next job whose requirements are satisfied by the machine's resources. Scavenging is usually implemented in a way that is unobtrusive to the normal machine user. If the machine becomes busy with local non-grid work, the grid job is usually suspended or delayed. This situation creates somewhat unpredictable completion times for grid jobs, although it is not disruptive to those machines donating resources to the grid.
Grid applications that run in scavenging mode often mark themselves at the operating system's lowest priority level. In this way, they only run when no other work is pending. Due to the performance of modern-day processors and operating system scheduling algorithms, the grid application can run for as short as a few milliseconds, even between a user's keystrokes.
To create more predictable behavior, grid machines are often dedicated to the grid and are not preempted by outside work. This enables schedulers to compute the approximate completion time for a set of jobs, when their running characteristics are known.
As a further step, grid resources can be reserved in advance for a designated set of jobs. Such reservations operate much like a calendaring system used to reserve conference rooms for meetings. This is done to meet deadlines and guarantee quality of service. When policies permit, resources reserved in advance could also be scavenged to run lower priority jobs when they are not busy during a reservation period, yielding to jobs for which they are reserved. Thus, various combinations of scheduling, reservation, and scavenging can be used to more completely utilize the grid.
Scheduling and reservation are fairly straightforward when only one resource type, usually CPU, is involved. However, additional grid optimizations can be achieved by considering more resources in the scheduling and reservation process. For example, it would be desirable to assign executing jobs to machines nearest to the data that these jobs require. This would reduce network traffic and possibly reduce scalability limits. Optimal scheduling, considering multiple resources, is a difficult mathematics problem. Therefore, such schedulers may use heuristics. These heuristics are rules that are designed to improve the probability of finding the best combination of job schedules and reservations to optimize throughput or any other metric.
3.4 Grid software components
There are many aspects to grid computing that typically are controlled through software. These functions can be handled across a spectrum ranging from very manual procedures to processes handled autonomically through sophisticated software. The software to perform these functions also ranges in capabilities and availability. Over time more sophisticated software will become available, but in many early grids with limited support resources, it makes sense that some of these processes are not implemented completely in software. However, we discuss them in the following sections as software that should be considered when designing and deploying a grid environment.
3.4.1 Management components
Any grid system has some management components. First, there is a component that keeps track of the resources available to the grid and which users are members of the grid. This information is used primarily to decide where grid jobs should be assigned.
Second, there are measurement components that determine both the capacities of the nodes on the grid and their current utilization rate at any given time. This information is used to schedule jobs in the grid. Such information is also used to determine the health of the grid, alerting personnel to problems such as outages, congestion, or overcommitment. This information is also used to determine overall usage patterns and statistics, as well as to log and account for usage of grid resources.
Third, advanced grid management software can automatically manage many aspects of the grid. This is known as autonomic computing, or recovery-oriented computing. This software would automatically recover from various kinds of grid failures and outages, finding alternative ways to get the workload processed.
3.4.2 Distributed grid management
Larger grids may have a hierarchical or other type of organizational topology, usually matching the connectivity topology. That is, machines locally connected together with a LAN form a cluster of machines. The grid may be organized in a hierarchy consisting of clusters of clusters. The work involved in managing the grid is distributed to increase the scalability of the grid. The collection of grid operation and resource data, as well as job scheduling, is distributed to match the topology of the grid. For example, a central job scheduler will not schedule a submitted job directly to the machine that is to execute it. Instead, the job is sent to a lower level scheduler that handles a set of machines or further clusters. The lower level scheduler handles the assignment to the specific machine. Similarly, the collection of statistical information is distributed. Lower level clusters receive activity information from the individual machines, aggregate it, and send it to higher level management nodes in the hierarchy.
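The bottom-up aggregation of activity information can be sketched as a recursive walk over the hierarchy. The nested dictionary layout (a children list for clusters, busy and total CPU counts for machines) is an assumption made for this example, not a format used by any particular grid system.

```python
def aggregate(node):
    """Return (busy_cpus, total_cpus) for a machine or a cluster of children."""
    if "children" not in node:
        # Leaf: an individual machine reporting its own activity.
        return node["busy"], node["total"]
    busy = total = 0
    for child in node["children"]:
        # Each lower level cluster summarizes its members before reporting up.
        child_busy, child_total = aggregate(child)
        busy += child_busy
        total += child_total
    return busy, total
```

Each level passes only a summary upward, so a higher level management node never has to collect data from every individual machine directly.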
3.4.3 Donor software
Each machine contributing resources typically needs to enroll as a member of the grid and install some software that manages the grid's use of its resources. Usually, some sort of identification and authentication procedure must be performed before a machine can join the grid. Often certificates, such as those available through Certificate Authorities, can be used to establish and ensure the identity of the donor machine as well as the users and the grid itself.
Some grid systems provide their own login to the grid while others depend on the native operating systems for user authentication. In the latter case, a user ID mapping system may be needed to match the user's rights properly on different machines. This typically is manually maintained by a grid administrator. The administrator determines which user ID a given user may possess on each grid machine and enters these IDs in a protected database or registry. In this way, when grid jobs are submitted to different machines for a user, the proper local machine user ID is used for determining the user's rights.
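A user ID mapping of this kind is essentially a lookup table keyed by grid user and machine. The UserMap class below is a hypothetical in-memory stand-in for the protected database or registry; the user and machine names in any usage are illustrative.

```python
class UserMap:
    """Maps a grid user to the local user ID to use on each machine."""

    def __init__(self):
        self._table = {}  # (grid_user, machine) -> local user ID

    def register(self, grid_user, machine, local_id):
        """Record the mapping; done by the grid administrator in this model."""
        self._table[(grid_user, machine)] = local_id

    def local_id(self, grid_user, machine):
        """Return the local ID, or raise if the user has no rights on machine."""
        try:
            return self._table[(grid_user, machine)]
        except KeyError:
            raise PermissionError(
                f"{grid_user} is not mapped on {machine}"
            ) from None
```

Refusing unmapped users by default, rather than falling back to a shared account, keeps a missing entry from silently granting rights.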
In some grid systems, it is possible to join the grid without any special authentication, and in others, it is possible for any user to submit jobs to the grid. Such systems may be convenient to set up, but should be discouraged in larger deployments due to the serious security problems that they would open up.
The grid system makes information about the newly added resources available throughout the grid. The donor machine will usually have some sort of monitor that determines or measures how busy the machine is and the rate or amount of resources utilized. This information is bubbled up to the management software of the grid and used to schedule use of those resources accordingly. In a scavenging system, this information tells the grid management software when the machine is idle and available for work.
Most importantly, the software installed on a given machine can accept an executable job from the grid management system and execute it. A user somewhere on the grid submits a job for execution on the grid. The grid management software must communicate with the grid donor software to send the job there. The donor grid software must be able to receive the executable file or select the proper one from copies preinstalled on the donor machine. The software is executed and the output is sent back to the requester. More advanced implementations can dynamically adjust the priority of a running job, suspend it, and resume running it later, or checkpoint it with the possibility of resuming its execution on a different machine. These kinds of actions may be necessary to respond to load balancing problems or priority or policy changes in the grid.
3.4.4 Submission software
Usually any member machine of a grid can be used to submit jobs to the grid and initiate grid queries. However, in some grid systems, this function is implemented as a separate component installed on submission nodes or submission clients. When a grid is built using dedicated resources rather than scavenged resources, separate submission software is usually installed on the user's desktop or workstation.
3.4.5 Schedulers
Most grid systems include some sort of job scheduling software. This software locates a machine on which to run a grid job that has been submitted by a user. In the simplest cases, it may just blindly assign jobs in a round-robin fashion to the next machine matching the resource requirements. However, there are advantages to using a more advanced scheduler.
Some schedulers implement a job priority system. This is sometimes done by using several job queues, each with a different priority. As grid machines become available to execute jobs, the jobs are taken from the highest priority queues first. Policies of various kinds are also implemented using schedulers. Policies can include various kinds of constraints on jobs, users, and resources. For example, there may be a policy that restricts grid jobs from executing at certain times of the day.
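A priority system built from several queues, combined with a time-of-day policy, can be sketched as follows. The PriorityScheduler class, the convention that queue 0 is the highest priority, and the blocked hours are all assumptions chosen for illustration.

```python
from collections import deque


class PriorityScheduler:
    """Several FIFO queues, one per priority level (illustrative only)."""

    def __init__(self, levels=3):
        # Queue 0 is the highest priority (a convention chosen for this sketch).
        self.queues = [deque() for _ in range(levels)]

    def submit(self, job, level):
        self.queues[level].append(job)

    def next_job(self, hour, blocked_hours=range(9, 17)):
        """Return the next job, or None if queues are empty or policy forbids it."""
        if hour in blocked_hours:  # example policy: no grid jobs 09:00 to 16:59
            return None
        for queue in self.queues:  # highest priority queue is checked first
            if queue:
                return queue.popleft()
        return None
```

A machine that becomes available would call next_job with the current hour; within one priority level, jobs are served in submission order.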
Schedulers usually react to the immediate grid load. They use measurement information about the current utilization of machines to determine which ones are not busy before submitting a job. Schedulers can be organized in a hierarchy. For example, a metascheduler may submit a job to a cluster scheduler or other lower level scheduler rather than to an individual machine.
More advanced schedulers monitor the progress of scheduled jobs and manage the overall workflow. If a job is lost due to a system or network outage, a good scheduler automatically resubmits it elsewhere. However, if a job appears to be in an infinite loop and reaches its maximum time-out, it should not be rescheduled. Typically, jobs have different kinds of completion codes, some of which are suitable for resubmission and some of which are not.
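The resubmission decision might be expressed as follows. The completion codes and the retry limit in this Python sketch are invented for illustration; real grid systems define their own codes.

```python
from enum import Enum


class Completion(Enum):
    # Hypothetical completion codes, not those of any real scheduler.
    SUCCESS = "success"
    NODE_LOST = "node_lost"            # the machine went away
    NETWORK_OUTAGE = "network_outage"  # the job was lost in transit
    PROGRAM_FAULT = "program_fault"    # the application itself failed
    TIMED_OUT = "timed_out"            # possibly stuck in an infinite loop


# Only infrastructure failures are worth trying again elsewhere.
RESUBMITTABLE = {Completion.NODE_LOST, Completion.NETWORK_OUTAGE}


def should_resubmit(code, attempts, max_attempts=3):
    """Decide whether a failed job should be sent to another machine."""
    return code in RESUBMITTABLE and attempts < max_attempts
```

A job that timed out or faulted is reported to the user rather than rerun, because running it elsewhere is likely to fail in the same way.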
28 Introduction to Grid Computing
Reserving resources on the grid in advance is accomplished with a reservation system. It is more than a scheduler. It is first a calendar-based system for reserving resources for specific time periods and preventing any others from reserving the same resource at the same time. It also must be able to remove or suspend jobs that may be running on any machine or resource when the reservation period is reached.
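The calendar aspect of a reservation system comes down to refusing overlapping bookings for the same resource. The following Python sketch shows only that check. The class and its methods are hypothetical, times are plain numbers (for example, hours), and the removal or suspension of running jobs is not modeled.

```python
class ReservationCalendar:
    """Illustrative calendar that rejects overlapping reservations."""

    def __init__(self):
        self._booked = {}  # resource name -> list of (start, end, owner)

    def reserve(self, resource, start, end, owner):
        """Reserve the half-open period [start, end); return True on success."""
        if start >= end:
            raise ValueError("start must be earlier than end")
        for booked_start, booked_end, _ in self._booked.get(resource, []):
            # Two periods overlap if each one starts before the other ends.
            if start < booked_end and booked_start < end:
                return False
        self._booked.setdefault(resource, []).append((start, end, owner))
        return True


cal = ReservationCalendar()
first = cal.reserve("node7", 9, 12, "alice")
clash = cal.reserve("node7", 11, 13, "bob")     # overlaps alice's period
adjacent = cal.reserve("node7", 12, 14, "bob")  # starts when alice's ends
other = cal.reserve("node8", 9, 12, "bob")      # a different resource
```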
3.4.6 Communications
A grid system may include software to help jobs communicate with each other. For example, an application may split itself into a large number of subjobs. Each of these subjobs is a separate job in the grid. However, the application may implement an algorithm that requires that the subjobs communicate some information among themselves. The subjobs need to be able to locate other specific subjobs, establish a communications connection with them, and send the appropriate data. The open standard Message Passing Interface (MPI) and any of several variations is often included as part of the grid system for just this kind of communication.
3.4.7 Observation and measurement
We mentioned above that schedulers react to current loads on the grid. Usually, the donor software includes some tools that measure the current load and activity on a given machine, either by using operating system facilities or by direct measurement. This software is sometimes referred to as a load sensor. Some grid systems provide the means for implementing custom load sensors for resources other than CPU or storage.
Such measurement information is useful not only for scheduling, but also for discovering overall usage patterns in the grid. The statistics can show trends that may signal the need for additional hardware. Also, measurement information about specific jobs can be collected and used to better predict the resource requirements of that job the next time it is run. The better the prediction, the more efficiently the grid's workload can be managed.
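One simple way to turn past measurements into a prediction is to average the most recent runs of a job. The Python sketch below assumes that CPU seconds are the measured quantity and that a short moving average is good enough. Both are assumptions for illustration, and the JobHistory class is hypothetical.

```python
from collections import defaultdict
from statistics import mean


class JobHistory:
    """Illustrative store of per-job measurements with a simple predictor."""

    def __init__(self, window=5):
        self.window = window            # how many recent runs to average
        self._runs = defaultdict(list)  # job name -> measured CPU seconds

    def record(self, job_name, cpu_seconds):
        self._runs[job_name].append(cpu_seconds)

    def predict(self, job_name):
        """Predict CPU seconds for the next run, or None if no history exists."""
        recent = self._runs[job_name][-self.window:]
        return mean(recent) if recent else None


history = JobHistory(window=3)
for measured in (100, 120, 110):
    history.record("render-frame", measured)
estimate = history.predict("render-frame")
unknown = history.predict("never-run")
```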
The measurement information can also be saved for accounting purposes, or to form the basis for grid resource brokering, or to manage priorities more fairly. The information can also be displayed in various forms to better visualize grid activity and utilization.
Chapter 3. Grid terms and concepts 29
3.5 Intragrid and intergrid
As already mentioned, the definition of a grid is somewhat subjective. Therefore, the following descriptions of various kinds of grids must be taken loosely.
Grids can be built in all sizes, ranging from just a few machines in a department to groups of machines organized as a hierarchy spanning the world. In this section, we describe some examples in this range of grid system topologies.
Figure 3-3 A simple grid
As presented in Figure 3-3, the simplest grid consists of just a few machines, all of the same hardware architecture and same operating system, connected on a local network. Because this kind of grid uses homogeneous systems, there are fewer considerations, and it may be used for specialized applications. The machines are usually in one department of an organization, and their use as a grid may not require any special policies or raise any security concerns. Because the machines have the same architecture and operating system, choosing application software for these machines is usually simple. Some people would call this a cluster implementation rather than a grid.
The next progression would be to include heterogeneous machines. In this configuration, more types of resources are available. The grid system is likely to include some scheduling components. File sharing may still be accomplished using networked file systems. Machines participating in the grid may include systems from multiple departments but within the same organization. Such a grid is also referred to as an intragrid.
As the grid expands to many departments, policies may be required for how the grid should be used. For example, there may be policies for what kinds of work are allowed on the grid and at what times. There may be a prioritization by department or by kinds of applications that should have access to grid resources. Also, security becomes more important as more organizations are involved. Sensitive data in one department may need to be protected from access by jobs running for other departments. Dedicated grid machines may be added to increase the quality of service for grid computing, rather than depending entirely on scavenged resources.
The grid may grow geographically in an organization that has facilities in different cities. Dedicated communications connections may be used among these facilities and the grid. In some cases, VPN tunneling or other technologies may be used over the Internet to connect the different parts of the organization. Security increases in importance once the bounds of any given facility are traversed. The grid may grow to be hierarchically organized to reduce the contention implied by central control, increasing scalability.
Over time, as illustrated in Figure 3-4 on page 32, a grid may grow to cross organization boundaries, and may be used to collaborate on projects of common interest. This is known as an intergrid. The highest levels of security are usually required in this configuration. The intergrid offers the prospect for trading or brokering resources over a much wider audience. Resources may be purchased as a utility from trusted suppliers.
Figure 3-4 A more complex intergrid
3.6 Summary
This chapter provided an overview of some of the key terms and concepts related to grid computing. This information may help as you read this book or other literature on grid computing.
Chapter 4. Grid user roles
This chapter briefly describes grid computing from the perspectives of the user and the administrator. The architect and application developer are other key roles in a grid environment. Information applicable to those roles is touched on in subsequent chapters.
Copyright IBM Corp. 2005. All rights reserved. 33
4.1 Using a grid: A user's perspective
This section describes the typical activities in utilizing a grid from a user's perspective.
4.1.1 Enrolling and installing grid software
A user may first have to enroll in the grid and install the provided grid software on his own machine. He may optionally enroll his machine as a donor on the grid.
Enrolling in the grid may require authentication for security purposes. The user positively establishes his identity with a Certificate Authority. This should not be done solely via the Internet. The Certificate Authority must take steps to assure that the user is in fact who he claims to be. The Certificate Authority makes a special certificate available to software needing to check the true identity of a grid user and his grid requests. Similar steps may be required to identify the donating machine. The user has the responsibility of keeping his grid credentials secure.
Once the user and/or machine are authenticated, the grid software is provided to the user for installing on his machine for the purposes of using the grid as well as donating to the grid. This software may be automatically preconfigured by the grid management system to know the communication address of the management nodes in the grid and user or machine identification information. In this way, the installation may be a one-click operation with a minimum of interaction required on the part of the user. In less automated grid installations, the user may be asked to identify the grid's management node and possibly other configuration information. He may choose to limit the resources donated to the grid, the times that his machine is usable by the grid, and other policy-related constraints. The user may also need to inform the grid administrator which user IDs are his on other machines that exist on the grid.
4.1.2 Logging onto the grid
Most grid systems require the user to log on to a system using an ID that is enrolled in the grid. Other grid systems may have their own grid login ID separate from the one on the operating system. A grid login is usually more convenient for grid users. It eliminates the ID matching problems among different machines. To the user, it makes the grid look more like one large virtual computer rather than a collection of individual machines. Some grid environments may use a proxy login model that keeps the user logged in for a specified amount of time, even if he logs off and back on the operating system and even if the machine is rebooted.
Once logged on, the user can query the grid and submit jobs. Some grid implementations permit some query functions if the user is not logged into the grid or even if the user is not enrolled in the grid.
4.1.3 Queries and submitting jobs
The user will usually perform some queries to check to see how busy the grid is, to see how his submitted jobs are progressing, and to look for resources on the grid. Grid systems usually provide command-line tools as well as graphical user interfaces (GUIs) for queries. Command-line tools are especially useful when the user wants to write a script that automates a sequence of actions. For example, the user might write a script to look for an available resource, submit a job to it, watch the progress of the job, and present the results when the job has finished.
Job submission usually consists of three parts, even if there is only one command required. First, some input data and possibly the executable program or execution script file are sent to the machine that will execute the job. Sending the input is called staging the input data. Alternatively, the data and program files may be preinstalled on the grid machines or accessible via a mountable networked file system. When the grid consists of heterogeneous machines, there may be multiple executable program files, each compiled for a different machine platform on the grid. A nice feature provided by some grid systems is the ability to register these multiple versions of the program so that the grid system can automatically choose the version that matches the grid machine that will run the program. Some grid technologies require that the program and input data first be processed or wrapped in some way by the grid system. This may be done to add protective execution controls around the application or simply to collect all of the data files into one.
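The registration of multiple program versions can be pictured as a lookup table keyed by platform. In the following Python sketch, the registry contents, the file paths, and the function select_executable are all hypothetical. A real grid system would obtain the platform description from the donor machine.

```python
# Hypothetical registry: (operating system, hardware architecture) -> build.
REGISTERED_BUILDS = {
    ("Linux", "x86_64"): "bin/linux-x86_64/solver",
    ("Linux", "ppc64"): "bin/linux-ppc64/solver",
    ("AIX", "powerpc"): "bin/aix-powerpc/solver",
}


def select_executable(builds, system, machine):
    """Return the registered build that matches the target grid machine."""
    try:
        return builds[(system, machine)]
    except KeyError:
        raise LookupError(
            f"no build registered for {system}/{machine}"
        ) from None


chosen = select_executable(REGISTERED_BUILDS, "Linux", "ppc64")
```

If no build is registered for a platform, the sketch raises an error, which corresponds to the scheduler excluding that machine from consideration for the job.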
Second, the job is executed on the grid machine. The grid software running on the donating machine executes the program in a process on the user's behalf. It may use a common user ID on the machine or it may use the user's own user ID, depending on which grid technology is used. Some grid systems implement a protective sandbox around the program so that it cannot cause any disruption to the donating machine if it encounters a problem during execution. Rights to access files and other resources on the grid machine may be restricted.
Third, the results of the job are sent back to the submitter. In some implementations, intermediate results can be viewed by the user who submitted the job. In some grid technologies that do not automatically stage the output data back to the user, the results must be explicitly sent to the user, perhaps using a networked file system.
Scripts are also useful for submitting a series of jobs, for a parameter space application, for example. Some computation problems consist of a search for the desired result based on some input parameters. The goal is to find the input parameters that produce the best desired result. For each input parameter, a separate job is executed to find the result for that value. The whole application consists of many such jobs, which explore the results for a large number of input parameter values. Scripts are usually used to launch the many subjobs, each receiving its own particular parameter values. Parameter inputs can sometimes be more complex than simply a number. Sometimes a different input data set represents the input parameter. Scripts help automate the large variety of more complex parameter space study problems. For simpler parameter space inputs, some grid products provide a GUI to submit the series of subjobs, each with different input parameter values.
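The structure of such a script can be sketched as follows. In this Python example, a local thread pool stands in for the grid, and run_subjob is a made-up objective function. In practice, each call would be a job submitted through the grid system's own submission tools.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subjob(x):
    """Stand-in for one grid subjob: evaluate one input parameter value.

    The objective (x - 3) ** 2 is invented for the example. Smaller is better.
    """
    return (x - 3) ** 2


def parameter_sweep(values, workers=4):
    """Launch one subjob per parameter value and return the best pair."""
    values = list(values)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_subjob, values))  # order is preserved
    # Collect the results at the point of submission and pick the minimum.
    return min(zip(values, results), key=lambda pair: pair[1])


best_value, best_result = parameter_sweep(range(0, 7))
```

The collection step here runs in a single program at the point of submission, which is the usual arrangement when the number of subjobs is moderate.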
When there are a large number of subjobs, the work required to collect the results and produce the final result is usually accomplished by a single program, usually running on the machine at the point of job submission. If there are a very large number of subjobs required for an application, the work of collecting the results might be distributed as well. For example, the subjob that submits more subjobs to the grid would be responsible for collecting and aggregating the results of the subjobs it spawned.
4.1.4 Data configuration
The data accessed by the grid jobs may simply be staged in and out by the grid system. However, depending on its size and the number of jobs, this can potentially add up to a large amount of data traffic. For this reason, some thought is usually given to how to minimize such data movement on the grid.
For example, if there will be a very large number of subjobs running on most of the grid systems for an application that will be repeatedly run, the data they use may be copied to each machine and reside there until the next time the application runs. This is preferable to using a networked file system to share this data, because in such a file system, the data would be effectively moved from a central location every time the application is run. This is true unless the file system implements a caching feature or replicates the data automatically.
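A rough calculation makes the difference concrete. The figures in the following Python sketch (a 2 GB data set, 100 machines, 10 runs) are assumed purely for illustration, and the model ignores caching, replication, and result data.

```python
def traffic_shared_fs_gb(dataset_gb, machines, runs):
    """Data moved when every machine reads the data set again on every run."""
    return dataset_gb * machines * runs


def traffic_local_copy_gb(dataset_gb, machines):
    """Data moved when the data set is copied to each machine once."""
    return dataset_gb * machines


# Assumed figures, not measurements from any real grid.
shared = traffic_shared_fs_gb(dataset_gb=2, machines=100, runs=10)
local = traffic_local_copy_gb(dataset_gb=2, machines=100)
```

Under these assumptions, the networked file system moves 2 x 100 x 10 = 2000 GB, while the local copies move 2 x 100 = 200 GB, and the gap widens with every additional run.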
There are many considerations in efficiently planning the distribution and sharing of data on a grid. This type of analysis is necessary for large jobs to better utilize the grid and not create unnecessary bottlenecks.
4.1.5 Monitoring progress and recovery
The user can query the grid system to see how his application and its subjobs are progressing. When the number of subjobs becomes large, it becomes too difficult to list them all in a graphical window. Instead, there may simply be one large bar graph showing some averaged progress metric. It becomes more difficult for the user to tell if any particular subjob is not running properly.
A grid system, in conjunction with its job scheduler, often provides some degree of recovery for subjobs that fail. A job may fail due to:
Programming error: The job stops partway with some program fault.
Hardware or power failure: The machine or devices being used stop working in some way.
Communications interruption: A communication path to the machine has failed or is overloaded with other data traffic.
Excessive slowness: The job might be in an infinite loop, or normal job progress may be limited by another process running at a higher priority or by some other form of contention.
It is not always possible to automatically determine whether a job's failure is due to a problem with the design of the application or to failures of various kinds in the grid system infrastructure. Schedulers are often designed to categorize job failures in some way and automatically resubmit jobs so that they are likely to succeed, running elsewhere on the grid. In some systems, the user is informed about any job failures and must decide whether to issue a command to attempt to rerun the failed jobs.
Grid applications can be designed to automate the monitoring and recovery of their own subjobs using functions provided by the grid system software application programming interfaces (APIs).
4.1.6 Reserving resources
To improve the quality of a service, the user may arrange to reserve a set of resources in advance for his exclusive or high-priority use. A calendaring system analogy can be used here. Such a reservation system can also be used in conjunction with planned hardware or software maintenance events, when the affected resource might not be available for grid use.
In a scavenging grid, it may not be possible to reserve specific machines in advance. Instead, the grid management system may allocate a larger fraction of its capacity for a given reservation to allow for the likelihood of some of the resources becoming unavailable. This must be done in conjunction with tools that have profiled the grid's workload capacity sufficiently to have reliable statistics about the grid's ability to serve the reservation.
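How much extra capacity to allocate can be estimated with a simple probability model. The Python sketch below assumes that each scavenged machine is available independently with the same probability p, which is a strong simplification. The function names and the 95% confidence target are also assumptions for the example.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability that at least k of n machines are available (binomial)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def machines_to_allocate(k, p, confidence=0.95):
    """Smallest allocation n that yields k available machines with the
    requested confidence, under the independence assumption above."""
    if not (0 < p <= 1 and 0 < confidence < 1):
        raise ValueError("need 0 < p <= 1 and 0 < confidence < 1")
    n = k
    while prob_at_least(k, n, p) < confidence:
        n += 1
    return n


# A reservation for 10 machines when each is available 80% of the time.
needed = machines_to_allocate(k=10, p=0.8)
```

The value of p would come from the profiling statistics mentioned above. With perfectly reliable machines (p = 1), no over-allocation is needed and the function returns k.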
4.2 Using a grid: An administrator's perspective
This section describes the typical activities in using the grid from an administrator's perspective.
4.2.1 Planning
The administrator should understand the organization's requirements for the grid to better choose the grid technologies that satisfy those requirements. The following sections briefly describe the steps the administrator may take to manage the grid. It is suggested that one should start by deploying a small grid first, to learn about its installation and management, before having to confront more complicated issues involved with a large grid.
The use of a grid is often born from a need for increased resources of some type. One often looks to their neighbor who may have excess capacity in the particular resource. One of the first considerations is the hardware available and how it is connected via a LAN or WAN. Next, an organization may want to add additional hardware to augment the capabilities of the grid. It is important to understand the applications to be used on the grid. Their characteristics can affect the decisions of how to best choose and configure the hardware and its connectivity.
Security
Security is a much more important factor in planning and maintaining a grid than in conventional distributed computing, where data sharing comprises the bulk of the activity. In a grid, the member machines are configured to execute programs rather than just move data. This makes an unsecured grid potentially fertile ground for viruses and trojan horse programs. For this reason, it is important to understand exactly which components of the grid must be rigorously secured to deter any kind of attack. Furthermore, it is important to understand the issues involved in authenticating users and providing proper authorization for specific operations.
Organization
The technology considerations are important in deploying a grid. However, organizational and business issues can be equally important. It is important to understand how the departments in an organization interact, operate, and contribute to the whole. Often, there are barriers built between departments and projects to protect their resources in an effort to increase the probability of timely success. However, by rethinking some of these relationships, one can find that more sharing of resources can sometimes benefit the entire organization. For example, a project that finds itself behind schedule and over budget may not be able to afford the resources required to solve the problem. A grid would give such projects an added measure of safety, providing an extra margin of resource capacity needed to finish the project. Similarly, a project in its early stages, when computing resources are not being fully utilized, may be able to donate them to other projects in need. A grid also offers the ability for the organization's management to see the bigger picture and react more quickly in shifting resource utilization, priorities, and policies.
4.2.2 Installation
First, the selected grid system must be installed on an appropriately configured set of machines. These machines should be connected using networks with sufficient bandwidth to other machines on the grid. Of prime importance is understanding the failover scenarios for the given grid system so that the grid can continue operating even if any of the management machines fail in some way. Machines should be configured and connected to facilitate recovery scenarios. Any critical databases or other data essential for keeping track of the jobs in the grid, members of the grid, and machines on the grid should have suitable backups. Furthermore, public key certificates must be backed up and the private keys must be held in a highly secured place inaccessible by anyone else.
After installation, the grid software may need to be configured for the local network address and IDs. The administrator will usually require root access to the machines managing the grid. In some grid systems, he will also need root access to the donor machines in order to install the software on them as well. The software to be installed on the donor machines may need to be customized so that it can find the grid management machines automatically and include preinstalled public keys for the grid. This software may be provided to potential donors on an FTP or equivalent server or be made available on physical media.
Once the grid is operational, there may be application software and data that should be installed on donor machines as well. This software may have specific licensing restrictions that should be understood and adhered to. Some grid systems include tools to assist with grid-wide license management. This can both help in following the rules of the licenses and most efficiently exploit those licenses.
4.2.3 Managing enrollment of donors and users
An ongoing task for the grid administrator is to manage the members of the grid, both the machines donating resources and the users. Users may be further organized as project groups. The administrator is responsible for controlling the rights of the users in the grid. Donor machines may have access rights that require management as well. Grid jobs running on donor machines may be executed under a special grid user ID on behalf of the users submitting the jobs. The rights of these grid user IDs must be properly set so that grid jobs do not allow access to parts of the donor machine to which the users are not entitled.
As users join the grid, their identity must be positively established and entered in the Certificate Authority. The user and his certificate credentials must be added to the user list using the software appropriate for the grid system deployed. In some cases, the administrator must propagate the user information to several or all grid machines. Also, when the grid system depends primarily on the operating system for user login, the administrator may need to add entries to map the grid user to specific operating system user IDs on the donor machines.
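The mapping from grid users to operating system user IDs can be as simple as a table maintained by the administrator. The following Python sketch parses such a table from text. The file format shown (a quoted grid identity followed by a local user ID) and the names in it are invented for illustration and do not describe any specific grid product.

```python
def parse_user_map(text):
    """Parse lines of the form: "grid identity" local_user_id

    Hypothetical format: blank lines and lines starting with '#' are ignored.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The local ID is the last whitespace-separated field on the line.
        grid_identity, local_id = line.rsplit(None, 1)
        mapping[grid_identity.strip('"')] = local_id
    return mapping


SAMPLE = '''
# grid identity                        local user ID
"CN=Alice Smith,O=Example Corp"        asmith
"CN=Bob Jones,O=Example Corp"          griduser01
'''

user_map = parse_user_map(SAMPLE)
```

When this information must be propagated to several machines, keeping it in one text file per grid makes the copies easy to compare.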
Similar enrollment activity is usually required to enroll donor machines into the grid. The machine's identity is established and registered with the Certificate Authority. The administrator of the grid must have an agreement with the administrator of the donor machine about user IDs, software, access rights, and any policy restrictions. The administrator must enter the machine's identification credentials, addresses, and resource characteristics using the appropriate software for enrolling the donor machine into the grid. In some cases, the administrator may need to manually propagate this information to other machines in the grid.
Corresponding procedures for removing users and machines must be executed by the administrator.
4.2.4 Certificate authority
It is critical to ensure the highest levels of security in a grid because the grid is designed to execute code and not just share data. Thus, it can be fertile ground for viruses, trojan horses, and other attacks if the grid system is compromised in any way. The Certificate Authority is one of the most important aspects of maintaining strong grid security. An organization may choose to use an external Certificate Authority or operate one itself. You must be able to trust the Certificate Authority to strictly adhere to its responsibilities.
The primary responsibilities of a Certificate Authority are:
Positively identifying entities requesting certificates
Issuing, removing, and archiving certificates
Protecting the Certificate Authority server
Maintaining a namespace of unique names for certificate owners
Serving signed certificates to those needing to authenticate entities
Logging activity
Briefly, a Certificate Authority is based on the public key encryption system. In this system, keys are generated in pairs, a public key and a private key. Either one can be used to encrypt some data such that the other is needed to decrypt it.
The private key is guarded by the owner and never revealed to anyone. The public one is given to anyone needing it. A Certificate Authority is used to hold these public keys and to guarantee who they belong to. When a user uses his private key to encrypt something, the receiver uses the corresponding public key to decrypt it. Because only that user's public key can decrypt the message correctly, the receiver knows who sent it. However, anyone could intercept this message and decrypt it because anyone can get the originator's public key.
If the originator instead doubly encrypts the message, first with his private key and then with the intended recipient's public key, a secure communication link is formed. The receiver uses his private key to decrypt the message and then uses the sender's public key for the second decryption. Now the recipient knows that if the message decrypts properly, then only the sender could have sent it and, furthermore, the sender knows that only the intended receiver can decrypt it.
The beauty of all of this is that nobody had to securely carry an encryption key from the sender to the receiver, as must be done for conventional encryption systems, and any tampering with the communication is revealed. A similar exchange is used to get anyone's public key from the Certificate Authority, so that the user knows that he has received an unaltered public key for the desired user.
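The double encryption described above can be demonstrated with textbook RSA on very small numbers. This Python sketch is for illustration only. The primes are far too small to be secure, no padding is used, and the helper names are assumptions. Production systems rely on vetted cryptographic libraries instead. The sketch requires Python 3.8 or later for the modular inverse.

```python
def make_keypair(p, q, e):
    """Build a toy RSA key pair from two small primes (not secure)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)    # private exponent: modular inverse of e
    return (e, n), (d, n)  # (public key, private key)


def apply_key(value, key):
    """Encrypt or decrypt: the same modular exponentiation serves both."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


# The recipient's modulus is deliberately larger than the sender's, so that
# the sender's output always fits inside the recipient's key.
sender_public, sender_private = make_keypair(61, 53, 17)       # n = 3233
recipient_public, recipient_private = make_keypair(89, 97, 5)  # n = 8633

message = 1234  # must be smaller than the sender's modulus

# Sender: encrypt with own private key, then with the recipient's public key.
signed = apply_key(message, sender_private)
sealed = apply_key(signed, recipient_public)

# Recipient: decrypt with own private key, then with the sender's public key.
unsealed = apply_key(sealed, recipient_private)
recovered = apply_key(unsealed, sender_public)
```

Only the holder of the recipient's private key can perform the first decryption, and the second decryption succeeds only if the sender's private key produced the inner value.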
4.2.5 Resource management
Another responsibility of the administrator is to manage the resources of the grid. This includes setting permissions for grid users to use the resources as well as tracking resource usage and implementing a corresponding accounting or billing system. Usage statistics are useful in identifying trends in an organization that may require the acquisition of additional hardware, reduction in excess hardware to reduce costs, and adjustments in priorities and policies to achieve utilization that is fairer or better achieves the overall goals of an organization.
Some grid components, usually job schedulers, have provisions for enforcing priorities and policies of various kinds. It is the responsibility of the administrator to configure these to best meet the goals of the overall organization. Software license managers can be used in a grid setting to control the proper utilization of licenses. These may be configured to work with job schedulers to prioritize the use of the limited licenses.
4.2.6 Data sharing
For small grids, the sharing of data can be fairly easy, using existing networked file systems, databases, or standard data transfer protocols. As a grid grows and the users become dependent on any of the data storage repositories, the administrator should consider procedures to maintain backup copies and replicas to improve performance. All of the resource management concerns apply to data on the grid.
4.3 Summary
When considering whether a grid environment is applicable to a particular organization or set of requirements, two key user perspectives must be considered. The first is the end user's perspective: how users will access the grid and benefit from using it. The second is the administrator's perspective: how the grid will be administered, especially when the resources making up the grid are distributed both geographically and organizationally.
This chapter discussed some of the key points to consider for both of these user roles. Once it is decided that a grid may be the right solution, the architect and application developer will need to be involved to ensure the grid and its related applications are designed and implemented to meet the business requirements.
Part 2. Grid architecture considerations
Chapter 5. Standards for grid environments
As we described in Chapter 1, "What grid Computing is" on page 3, grid computing consists of many concepts, and can be defined in many ways. But, at its essence, it provides for distributed computing utilizing virtual resources.
Many technologies could be used to implement such an environment. However, to ensure that various resources across a wide variety of hardware and software platforms can peacefully coexist and interoperate, standards need to be defined and widely adopted.
This chapter describes just a few of the key standards and evolving standards that apply to grid computing.
5.1 Overview
As we have discussed, grid computing assumes and/or requires technologies that include:
Support for executing programs on a variety of platforms
A secure infrastructure
Data movement/replication/federation
Resource discovery
Resource management
For each of these areas, there are a variety of technologies available that could be used to address them. We will look at just a few of the standards (both proposed and adopted) that could be considered when architecting a grid-based solution.
Standards bodies that are involved in areas related to grid computing include:
Global Grid Forum (GGF)
http://www.ggf.org
Organization for the Advancement of Structured Information Standards (OASIS)
http://www.oasis-open.org
World Wide Web Consortium (W3C)
http://www.w3.org
Distributed Management Task Force (DMTF)
http://www.dmtf.org
Web Services Interoperability Organization (WS-I)
http://www.ws-i.org
5.1.1 OGSA
The Global Grid Forum has published the Open Grid Services Architecture (OGSA). Addressing the requirements of grid computing in an open and standard way requires a framework for distributed systems that supports integration, virtualization, and management. Such a framework requires a core set of interfaces, expected behaviors, resource models, and bindings.
OGSA defines requirements for these core capabilities and thus provides a general reference architecture for grid computing environments. It identifies the components and functions that are useful if not required for a grid environment. Though it does not go to the level of detail such as defining programmatic interfaces or other aspects that would guarantee interoperability between implementations, it can be used to identify the functions that should be included based on the requirements of the specific target environment.
For more information, refer to:
http://www.ggf.org
http://www.globus.org/ogsa
5.1.2 OGSI
As grid computing has evolved, it has become clear that a service-oriented architecture could provide many benefits in the implementation of a grid infrastructure. The Global Grid Forum extended the concepts defined in OGSA to define specific interfaces to various services that would implement the functions defined by OGSA.
More specifically, the Open Grid Services Infrastructure (OGSI) defines mechanisms for creating, managing, and exchanging information among Grid services. A Grid service is a Web service that conforms to a set of interfaces and behaviors that define how a client interacts with a Grid service.
These interfaces and behaviors, along with other OGSI mechanisms associated with Grid service creation and discovery, provide the basis for a robust grid environment. OGSI provides the Web Service Definition Language WSDL definitions for these key interfaces.
Globus Toolkit 3 included several of its core functions as Grid services conforming to OGSI.
For more information, refer to:
http://www.globus.org/toolkit/draft-ggf-ogsi-gridservice-33_2003-06-27.pdf
5.1.3 OGSA-DAI
The OGSA-DAI (data access and integration) project is concerned with constructing middleware to assist with access and integration of data from separate data sources via the grid. The project was conceived by the UK Database Task Force and is working closely with the Global Grid Forum DAIS-WG and the Globus team.
For more information, refer to:
http://www.ogsadai.org.uk
5.1.4 GridFTP
GridFTP is a secure, reliable, high-performance data transfer protocol optimized for high-bandwidth wide-area networks. As one might guess from its name, it is based upon the Internet FTP protocol and includes extensions that make it a desirable tool in a grid environment. The GridFTP protocol specification is a proposed recommendation document in the Global Grid Forum (GFD-R-P.020).
GridFTP uses basic Grid security on both control (command) and data channels. Features include multiple data channels for parallel transfers, partial file transfers, third-party transfers, and more.
GridFTP can be used to move files (especially large files) across a network efficiently and reliably. These files may include the executables required for an application or data to be consumed or returned by an application. Higher-level services, such as data replication services, could be built on top of GridFTP.
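Parallel and partial transfers both rest on treating a file as a set of byte ranges. The following Python sketch shows only that range arithmetic. It does not use GridFTP or any Globus interface, and the function name split_byte_ranges and its parameters are hypothetical names chosen for illustration.

```python
def split_byte_ranges(file_size, num_channels):
    """Divide [0, file_size) into contiguous (offset, length) ranges.

    Illustrative only: one range per data channel, sizes differing by
    at most one byte. This is not part of any GridFTP API.
    """
    if file_size < 0 or num_channels < 1:
        raise ValueError("file_size must be >= 0 and num_channels >= 1")
    base, extra = divmod(file_size, num_channels)
    ranges = []
    offset = 0
    for i in range(num_channels):
        # The first `extra` channels carry one additional byte.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # more channels than bytes: skip empty ranges
        ranges.append((offset, length))
        offset += length
    return ranges
```

Each (offset, length) pair could then be handed to a separate data channel, or a single pair could be requested on its own as a partial transfer.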
For more information, refer to:
http://www.globus.org/grid_software/data/gridftp.php
5.1.5 WSRF
Web Services Resource Framework (WSRF) is described in more detail in Chapter 9, "Web services resource framework" on page 115. WSRF is being promoted and developed through work from a variety of companies, including IBM, and has been submitted to OASIS technical committees. Basically, WSRF is a set of specifications that define the relationship between Web services (which are normally stateless) and stateful resources. WSRF is a general term that encompasses several related proposed standards that cover:
Resources
Resource lifetime
Resource properties
Service groups (collections of resources)
Faults
Notifications
Topics
As the concept of Grid services evolves, the WSRF suite of evolving standards holds great promise for the merging of Web services standards with the stateful resource management requirements of grid computing.
For more information, refer to:
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsrf
http://www.globus.org/wsrf
5.1.6 Web services related standards
Because Grid services are so closely related to Web services, the plethora of standards associated with Web services also applies to Grid services. We do not describe all of these standards in this document, but rather recommend that the reader become familiar with standards commonly associated with Web services, such as:
XML
WSDL
SOAP
UDDI
In addition, there are many evolving standards related to Web Services Interoperability (WS-I) that can also be applied to, and bring value to, grid environments, standards, and proposed standards.
For more information, refer to:
http://www.w3.org/2002/ws
http://www.ws-i.org
Chapter 6.
Application considerations
As we have already discussed, grid computing environments provide a distributed computing environment utilizing virtualized resources. With this in mind, applications that can take advantage of distributed computing capabilities are possible candidates for grid computing environments. Of course, they may need to be adapted to be able to take advantage of virtual resources through the use of open interfaces and standards such as those discussed in Chapter 5, "Standards for grid environments" on page 45.
This chapter provides an overview of various considerations when contemplating the development or modification of an application to take advantage of grid computing.
6.1 General application considerations
Attention: Aside from the application-specific criteria described below, it should be obvious that the applications that can most easily take advantage of grid computing are those designed to be portable and to utilize virtual resources. Utilizing open standards such as those described in the previous chapter provides a solid foundation for ensuring such portability.
While a grid-based environment may offer many advantages, any given application may not necessarily benefit from a grid. For example, some personal productivity applications are tightly coupled with a user's interface and do not consume a large amount of computing resources. Running them on a grid may not provide significant benefits. However, other applications may be very well suited for exploiting a grid.
If we take a parochial view of the grid as an environment that provides access to vast amounts of computing power, one of the simplest concepts for grid utilization is to be able to run an application somewhere else when your own machine is too busy or otherwise does not have the required resources. Almost any kind of application can be executed in a grid environment this way. You may not see spectacular performance gains unless the machine it runs on is much faster than the machine you usually use.
Applications that can be run in a batch mode are the easiest to execute on other resources within the grid. Applications that need interaction through graphical user interfaces are more difficult to run on a grid, but not impossible. For instance, they can use remote graphical terminal support, such as the X Window System or other similar capabilities.
In subsequent sections of this chapter, we discuss many considerations for applications that are CPU intensive or have various requirements associated with data access or sharing. These number-crunching types of applications have historically gained efficiencies by running in a cluster environment or, more recently, in a grid (which some consider a distributed cluster). However, with advances in grid middleware and the economic incentives to run more typical business applications on virtualized resources, there is a trend towards understanding how these business applications can be implemented or modified to take advantage of the various resources provided by a grid computing environment.
For this discussion, let us consider a grid environment to be distributed computing on virtualized resources. Distributed computing concepts are well known, and most application architects and developers understand what it takes to enable an application to execute and take advantage of a distributed computing environment. What is new is the utilization of virtualized resources. This implies that the application developer may not know (and optimally should not need to know) what operating platform or other resources (network, storage, and so on) will be utilized by the application at run time. The more an application can be written to be independent of actual physical resources, the more likely it can take advantage of a grid environment by running on any available resources that provide the required services and conform to any applicable policies.
A key aspect of writing applications that are independent of physical resources is the conformance to widely adopted standards. For instance, applications that are implemented as services and can be deployed to any compliant J2EE container have little or no dependency on the underlying hardware or operating system. Therefore, they could potentially be deployed to and run on any systems within the grid that have a compliant J2EE container.
Likewise, using standardized interfaces for access to storage, databases, and network communications provides the portability required to run on virtualized resources independent of their physical makeup.
Applications specifically designed to use multiple processors or other federated resources of a grid will benefit most. The following discussion is designed to stimulate analysis, which will show how various factors may help decide whether a given application should be deployed on a grid and what modifications, if any, might be considered.
6.2 CPU-intensive application considerations
Determining whether existing or planned applications that are CPU intensive can take advantage of a grid environment requires many considerations. This section describes some things to consider related to the possible applicability of a grid to these applications.
Probably the most important step in grid enabling an application is to determine whether the calculations can be done in parallel. While High Performance Computing (HPC) clusters are sometimes used to handle the execution of applications that can utilize parallel processing, grids provide the ability to run these applications across a heterogeneous, geographically dispersed set of clusters. Rather than run the application on a single homogeneous cluster, the application can take advantage of the larger set of resources in the grid. If the algorithm is such that each computation depends on the prior calculation, then a new algorithm, if one is possible, may be beneficial. Not all problems can be converted into parallel calculations.
As an oversimplified example, let us take the process of adding up a large list of numbers. The simple serial program may be written to start with the sum of zero and then add each of the numbers, one at a time, until the final sum is reached. Here each calculation depends on the prior one. However, the associative property of addition shows us that we could break the list up into seven pieces, for example, with seven separate programs adding up the numbers in each piece, and then a final eighth program adding the seven sums to form the final answer. This is illustrated by Figure 6-1.
Figure 6-1 Rearranging computations to execute in parallel
On the other hand, some computations cannot be rewritten to execute in parallel. For example, in physics, there are no simple formulas that show where three or more moving bodies in space will be after a specified time when they gravitationally affect each other. These kinds of computations are done by simulating the motions of the bodies, applying Newton's or Einstein's laws to small time increments, and computing how the forces and bodies affect each other, given the new position of the objects after each tiny time increment, as illustrated in Figure 6-2 on page 55.
Figure 6-2 Simulation that cannot be made parallel but needs to run many times
This is repeated a great number of times until the desired time is reached. Each computation depends on the prior one. If it did not, then we would have used a different formula or algorithm to begin with. Because the time increments are not infinitely small, after many increments, small errors start adding up. The final computed position of the objects can be in error, perhaps ultimately causing a spacecraft to crash into a planet instead of going into orbit. To improve accuracy in such computations, we make the time increments much shorter. This increases the number of these increments to be computed, and thus the overall computation time. Many simulations suffer from this type of difficulty.
As we saw above, some computations, such as the list-adding example, can be performed in parallel, while others, such as the three-body physics problem, cannot. Often, an application may be a mix of independent computations as well as dependent computations. One needs to analyze the application to see if there is a way to split some subset of the work. Drawing a program flow graph and a data dependency graph can help in analyzing whether and how an application could be separated into independently running parallel parts.
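As a rough illustration of the list-adding example discussed earlier in this section, the following Python sketch splits a list into seven pieces, sums each piece independently, and then adds the partial sums. The function names are our own, and the thread pool merely stands in for separate grid nodes. It shows the structure of the computation rather than a real grid submission.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_pieces(numbers, pieces):
    """Cut the list into `pieces` contiguous sublists of near-equal size."""
    base, extra = divmod(len(numbers), pieces)
    result, start = [], 0
    for i in range(pieces):
        end = start + base + (1 if i < extra else 0)
        result.append(numbers[start:end])
        start = end
    return result


def parallel_sum(numbers, pieces=7):
    """Sum each piece independently, then add the partial sums."""
    chunks = split_into_pieces(numbers, pieces)
    # The thread pool is a stand-in for separate grid nodes (assumption).
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # The final "eighth program" combines the partial results.
    return sum(partial_sums)
```

Because addition is associative, the grouping of the pieces does not change the total for integers. With floating-point values, different groupings can produce slightly different rounding.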
Going back to the space object example, let us say we are trying to find the correct trajectory to aim a rocket so that it loops around Venus, and then Earth, to reach Jupiter more quickly. We might try calculating to see what happens for a large number of different trajectories, pointing the rocket in slightly different directions and firing the engines for different durations. Each trajectory can be thought of as a separate calculation, and then in the end, a program chooses the best one. Here, we are able to perform work in parallel, even though the underlying computation for a single trajectory may be serial. Applications that consist of a large number of independent subjobs are very suitable for exploiting grid CPU resources. These are sometimes called parameter space searches.
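A parameter space search of this kind might be organized as in the following Python sketch. The scoring function simulate_trajectory is a made-up stand-in for a real (serial) trajectory simulation, and the parameter names and the thread pool are assumptions used only to show that each candidate is evaluated independently before a final step chooses the best one.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def simulate_trajectory(params):
    """Hypothetical stand-in for one serial trajectory simulation.

    Returns a made-up travel-time score (lower is better) for a launch
    angle in degrees and an engine burn time in seconds.
    """
    angle, burn = params
    return (angle - 42.0) ** 2 + (burn - 130.0) ** 2


def search_parameter_space(angles, burns, workers=4):
    """Evaluate every (angle, burn) pair independently and keep the best."""
    candidates = list(itertools.product(angles, burns))
    # Each candidate is an independent subjob; the pool stands in for
    # grid nodes (assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(simulate_trajectory, candidates))
    # The final program simply chooses the best result.
    best_index = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]
```

For example, search_parameter_space([40.0, 42.0, 44.0], [120.0, 130.0]) evaluates six independent candidates and returns the pair with the lowest score.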
Figure 6-3 Redundant speculative computation to reduce latency
Another approach to reducing data dependency on prior computations is to look for ways to use redundant computations. If the dependency is on a subset of the prior computations, it may be beneficial just to have each successive computation that needs the results of the prior computation recompute those results instead of waiting for them to arrive from another job. If the dependency is on a computation that has a yes/no answer, perhaps it is better to compute the next calculations for both the yes and no cases and throw away the wrong choice when the dependency is finally known, as illustrated in Figure 6-3 on page 56.
This technique can be taken to extremes in various ways. For example, for two bits of data dependency, we could make four copies of the next computation with all four possible input values. This can proceed to 2^N copies of the next calculation for N bits of data dependency. As N gets large, it quickly becomes too costly to compute all possible computations. However, we may speculate and only perform the copies for the values we guess might be more likely to be correct. If we did not guess the correct one, then we simply end up computing it in series, but if we guessed correctly, it saves us overall real time. Here heuristics (rules of thumb) could be developed to make the best possible guesses. Furthermore, there may be many points in the application where we could use the speculative approach, and if our guess rate is high enough, there might be an overall improvement in efficiency and parallelism. This same kind of speculative computing is used to improve the efficiency inside CPUs by executing both branches of a condition until the correct one is determined.
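The yes/no case can be sketched in Python as follows. All three functions are hypothetical placeholders with artificial delays, and the thread pool again stands in for separate jobs. The point is only that both candidate next steps start before the decision is known, and the result of the wrong branch is discarded.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_decision(value):
    """Hypothetical prior computation with a yes/no answer."""
    time.sleep(0.05)  # pretend this takes a while
    return value % 2 == 0


def next_step_if_yes(value):
    """Hypothetical next calculation for the 'yes' case."""
    time.sleep(0.05)
    return value * 10


def next_step_if_no(value):
    """Hypothetical next calculation for the 'no' case."""
    time.sleep(0.05)
    return value + 1


def speculative_run(value):
    """Start both possible next steps before the decision is known."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        decision = pool.submit(slow_decision, value)
        yes_branch = pool.submit(next_step_if_yes, value)
        no_branch = pool.submit(next_step_if_no, value)
        # Keep the branch that matches the decision; the other result
        # is simply thrown away.
        chosen = yes_branch if decision.result() else no_branch
        return chosen.result()
```

The cost of this approach is the wasted work on the discarded branch, which is why it pays off only when spare capacity is available.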
Some parameter space problems are finite in nature, and some are infinite or so large that all possible parameter inputs cannot be examined. For these kinds of parameter space problems, it is useful to use additional heuristics to select which parts of the parameter space to try. This may not lead to the absolute best solution, but it may be close enough. The traveling salesman problem can be intractable in this way when there are many cities to be visited. However, various heuristics can be used to get reasonably close to an optimal solution. It may not be worth a month of additional computation to improve the answer from 98 percent to 99 percent efficiency.
It may be acceptable to explore only a small part of the parameter space. One approach is to try a reasonable number of randomly scattered points in the problem's parameter space first. Then one would try small changes in the parameters around the best points that might lead to a better solution. This technique is useful when the result changes relatively smoothly with changes in the parameters.
By analogy, this can be described as hill climbing. To find the highest altitude point in a perpetually fog-shrouded region of land on which to build a television broadcast antenna, you would put a set of people at random on the terrain. Then each would climb to the highest point near them. Whoever reached the highest point would then be declared to have found the highest hill in the land. They may not have found the absolute highest point if nobody started near that point, but they will probably find a nearly highest hill or one that is sufficient for their antenna tower. This kind of technique is useful when there are too few people and too many hills to visit all of them.
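The hill-climbing analogy can be sketched in Python as shown below. The altitude function is a hypothetical smooth terrain with a single peak, chosen so that the result is easy to check. With several hills, different climbers would end on different summits, and the search would report the highest one found. The step size, number of climbers, and random seed are arbitrary assumptions.

```python
import random


def altitude(x, y):
    """Hypothetical smooth terrain with its single peak at (3, -2)."""
    return 100.0 - (x - 3.0) ** 2 - (y + 2.0) ** 2


def climb(start, step=0.1, max_steps=10_000):
    """Walk uphill from `start` until no neighbouring point is higher."""
    x, y = start
    for _ in range(max_steps):
        neighbours = [(x + dx, y + dy)
                      for dx in (-step, 0.0, step)
                      for dy in (-step, 0.0, step)
                      if (dx, dy) != (0.0, 0.0)]
        best = max(neighbours, key=lambda p: altitude(*p))
        if altitude(*best) <= altitude(x, y):
            break  # local summit reached
        x, y = best
    return (x, y), altitude(x, y)


def random_restart_search(climbers=8, seed=0):
    """Place climbers at random and report the highest summit found."""
    rng = random.Random(seed)
    starts = [(rng.uniform(-10, 10), rng.uniform(-10, 10))
              for _ in range(climbers)]
    # Each climb is independent, so each could run as a separate subjob.
    return max((climb(s) for s in starts), key=lambda r: r[1])
```

Because every climber works independently, the climbs map naturally onto separate grid subjobs, with a final step that compares the summits.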
Often, mathematical calculations are commutative, associative, or linear in some way. The simple example of adding a list of numbers illustrates this. By altering some potentially unimportant rules in the computations involved in a calculation, we may be able to break the ordering requirement and thus make it possible to execute more of the application in parallel. For example, in a bank account, deposits and withdrawals are serially calculated and if the account ever goes negative, then the transaction may be rejected, a fine may be imposed, or the account may be frozen. If, however, the bank changes its rules and says that the account must simply be positive at the end of the day, then withdrawals processed before the deposits would not cause a problem and all of these calculations could be broken up into separate, parallel-running jobs.
Many times, an application that was written for a single processor may not be organized in a way, or use algorithms or approaches, suitable for splitting into parallel subcomputations. An application may have been written in a way that makes it most efficient on a single processor machine. However, there may be other methods or algorithms that are not as efficient, yet may be much more amenable to being split into independently running subcomputations. A different algorithm may scale better because it can more efficiently use larger and larger numbers of processors. Thus, another approach for grid enabling an application is to revisit the choices made when the application was originally written. Some of the discarded approaches may be better for grid use.
How you go about solving a problem may be quite different depending on whether it is to be solved only once or solved repeatedly with different inputs. One might use a less efficient but more straightforward technique if the problem is only to be solved once, reducing debug time and making good use of a grid's ability to absorb momentary peaks of activity. On the other hand, if it is a one-time problem but is going to take a year of execution, more thought should be put into the problem before proceeding. The following are some additional things to think about.
Is there any part of the computation that would be performed more than once using the same data? If so, and if that computation is a significant portion of the overall work, it may be useful to save the results of such computations.
If we find that an application performs some sets of computations on the same input data every time it is run, produces the same output data, and takes a significant amount of time computing this output, how much output data would need to be saved to avoid the computation the next time? If there is a very large amount of output data, it may be prohibitive to save this data. Perhaps there are a large number of similar computations that might be saved. Even if any one computation's results do not represent a large amount of data, the aggregate for all of them might. One needs to consider this time-space tradeoff for the application. One could presumably save space and time by only saving the results for the most frequently occurring situations. For example, in world-class chess playing programs, the opening positions of the game of chess are usually stored in a database containing the best move to take in each such position. This information can be precomputed to a large extent and can save large amounts of computation time during a chess tournament. However, the number of possible chess board positions increases very rapidly with more moves into the game, so only the early move positions of the game, or the endgame moves when there are few pieces left, are precomputed and saved.
In a distributed application, partial results or data dependencies may be met by communicating among subjobs. That is, one job may compute some intermediate result and then transmit it to another job in the grid. If possible, one should consider whether it might be more efficient to simply recompute the intermediate result at the point where it is needed rather than waiting for it from another job. One should also consider the transfer time from another job, versus retrieving it from a database of prior computations.
6.3 Data considerations
When considering applications that may be split into multiple parts for execution on a grid, it is important to consider the amount of data that needs to be sent to the node performing a calculation and the time required to send it. If the application can be split into small work units requiring little input data and producing small amounts of output data, that is ideal. The data in this kind of case is said to be staged to the node doing the work. Sending this data along with the executable file to the grid node doing the work is part of the function of most grid systems. However, in many applications, larger amounts of input and/or output data are involved, and this can cause complications and inefficiencies.
When the grid application is split into subjobs, often the input data is a large fixed set of data. This offers the opportunity to share this data rather than staging the entire set with each subjob. However, one must consider that even with a shared mountable file system, the data is being sent over the network. The goal is to locate the shared data closer to the jobs that need the data. If the data is going to be used more than once, it could be replicated to the degree that space permits.
If more than one copy of the data is stored in the grid, it is important to arrange for the subjobs to access the nearest copy per the configuration of the network. This highlights the need for an information service within the grid to track this form of data awareness. Furthermore, one must be careful that the network does not become the bottleneck for such a grid application. If each subjob processes the data very quickly and is always waiting for more data to arrive, then sharing may not be the best model if the network data transfer speed to each subjob does not at least match disk speeds.
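A very simple model of these two checks is sketched below in Python. It assumes that an information service can supply a bandwidth and latency figure for each replica. The function names, the dictionary layout, and the cost model (latency plus size divided by bandwidth) are all illustrative assumptions, not part of any grid toolkit.

```python
def transfer_seconds(size_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Estimated time to move `size_bytes` over one link."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def pick_nearest_replica(replicas, size_bytes):
    """Return the replica name with the lowest estimated transfer time.

    `replicas` maps a replica name to (bandwidth_bytes_per_s, latency_s),
    figures that an information service is assumed to supply.
    """
    return min(
        replicas,
        key=lambda name: transfer_seconds(size_bytes, *replicas[name]),
    )


def network_is_bottleneck(size_bytes, bandwidth_bytes_per_s, compute_s):
    """True when a subjob would wait for data longer than it computes."""
    return transfer_seconds(size_bytes, bandwidth_bytes_per_s) > compute_s
```

Under this model the "nearest" copy depends on the amount of data: a low-latency link wins for small requests, while a high-bandwidth link wins for large ones.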
Shared data may be fixed or changing. For example, a database may contain the latest known gene sequences and be constantly growing. However, applications using this data may not need the latest gene sequence data the instant that it is available. This makes it easier and more efficient to share such a database because the updates to it can be batched and processed at off-peak usage times rather than contending with concurrent access by applications. Furthermore, if more than one copy of this data exists, and all of the copies do not need to be simultaneously updated, this improves performance because all applications using the data would not need to be stopped while updating the data. Only those accessing a particular copy would need to be stopped or temporarily paused.
When a file or a database is updated, other jobs must not read the portion that is being updated by another job at the same moment. Locking or synchronizing primitives are typically built into the file system or database to automatically prevent this. Otherwise, the application might read partially updated data, perhaps receiving a combination of old and new data.
In some shared data situations, updates must not be delayed. For example, if the subjobs are processing financial transactions, they must be immediately updated in the master balances database. Furthermore, if there are copies of this database elsewhere, they must all be updated with each new item simultaneously. A number of scaling issues come into play here. There can be a large amount of data synchronization communications among jobs and databases. The synchronization primitives can become bottlenecks in overall grid performance. It is important to consider how the database activity can be partitioned so that there is less interference among the parts and thus less potential synchronization contention among those parts.
Applications that access the data they need serially are more predictable, so various techniques can be used to improve their performance on the grid. If each subjob needs to access all of the data, then shared copies might be desirable. Multiple copies of the data should be considered if bringing the data closer to the nodes running the subjobs would help. If each part of the data is examined only once, then copies may not be desirable. However, if the access is serial, some of the retrieval time can be overlapped with processing time. There could be a thread retrieving the data that will be needed next while the data already retrieved is being processed. This can even apply to randomly accessed data, if there is the ability to do some prediction of which portions of data will be needed next.
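Overlapping retrieval with processing can be sketched in Python with a reader thread and a bounded queue. The fetch and process callables are supplied by the caller and are hypothetical here, as is the buffer depth of 2. Error propagation from the reader is deliberately left out to keep the sketch short.

```python
import queue
import threading


def process_with_prefetch(fetch, process, keys, depth=2):
    """Overlap retrieval of upcoming items with processing of current ones."""
    buffer = queue.Queue(maxsize=depth)  # bounds how far ahead we read
    done = object()                      # sentinel marking end of data

    def reader():
        try:
            for key in keys:
                buffer.put(fetch(key))   # blocks while the buffer is full
        finally:
            # Always signal the end so the consumer never hangs.
            buffer.put(done)

    threading.Thread(target=reader, daemon=True).start()

    results = []
    while True:
        item = buffer.get()
        if item is done:
            break
        results.append(process(item))
    return results
```

While one item is being processed, the reader thread is already retrieving the next ones, so retrieval time is hidden behind processing time whenever the access order is known in advance.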
One of the most difficult problems with duplicating rapidly changing databases is keeping them in synchronization. The first step is to see if rapid synchronization is really needed. Can the application be modified to work around this? If not, the synchronization mechanisms themselves may need to be changed. If the rapidly changing data is only a subset of the database, in-memory versions of the database might be considered. Network communication bandwidth into the central database repository could also be increased. Is it possible to rewrite the application so that it uses a data flow approach rather than the central state of a database? Perhaps it can use self-contained transactions that are transmitted to where they are needed. The subjobs could use direct communications between them as the primary flow for data dependency rather than passing this data through a database first.
In some applications, various database records may need to be updated atomically or in concert with others. Locking or synchronization primitives are used to lock all of the related database entries, whether they are in the same database or not, and the entries are then updated while the synchronization primitives keep other subjobs waiting until the update is finished. One should look for ways to minimize the number of records being updated simultaneously to reduce the contention created by the synchronization mechanism. One should exercise caution not to create situations that might cause a synchronization deadlock, with two subjobs waiting for each other to unlock a resource the other needs. There are three ways that are usually used to prevent this problem:
The first is the easiest, but can be the most wasteful. This is to have all waits for resources include timeouts. If the timeout is reached, then the operation must be undone and started over in an attempt to have better luck at completing the transaction.
The second is to lock all of the resources in a predefined order ahead of the operation. If all of the locks cannot be obtained, then any locks acquired should be released, and then, after an optional time period, another attempt should be made.
The third is to use deadlock detection software. A transitive closure of all of the waiters is computed before placing the requesting task into a wait for the resource. If it would cause a deadlock, the task is not put into a wait. The task should release its locks and try again later. If it would not cause a deadlock, the task is set to automatically wait for the desired resource.
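The first two approaches above can be combined in a short Python sketch: locks are always requested in one fixed order, each wait has a timeout, and a failed attempt releases everything before retrying. The function names, the use of id() as the ordering key, and the timeout and back-off values are illustrative assumptions. Deadlock detection (the third approach) is not shown.

```python
import threading
import time


def acquire_in_order(locks, timeout=0.5):
    """Try to take every lock in a fixed global order.

    Returns the list of held locks, or None if any acquire timed out
    (in which case everything already taken has been released).
    """
    ordered = sorted(locks, key=id)  # any consistent total order works
    held = []
    for lock in ordered:
        if lock.acquire(timeout=timeout):
            held.append(lock)
        else:
            for taken in reversed(held):
                taken.release()
            return None
    return held


def update_atomically(locks, action, retries=5, backoff=0.01):
    """Run `action` while holding all locks, retrying after a back-off."""
    for attempt in range(retries):
        held = acquire_in_order(locks)
        if held is not None:
            try:
                return action()
            finally:
                for lock in reversed(held):
                    lock.release()
        time.sleep(backoff * (attempt + 1))  # wait before trying again
    raise TimeoutError("could not obtain all locks")
```

Because every caller sorts the locks the same way, two subjobs that name the same resources in opposite orders still request them in the same sequence and cannot end up waiting on each other.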
It may be necessary to run an application redundantly for reliability reasons, for example. The application may be run simultaneously on geographically distinct parts of the grid to reduce the chances that a failure would prevent the application from completing its work or prevent it from providing a reliable service. If the application updates databases or has other data communications, it would need to be designed to tolerate redundant data activity caused by running multiple copies of the application. Otherwise, computed results may be in error.
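One common way to tolerate redundant data activity is to make updates idempotent, so that applying the same update twice has the same effect as applying it once. The Python sketch below assumes that each update carries a unique identifier shared by all redundant copies of the application. The class and its method names are hypothetical.

```python
class IdempotentLedger:
    """Applies each uniquely identified update at most once."""

    def __init__(self):
        self.balances = {}
        self._seen = set()  # identifiers of updates already applied

    def apply(self, update_id, account, amount):
        """Apply the update unless the same update_id was already seen.

        Returns True if the balance changed, False for a duplicate.
        """
        if update_id in self._seen:
            return False
        self._seen.add(update_id)
        self.balances[account] = self.balances.get(account, 0) + amount
        return True
```

If two redundant copies of a job both submit the update identified as "job42-step1", only the first submission changes the balance. The second is recognized as a duplicate and ignored.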
6.4 Summary
Portability and the capability to take advantage of virtual resources are key attributes of an application that can take advantage of grid computing. As grid technologies and environments advance, more and more applications will be able to take advantage of the grid.
In general, applications and their requirements should be analyzed to understand how they could be designed and developed to reap the benefits from a grid. However, in many cases today, organizations are looking to identify specific applications that they could adapt quickly to a grid environment to gain immediate benefits and to gain experience and knowledge around grid computing. This chapter has described some of the attributes of applications and data access patterns that more easily lend themselves to grid computing.
Chapter 7.
Security
One of the key questions that usually arises when considering a grid environment is security. This chapter describes security issues, techniques, and solutions needed to provide a robust and secure grid computing environment.
The information presented up to this point in this book has been generic and not specific to a particular grid environment. Many of the security issues and topics described here are also general in nature. However, some of our examples and discussion are made clearer when we can use a specific implementation, so this chapter provides some information that is specific to the Globus Toolkit 4. We introduce Globus Toolkit 4 in more detail in Part 3, "Creating a grid environment with the Globus Toolkit 4" on page 139. Though we have included these specific examples, the general concepts and requirements apply to other environments as well.
7.1 Introduction to grid security
Security requirements are fundamental to the grid design. The basic security components comprise mechanisms for authentication, authorization, and confidentiality of communication between grid computers. Without this functionality, the integrity and confidentiality of the data processed within the grid would be at risk. To properly secure your grid environment, there are many different tools and technologies available. This chapter examines some of those technologies.
To better understand grid security, it is best to start with some basic grid security requirements and security fundamentals. Grid security builds on well-known security standards. We discuss general security requirements followed by security fundamentals. In this chapter, we discuss the nuts and bolts of grid security and the underlying technologies that allow grid security to work.
7.1.1 Grid security requirements
A virtual organization is one of the fundamental concepts in a grid environment today. A virtual organization (VO) is defined as a dynamic group of individuals, groups, or organizations who define the conditions and rules (business objectives and policies) for sharing resources.
A grid environment is required to coordinate resource management and sharing within a VO that potentially spans multiple organizations. This implies that a grid application may span multiple administrative domains. Each of these domains would have its own business requirements and policies to adhere to. A grid security infrastructure is required to comply with local domain-level security policies and VO-defined policies. To achieve this, the grid security infrastructure requires interoperability among the various domains while maintaining a clear separation of the security policies and mechanisms deployed by both virtual and real organizations.
The Security Architecture for Open Grid Services by Nagaratnam, et al., 2002 (http://www.cs.virginia.edu/~humphrey/ogsa-sec-wg/OGSA-SecArch-v1-07192002.pdf) summarizes the following security challenges in a grid environment:
Integration
The grid security infrastructure is required to integrate with existing security infrastructures across platforms and hosting environments. The overall grid security architecture is required to be implementation agnostic and be extensible to incorporate new security services as they become available.
Interoperability
Grid services that traverse multiple domains and hosting environments need to be able to interact with each other. This requires that domains can exchange messages (for example, via SOAP/HTTP), that each party can specify the security policy applied to a secure conversation, and that mechanisms exist to identify a user from one domain in another domain.
Trust Relationship
A Grid service request can span multiple security domains. The security domains involved in meeting a Grid service request must establish trust with each other. Due to the dynamic nature of a grid environment, it is infeasible to establish end-to-end trust prior to execution of an application. Trust establishment becomes more complicated with transient Grid services.
At a high level the grid security requirements can be defined as follows:
Authentication
Providing interfaces to plug in different authentication mechanisms and means to convey the mechanism used.
Delegation
Providing mechanisms to allow delegation of access rights from requesters to services while ensuring that the access rights delegated are restricted to the tasks intended to be performed within policy restrictions.
Single logon
This refers to relieving an authenticated entity from reauthentication for a certain period of time when subsequent access to grid resources is requested, while taking multiple security domains and identity mappings into account.
Credential life span and renewal
Ability to refresh requester credentials if a grid application operation takes longer to complete than the life span of a delegated credential.
Authorization
Ability to control access to grid components based on authorization policies.
Privacy
Allowing both a service requester and a service provider to define and enforce privacy policies.
Confidentiality
Protect confidentiality of underlying transport and message content and between OGSA-compliant components in either point-to-point or store-and-forward mechanisms.
Message integrity
Ensuring unauthorized changes made to message content or data can be detected at the recipient end.
Policy exchange
Allows a security context negotiation mechanism between service requesters and service providers based on security policy information.
Secure logging
Provides a foundation for nonrepudiation and auditing that enables all services to time-stamp and log various types of information without interruption or information alteration by adverse agents.
Assurance
Provides means to qualify the security assurance level that can be expected of a hosting environment. The security assurance level indicates the types of security services provided by an environment. This information is useful in deciding whether to deploy a service in the environment.
Manageability
This requirement mainly deals with various security service management issues such as identity management, policy management, and so on.
Firewall traversal
Ability to traverse firewalls without compromising local control of firewall policy to enable a cross-domain grid computing environment.
Securing the OGSA infrastructure
This refers to securing core OGSA components.
The diagram below gives a high-level view of the various components of a grid security model that addresses the requirements described above.
Figure 7-1 Grid security model (components shown: intrusion detection; secure conversations; credential and identity management, including single logon; access control; audit and non-repudiation; anti-virus management; service end-point policy; mapping rules; authorization policy; privacy policy; policy management; policy expression and exchange; user management; key management; trust model; secure logging; and binding security for transport, protocol, and message security)
This grid security model abstracts enterprise security services as a single model to enable organizations to utilize their existing security infrastructure to communicate with other enterprises that use different technology.
Please refer to The Security Architecture for Open Grid Services by Nagaratnam, et al., for a detailed discussion of each of the components shown in the figure above.
7.1.2 Security fundamentals
Security requires three fundamental services: authentication, authorization, and encryption. A grid resource must be authenticated before any checks can be made as to whether a requested access or operation is allowed within the grid. Once the user has been authenticated within the grid, the grid user can be granted certain rights to access a grid resource. This, however, does not prevent data in transit between grid resources from being captured, spoofed, or altered. The security service that ensures this does not happen is encryption.
The world of security has its own set of terminology. The International Organization for Standardization (ISO) has defined the common security services found in modern IT systems. The list was first put in ISO 7498-2 (OSI Security Architecture) and later updated in ISO 10181 (OSI Security Frameworks). To have a better understanding of security systems and services, some security terms with explanations are listed below:
Authentication
Authentication is the process of verifying the validity of a claimed individual and identifying who he or she is. Authentication is not limited to human beings; services, applications, and other entities may be required to authenticate also.
Access control
Assurance that each user or computer that uses the service is permitted to do what he or she asks for. The process of authorization is often used as a synonym for access control, but it also includes granting the access or rights to perform some actions based on access rights.
Data integrity
Data integrity assures that the data is not altered or destroyed in an unauthorized manner.
Data confidentiality
Sensitive information must not be revealed to parties that it was not meant for. Data confidentiality is often also referred to as privacy.
Key management
Key management deals with the secure generation, distribution, authentication, and storage of keys used in cryptography.
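As an illustration of the data integrity service described above, the following minimal Python sketch shows how a receiver can detect that data was altered in transit. It uses an HMAC (a keyed hash), which is our choice for the example and is not a mechanism named in this chapter; the key and message values are hypothetical.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 integrity tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

shared_key = b"hypothetical-shared-key"   # illustrative value only
job_data = b"run simulation 42"
tag = make_tag(shared_key, job_data)      # sent along with the data

intact = verify_tag(shared_key, job_data, tag)               # unmodified data
altered = verify_tag(shared_key, b"run simulation 43", tag)  # modified in transit
```

Here intact is True and altered is False: any change to the data invalidates the tag, so the alteration is detected at the recipient end.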
The Grid Security Infrastructure (GSI) provided as part of the Globus Toolkit and a Public Key Infrastructure (PKI) provide the technical framework (including protocols, services, and standards) to support grid computing with five security capabilities: user authentication, data confidentiality, data integrity, nonrepudiation, and key management.
7.1.3 Important grid security terms
This chapter uses many security terms. Some of them deserve to be highlighted here, because some areas of grid security require a precise understanding of the underlying concepts, and some security components work slightly differently within a grid environment than in a standard network. Below are some important security concepts that you should be aware of when reading this chapter. These concepts are described in greater detail throughout the chapter.
Symmetric encryption: Using the same secret key to provide encryption and decryption of data.
Asymmetric encryption: Using two different keys for encryption and decryption. The public key encryption technique is the primary example of this using a public key and a private key pair.
Secure Sockets Layer/Transport Layer Security (SSL/TLS): These are essentially the same protocol. The IETF renamed SSL to TLS when it standardized the protocol; TLS 1.0 (RFC 2246) is based on SSL 3.0.
Public Key Infrastructure (PKI): The different components, technologies, and protocols that make up a popular asymmetric encryption solution.
Mutual authentication: Instead of using an LDAP repository to hold the public keys (PKI), two parties who want to communicate use the public keys stored in their digital certificates to authenticate with one another. This topic is covered in 7.2.2, "Grid secure communication" on page 82.
These are all important concepts to remember and will give you a head start in understanding how grid security works.
7.1.4 Symmetric key encryption
Symmetric key encryption is based on the use of one shared secret key to perform both the encryption and decryption of data. To ensure that the data is only read by the two parties (sender and receiver), the key has to be distributed securely between the two parties and no others. If someone should gain access to the secret key that is used to encrypt the data, they would be able to decrypt the information. This form of encryption has performance benefits over asymmetric encryption, but requires additional care and administration in the handling of the shared key. As we mention in the next section, asymmetric key encryption may be used to help manage the keys when using symmetric encryption.
Figure 7-2 Symmetric key encryption using a shared secret key
Here are some commonly used examples of a symmetric key cryptosystem:
Data Encryption Standard (DES): 56-bit key plus 8 parity bits, developed by IBM in the mid-1970s
Advanced Encryption Standard (AES): Cryptographic keys of 128, 192, and 256 bits to encrypt and decrypt data in blocks of 128 bits
Triple DES: 112-bit key plus 16 parity bits or 168-bit key plus 24 parity bits (that is, two to three DES keys)
RC2 and RC4: Variable-sized key, often 40 to 128 bits long
To summarize, secret key cryptography is fast for both the encryption and decryption processes. However, secure distribution and management of keys is difficult to guarantee.
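The shared-key property described in this section can be sketched in a few lines of Python. This is a toy cipher built from the standard library (a SHA-256 keystream XORed with the data), chosen only so that the example is self-contained. It is not DES, AES, or any of the algorithms listed above, and it must not be used to protect real data. The key and message values are hypothetical.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_with_key(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR-ing twice with the same keystream restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

secret_key = b"shared-secret-known-to-both-parties"   # illustrative value only
plaintext = b"job results for host B"

ciphertext = xor_with_key(secret_key, plaintext)      # sender encrypts
recovered = xor_with_key(secret_key, ciphertext)      # receiver decrypts with the same key
wrong = xor_with_key(b"some-other-key", ciphertext)   # a different key yields garbage
```

The same function and the same key serve for both directions, which is why anyone who obtains the key can read the data, and why key distribution is the hard part of symmetric encryption.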
7.1.5 Asymmetric key encryption
Another commonly used cryptography method is called public key cryptography. The RSA public key cryptography system is a prime example of this. In public key cryptography, an asymmetric key pair (a so-called public key and a private key) is used. The key used for encryption is different from the one used for decryption. Public key cryptography requires the key owners to protect their private keys while their public keys are not secret at all and can be made available to the public. Normally, the public key is present in the digital certificate that is issued by the Certificate Authority.
The computation algorithm relating the public key and the private key is designed in such a way that an encrypted message can only be decrypted with the other key of that key pair, and an encrypted message cannot be decrypted with the encryption key (the key that was used for encryption). Whichever key of the public/private pair encrypts your data, the other key is required to decrypt the data. A message encoded with the public key, for instance, can only be decoded with the private key. One of the keys is designated as the public key because it is made available, publicly, via a trusted Certificate Authority, which guarantees the ownership of each of the public keys. The corresponding private keys are secured by the owner and never revealed to the public.
The public key system is used twice to completely secure a message between the parties. The sender first encrypts the message using his private key (in effect, signing it) and then encrypts it again using the receiver's public key. The receiver decrypts the message, first using his private key and then the public key of the sender. In this way, an intercepted message cannot be read by anyone else. Furthermore, any tampering with the message will make it not decrypt properly, revealing the tampering.
The asymmetric key pair is generated by a computation that starts by finding two very large prime numbers. Even though the public key is widely distributed, it is practically impossible for computers to calculate the private key from the public key. In RSA, for example, the security is derived from the fact that it is very difficult to factor numbers that are hundreds of digits long.
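The key pair relationship and the "used twice" scheme described above can be illustrated with a textbook RSA sketch in Python. The primes below are fixed, publicly known Mersenne primes and the scheme has no padding, so this is for illustration only and offers no security. The function and variable names are hypothetical, and the code requires Python 3.8 or later for the modular inverse.

```python
E = 65537  # public exponent shared by both toy key pairs

def make_key_pair(p: int, q: int):
    """Return (public_key, private_key), each an (exponent, modulus) tuple."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))   # private exponent: inverse of E
    return (E, n), (d, n)

def apply_key(key, value: int) -> int:
    """RSA uses the same operation for either key: value ** exponent mod n."""
    exponent, n = key
    return pow(value, exponent, n)

# Toy key pairs; the receiver's modulus is larger than the sender's.
sender_public, sender_private = make_key_pair(2**61 - 1, 2**89 - 1)
receiver_public, receiver_private = make_key_pair(2**107 - 1, 2**127 - 1)

message = int.from_bytes(b"grid job", "big")

# Whichever key encrypts, the other key of the pair decrypts.
ciphertext = apply_key(receiver_public, message)
decrypted = apply_key(receiver_private, ciphertext)

# "Used twice": sender's private key first, then the receiver's public key.
double = apply_key(receiver_public, apply_key(sender_private, message))
# Receiver reverses the steps: own private key, then the sender's public key.
recovered = apply_key(sender_public, apply_key(receiver_private, double))
```

Both decrypted and recovered equal the original message. Real systems generate large random primes and add padding; this sketch only shows how the two keys of a pair undo each other.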
This mathematical algorithm improves security, but requires a long encryption time, especially for large amounts of data. For this reason, public key encryption is often used to securely transmit a symmetric encryption key between the two parties, and all further encryption is performed using this symmetric key.
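A minimal sketch of this hybrid approach follows: a random symmetric session key is transmitted under the receiver's public key, and the bulk data is then protected with the session key. Both primitives are toys (textbook RSA with publicly known primes, and a SHA-256 keystream cipher) chosen so that the example runs with the Python standard library alone; the names are hypothetical and nothing here is suitable for production use.

```python
import hashlib
import secrets

E = 65537
P, Q = 2**107 - 1, 2**127 - 1        # fixed, publicly known primes: toy values only
N = P * Q                            # receiver's public modulus
D = pow(E, -1, (P - 1) * (Q - 1))    # receiver's private exponent (Python 3.8+)

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a SHA-256 counter-mode keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

payload = b"large payload " * 1000

# Sender: one slow public key operation on the small session key only.
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)
ciphertext = xor_stream(session_key, payload)   # fast symmetric encryption

# Receiver: recover the session key with the private key, then decrypt the data.
unwrapped_key = pow(wrapped_key, D, N).to_bytes(16, "big")
plaintext = xor_stream(unwrapped_key, ciphertext)
```

The expensive asymmetric operation is applied once, to 16 bytes, regardless of how large the payload is.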
7.1.6 The Certificate Authority
A properly implemented Certificate Authority CA has many responsibilities. These should be followed diligently to achieve good security. The primary responsibilities are:
Positively identifying entities requesting certificates
Issuing, removing, and archiving certificates
Protecting the Certificate Authority server
Maintaining a namespace of unique names for certificate owners
Serving signed certificates to those needing to authenticate entities
Logging activity
Within some PKI environments, a Registration Authority (RA) works in conjunction with the CA to help perform some of these duties. The RA is responsible for approving or rejecting requests for the certification of public keys and forwarding the user information to the CA. The RA normally has the responsibility of validating that the user's information is correct before the signed digital certificate is sent back to the user. Simple CAs, such as those provided with the Globus Toolkit, can be installed for testing purposes. Within this scenario, the simple CA handles the job of both the CA and RA within the grid environment. As the number of certificates expands, these two jobs are normally separated.
One of the critical issues within a grid PKI environment is guaranteeing the system's trustworthiness. Before a CA can sign and issue certificates for others, it has to do the same thing to itself so that its identity can be represented by its own certificate. That means a CA has to do the following:
1. The CA randomly generates its own key pair.
2. The CA protects its private key.
3. The CA creates its own certificate.
4. The CA signs its certificate with its private key.
If a grid resource needs to securely communicate with another grid resource, it needs a certificate signed by a CA. The grid resource enrolls with the CA by generating an unsigned digital certificate (a certificate request) that specifies its own information. The CA uses the submitted information to determine whether this grid resource is genuine and should be granted a certificate. If the grid resource is eligible, the CA signs the digital certificate and passes it back to the requesting grid resource. So, one basic function of a CA is to create and issue certificates for grid resources.
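The CA bootstrap steps and the enrollment flow described above can be sketched as follows. This is a simplified model, not the X.509 format or the Globus simple CA: the certificate is a small Python record, the signature is textbook RSA over a SHA-256 hash, and the primes are fixed, publicly known values. All names and subjects are hypothetical.

```python
import hashlib
from dataclasses import dataclass, replace

E = 65537

def make_key_pair(p: int, q: int):
    """Return (public_key, private_key) as (exponent, modulus) tuples."""
    n = p * q
    return (E, n), (pow(E, -1, (p - 1) * (q - 1)), n)   # Python 3.8+

@dataclass(frozen=True)
class Certificate:
    subject: str
    public_key: tuple
    issuer: str
    signature: int = 0   # 0 means "not yet signed"

def fingerprint(cert: Certificate, modulus: int) -> int:
    """Hash the signed fields and reduce the hash into the signer's modulus."""
    data = f"{cert.subject}|{cert.public_key}|{cert.issuer}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus

def sign(cert: Certificate, issuer_private) -> Certificate:
    d, n = issuer_private
    return replace(cert, signature=pow(fingerprint(cert, n), d, n))

def verify(cert: Certificate, issuer_public) -> bool:
    e, n = issuer_public
    return pow(cert.signature, e, n) == fingerprint(cert, n)

# The CA generates its key pair, creates its own certificate, and signs it.
ca_name = "/O=Grid/CN=Toy CA"
ca_public, ca_private = make_key_pair(2**107 - 1, 2**127 - 1)
ca_cert = sign(Certificate(ca_name, ca_public, ca_name), ca_private)

# A grid resource enrolls: it submits an unsigned certificate, and the CA signs it.
host_public, host_private = make_key_pair(2**61 - 1, 2**89 - 1)
request = Certificate("/O=Grid/CN=host.example.org", host_public, ca_name)
host_cert = sign(request, ca_private)

# Anyone holding the CA's public key can check the result; edits are detected.
forged = replace(host_cert, subject="/O=Grid/CN=intruder.example.org")
```

verify(host_cert, ca_public) and verify(ca_cert, ca_public) both succeed, while verify(forged, ca_public) fails because the signature no longer matches the altered subject.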
The CA's private key
The CA's private key is one of the most important parts in the whole public key infrastructure. It is used, for example, by the CA to sign every issued digital certificate within the grid network. Thus, it is especially susceptible to attacks from hackers. If someone were to gain access to the CA's private key, they would be able to impersonate anyone within the environment. Therefore, it is very important to protect this key. Knowing how sensitive the private key is to the rest of your grid environment, it is important to provide your CA server with any available security measures. This includes restricting physical and remote access and monitoring and auditing the server.
CA cross certification
Generally within a single grid environment, a CA will provide certificates to a fixed group of users. If two companies or virtual organizations (VOs) need to communicate with and trust one another, this may require that both CAs trust one another or participate in cross certification. For example, Alice, an employee belonging to an organization with its own CA, may want to run a job on grid computer Mike, who is outside the organization, and who belongs to a different CA. In order to do so, the following should be considered:
Alice and Mike need a way to obtain each other's public key certificates.
Mike needs to be sure that he can trust Alice's CA. Alice needs to be sure that she can trust Mike's CA.
Grid computers from different security domains or VOs will need to trust each other's certificates, so the roles and relationships between CAs have to be defined. The purpose of creating such trust relationships is to eventually achieve a global, interoperable PKI and enlarge the grid infrastructure. Once the relationship is established, both of the CAs can be configured to work with the grid system.
Managing your own CA
It is important to note that the simple CA provided with the Globus Toolkit is a fully functioning CA for a PKI environment, but it is only recommended for testing or demo purposes. For a production grid environment, it is recommended that you evaluate commercial PKI solutions that may better suit your needs and remove the responsibility of managing your own CA.
7.1.7 Digital certificates
Digital certificates are digital documents that associate a grid resource with its specific public key. A certificate is a data structure containing a public key and pertinent details about the key owner. A certificate is considered to be a tamperproof electronic ID when it is signed by the Certificate Authority for the grid environment.
Digital certificates, also called X.509 certificates, act very much like passports: they provide a means of identification. Whereas a passport identifies a person, a digital certificate identifies a grid resource. Another difference between a digital certificate and a passport is that a certificate can and should be distributed and copied without restriction, while people are normally very concerned about handing their passports to someone else. Certificates do not normally contain any confidential information, and their free distribution does not create a security risk.
The important fact to know and understand about digital certificates is that the CA certifies that the enclosed public key belongs to the entity listed in the certificate. The technical implementation is such that it is considered extremely difficult to alter any part of a certificate without easy detection. The signature of the CA provides an integrity check for the digital certificate.
When a grid client wants to start a session with a grid recipient, he or she does not attach the public key to the message, but the certificate instead. The recipient receives the communication with the certificate and then checks the signature of the Certificate Authority within the certificate. If the certificate was signed by a certifier that he or she trusts, the recipient can safely accept that the public key contained in the certificate is really from the sender. This prevents someone from using a fraudulent public key to impersonate the public key owner.
Contained in your digital certificate is the information about you and your public key. When you communicate with another party on the grid, the other party uses the public key contained in your digital certificate during the SSL handshake to verify your identity and to help establish the session key that is used to encrypt all data transferred between grid computers.
A digital certificate is made up of a unique distinguished name (DN) and certificate extensions that contain the information about the individual or host that is being certified. This information may include the subject's e-mail address, organizational unit, or location.
Figure 7-3 is a graphical depiction of the digital certificate.
Figure 7-3 Digital certificate
Obtaining a client or a server certificate from a CA involves the following steps:
1. The grid user requiring certification generates a key pair (private key and certificate request containing the public key).
2. The user signs its own public key and any other information required by the CA. Signing the public key demonstrates that the user does, in fact, hold the private key corresponding to the public key.
3. The signed information is communicated to the CA. The private key remains with the client and should be stored securely. For instance, the private key could be stored in an encrypted form on a Smartcard, or on the user's computer.
4. The CA verifies that the user does own the private key corresponding to the public key presented.
5. The CA (or optionally an RA) needs to verify the user's identity. This can be done using out-of-band methods, for example, through the use of e-mail, telephone, or face-to-face communication. A CA or RA can use its own record system or another organization's record system to verify the user's identity.
6. Upon a positive identity check, the CA creates a certificate by signing the public key of the user, thereby associating a user to a public key. The certificate will be forwarded to the RA for distribution to the user.
Verification of the user
The authentication described above is a one-time authentication for the purpose of certificate issuance. This can be compared to the process when a government authority issues a passport to an individual. The passport then serves as an authentication mechanism when this individual travels to foreign countries. Just like passports, digital certificates can subsequently be used in daily operations for authenticating subjects to other parties that require authentication.
Different types of certificates
There are two different types of certificates that are used within a grid environment. The first type of certificate is a user certificate that will identify different users on the grid. The second type of certificate is issued to grid servers.
User
As a grid user, you will need a user certificate to identify yourself within the grid. This certificate will identify your user name within the grid, not your server or workstation name. For a user named John Doe, the digital certificate might have the distinguished name:
/O=Grid/O=GridTest/OU=test.domain.com/CN=John Doe
Server
If you plan on running PKI-enabled programs on your server, you will need to register a server certificate. This server certificate binds the fully qualified domain name of your server to your certificate. For your certificate to work, your fully qualified DNS name has to match the name in your digital certificate. For example, if your server name was goban.companyname.com, your server certificate would read:
/CN=Service/goban.companyname.com
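The two kinds of distinguished names can be handled with a small parsing sketch. It assumes the slash-separated notation /O=.../OU=.../CN=..., with the host name following a slash inside the CN of a server certificate. The helper names are hypothetical, and a real deployment would rely on the certificate library's own name handling instead.

```python
def parse_dn(dn: str):
    """Split a slash-separated DN into a list of (attribute, value) pairs."""
    parts = []
    for segment in dn.strip("/").split("/"):
        if "=" in segment:
            attribute, value = segment.split("=", 1)
            parts.append((attribute, value))
        elif parts:
            # A segment without "=" continues the previous value (CN=Service/host).
            attribute, value = parts[-1]
            parts[-1] = (attribute, f"{value}/{segment}")
    return parts

def common_names(dn: str):
    """Return every CN value in the DN, in order."""
    return [value for attribute, value in parse_dn(dn) if attribute == "CN"]

def host_matches(dn: str, fqdn: str) -> bool:
    """Check that a server certificate's CN names the given fully qualified host."""
    return any(cn.split("/")[-1].lower() == fqdn.lower() for cn in common_names(dn))

user_dn = "/O=Grid/O=GridTest/OU=test.domain.com/CN=John Doe"
server_dn = "/CN=Service/goban.companyname.com"

user_parts = parse_dn(user_dn)
server_ok = host_matches(server_dn, "goban.companyname.com")
server_bad = host_matches(server_dn, "other.companyname.com")
```

A user DN identifies a person through its CN, whereas the server check compares the host part of the CN with the DNS name that the client actually connected to.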
PKI directory services
Within some PKI environments, the signed keys are published to a public directory for easy retrieval. Instead of having the clients handle the mutual authentication, an external server is responsible for handling the authentication process. A good example of this process is the MyProxy server, which works as a grid Web proxy for Web portals. In this example, the user would authenticate to the Web portal, which would request the user's online credentials that are stored in the directory. Upon this authentication, the proxy would extract the DN from the user's digital certificate and match the credentials against the public key stored within the directory. If the two keys matched, the user would be given access to resources within the grid.
7.2 Grid security infrastructure
Now that we have gone over some security fundamentals, explaining how the different grid security components interact will be much easier. In this section of the chapter, we summarize the basic mechanisms used by the Grid Security Infrastructure (GSI) provided by the Globus Toolkit. This is just one example of an implementation of a grid security infrastructure. We describe how the different security components within the Globus Toolkit provide security services. We examine different scenarios and walk through the various functions of the GSI.
7.2.1 Getting access to the grid
In order to build a grid environment using the GSI components, you have to create a set of keys for public key cryptography, request your certificate from the Certificate Authority, and obtain a copy of the public key of the CA. Figure 7-4 on page 77 and the following procedure describe the steps to establish the GSI communication:
1. Copy the Certificate Authority's public key to the grid host on which you set up GSI.
2. Create your private key and a certificate request.
3. Send your certificate request to the CA by e-mail, or by a more secure method if you are running a production system and need to positively identify the sender.
4. The CA signs your request to make your certificate and sends it back to you.
Figure 7-4 Preparation procedure for GSI (the grid host (1) copies the CA's public key, (2) creates its private key and a certificate signing request, and (3) sends the request to the Certificate Authority, which (4) signs it with the CA's private key and sends the certificate back)
When that procedure has been completed and you have received your signed digital certificate, you will have three important files on your grid host. They are:
The CA's public key
The grid host's private key
The grid host's digital certificate
In order to provide secure authentication and communication for your grid computer, you should not let others have access to your private key. As an extra layer of security, the private key is protected by a secret passphrase, which must be supplied whenever you use your private key along with your digital certificate. This prevents someone who steals your digital certificate and private key from automatically being able to use them to access grid resources. The host key is protected by the local operating system privileges within the grid server.
Authentication and authorization
Imagine a scenario where you need to communicate with another grid computer's application and you want to ensure that the data from the host is really from the host. Besides making sure that you can trust the grid host, you want to make sure the grid host that you want to communicate with trusts your grid computer. In these cases, you can use the authentication function of GSI, as shown in Figure 7-5 on page 79. After you have authenticated with the remote grid resource, you then have the option of having the grid resource give you access to resources on your behalf. In this case, you can use the authorization function of GSI.
Through the steps described below, grid host A or a user on grid host A is authenticated and authorized by grid host B. Almost all steps are for authentication, except the last authorization step:
1. A user or application on A sends its certificate to host B.
2. Host B uses the CA's public key to validate the certificate and obtains A's public key and subject from it.
3. Host B creates a random number and sends it to host A.
4. Host A receives the number, encrypts it with its private key, and sends the encrypted number to host B.
5. Host B will decrypt the number and check that the decrypted number is really the one that it sent before. Then host B authenticates that the certificate is really that from the user on host A, because only that user on host A can encrypt the number with its private key.
6. The certificate is authenticated by host B, and the subject in the certificate is mapped to a local user name. The subject is in the form of a Distinguished Name (DN) like /O=Grid/O=Globus/OU=itso.grid.com/CN=your name, and it is the name that is used by LDAP to distinguish the entries in the directory service. The subject is used to specify the user identity in a grid environment. The user defined by the Distinguished Name is authorized by host B to act as a local user on host B.
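Steps 3 through 5 above form a challenge-response exchange, which can be sketched as follows. The sketch uses textbook RSA with fixed, publicly known primes, so it illustrates the message flow only and is not the GSI implementation. The function names are hypothetical, and the code requires Python 3.8 or later.

```python
import secrets

E = 65537
P, Q = 2**61 - 1, 2**89 - 1                      # toy primes: illustration only
HOST_A_PUBLIC = (E, P * Q)                       # taken from A's certificate (steps 1-2)
HOST_A_PRIVATE = (pow(E, -1, (P - 1) * (Q - 1)), P * Q)

def create_challenge(public_key) -> int:
    """Step 3: host B picks a random number smaller than A's modulus."""
    _, n = public_key
    return 2 + secrets.randbelow(n - 2)

def answer_challenge(challenge: int, private_key) -> int:
    """Step 4: host A encrypts the number with its private key."""
    d, n = private_key
    return pow(challenge, d, n)

def check_answer(challenge: int, answer: int, public_key) -> bool:
    """Step 5: host B decrypts with A's public key and compares."""
    e, n = public_key
    return pow(answer, e, n) == challenge

challenge = create_challenge(HOST_A_PUBLIC)           # on host B
answer = answer_challenge(challenge, HOST_A_PRIVATE)  # on host A
authenticated = check_answer(challenge, answer, HOST_A_PUBLIC)
```

Because the challenge is freshly generated for each exchange, a recorded answer from an earlier session is of no use to an eavesdropper, and only the holder of the private key can produce a correct answer.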
Figure 7-5 Authentication procedure (host A sends its certificate; host B obtains A's subject and public key using the CA's public key, sends a random number, and decrypts the copy that host A encrypted with its private key; the subject is then mapped to a local user name through the grid-mapfile)
In grid environments, your host will become a client in some cases, and in other cases, a server. Therefore, your host might be required to authenticate another host and be authenticated by that host at the same time. In this case, you can use the mutual authentication function of GSI. This function is almost the same as explained above: it performs the authentication steps, then reverses the roles of the two hosts and repeats the procedure.
Briefly speaking, authentication is the process of sharing public keys securely with each other, and authorization is the process that maps your DN to a local user or group of a remote host.
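The authorization step, mapping a DN to a local user, can be sketched as a lookup in a map file. The sketch assumes the common layout of one quoted subject followed by a local user name per line; the entries, user names, and helper names below are hypothetical.

```python
import shlex

GRIDMAP_TEXT = '''
# "distinguished name" local_user
"/O=Grid/O=Globus/OU=itso.grid.com/CN=your name" griduser
"/O=Grid/O=Globus/OU=itso.grid.com/CN=Jane Roe" jroe
'''

def parse_gridmap(text: str) -> dict:
    """Build a {subject: local_user} dictionary from map file text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = shlex.split(line)        # honors the quotes around the DN
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping

def authorize(subject: str, mapping: dict):
    """Return the local user for an authenticated subject, or None if unmapped."""
    return mapping.get(subject)

gridmap = parse_gridmap(GRIDMAP_TEXT)
local_user = authorize("/O=Grid/O=Globus/OU=itso.grid.com/CN=your name", gridmap)
stranger = authorize("/O=Grid/O=Elsewhere/CN=Unknown", gridmap)
```

A subject that authenticates successfully but has no entry in the map (stranger is None here) is not authorized to run as any local user.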
Delegation
Imagine a situation where you distribute jobs to remote grid machines and let them distribute their child jobs to other machines under your security policy. In this situation, you can use the delegation function of GSI, as shown in Figure 7-6 on page 81.
If you are on the side of host A, you can create your proxy at host B to delegate your authority. This proxy acts on your behalf and submits a request to host C.
The next steps describe the procedure to create your proxy at a remote machine (see "Proxy creation" on page 80) and the procedure by which the proxy submits a request to the other remote host on your behalf (see "Proxy action" on page 80).
Proxy creation
For proxy creation:
1. A trusted communication is created between host A and host B.
2. You request host B to create a proxy that delegates your authority.
3. Host B creates the request for your proxy certificate, and sends it back to host A.
4. Host A signs the request to create your proxy certificate using your private key and sends it back to host B.
5. Host A sends your certificate to host B.
Proxy action
For proxy action:
1. Your proxy sends your certificate and the certificate of your proxy to host C.
2. Host C gets your proxy's public key through the path validation procedure:
a. Host C gets your subject and your public key from your certificate using the CA's public key.
b. Host C gets the proxy's subject and your proxy's public key from your proxy's certificate using your public key.
c. The subject is a Distinguished Name similar to /O=Grid/O=Globus/OU=itso.grid.com/CN=your name. The subject of the proxy certificate is derived from its owner's (your) subject and is similar to /O=Grid/O=Globus/OU=itso.grid.com/CN=your name/CN=proxy. So in order to validate the proxy certificate, host C only has to check that the proxy's subject, with the trailing /CN=proxy removed, is identical to your subject. If it is validated, your proxy is authenticated by host C and able to act on your behalf.
3. The proxy encrypts a request message using its private key and sends it to host C.
4. Host C decrypts the encrypted message using the proxy's public key and gets the request.
5. Host C runs the request under the authority of a local user. The user is specified using a mapping file, which represents the mapping between the grid user's subject and the local user's local user name.
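The subject comparison in step 2c can be sketched as a simple string check. This covers only the naming rule; the signature checks of steps 2a and 2b are assumed to have already succeeded. The helper names are hypothetical.

```python
PROXY_SUFFIX = "/CN=proxy"

def owner_subject(proxy_subject: str):
    """Remove one trailing /CN=proxy component; return None if it is absent."""
    if proxy_subject.endswith(PROXY_SUFFIX):
        return proxy_subject[: -len(PROXY_SUFFIX)]
    return None

def proxy_subject_is_valid(user_subject: str, proxy_subject: str) -> bool:
    """True when the proxy subject is the user's subject plus /CN=proxy."""
    return owner_subject(proxy_subject) == user_subject

user = "/O=Grid/O=Globus/OU=itso.grid.com/CN=your name"

valid = proxy_subject_is_valid(user, user + "/CN=proxy")
other_owner = proxy_subject_is_valid(
    user, "/O=Grid/O=Globus/OU=itso.grid.com/CN=someone else/CN=proxy")
not_a_proxy = proxy_subject_is_valid(user, user)
```

Only the first case passes: a proxy whose subject derives from a different owner, or a certificate without the /CN=proxy component, is rejected.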
Figure 7-6 Delegation procedure of user's proxy (steps shown among grid hosts A, B, and C: 1. create secure communication; 2. request to create proxy; 3. create proxy certificate request; 4. sign proxy certificate and send back; 5. send your certificate; 6. request: send your certificate and proxy certificate; 7. path validation, get proxy public key; 8. encrypt request and send; 9. decrypt request; 10. mapping and execution)
The procedure in Figure 7-6 represents remote delegation, where a user creates a proxy at a remote machine. There is also local delegation, where a user creates a proxy certificate at the local machine; for that task, the Globus Toolkit uses the grid-proxy-init command and the gatekeeper daemon mechanism.
When you make a proxy on a remote machine in remote delegation, the proxy's private key is stored on the remote machine, so the superuser of that machine can access your proxy's private key. This delegated credential can be vulnerable to attacks. To avoid this, it is recommended that the proxy obtain restricted policies from its owner. The standardization of this proxy restriction is now under way in the GSI Working Group of the Grid Forum Security Area, and you can see more details in its Internet draft at:
http://www.ietf.org/internet-drafts/draft-ietf-pkix-proxy-03.txt
7.2.2 Grid secure communication
While we have gone over the process of using PKI within a grid environment and the different functions of GSI, it is still important to understand the communication mechanisms used within the Globus Toolkit. By default, the underlying communication is based on the mutual authentication of digital certificates and SSL/TLS.
The digital certificates that have been installed on the grid computers provide the mutual authentication between the two parties. We discuss this process in detail later on in this section. The SSL/TLS functions that OpenSSL provides will encrypt all data transferred between grid hosts. These two functions together provide the basic security services of authentication and confidentiality.
Mutual authentication
To allow secure communication within the grid, the OpenSSL package is installed as part of the Globus Toolkit. Within the Globus Toolkit, OpenSSL is used to create an encrypted tunnel using SSL/TLS between grid clients and servers.
The process of mutual authentication begins when two grid resources want to share resources. Instead of using a key repository, the grid resources authenticate to one another based on their digital certificates. For example, one grid resource attempts to establish secure communication with another grid resource. Before the recipient allows the client access to its resources, the two need to authenticate to one another. This process is described below as the SSL handshake.
SSL handshake
In order to establish secure communication between the grid server and grid client, a handshake must take place. This handshake determines the SSL settings, exchanges public keys, and forms the basis for the mutual authentication process. The handshake process is as follows:
1. A grid client contacts a remote grid server to start a secure session by using a digital X.509 ID certificate.
2. The grid client automatically sends to the server the client's SSL version number, cipher settings, randomly generated data, and other information the server needs to communicate with the client using SSL.
3. The grid server responds, automatically sending the grid client the server's digital certificate, along with the server's SSL version number, cipher settings, and so on.
4. The grid client examines the information contained in the server's certificate, and verifies that:
a. The server certificate is valid and has a valid date.
b. The CA that issued the server certificate has been signed by a trusted CA whose certificate is built into the client.
c. The issuing CA's public key, built into the client, validates the issuer's digital signature.
d. The domain name specified by the server certificate matches the server's actual domain name.
5. If the server can be successfully authenticated, the grid client generates a unique session key to encrypt all communications with the grid server using symmetric encryption.
6. The grid client encrypts the session key itself with the server's public key, so that only the server can read the session key, and sends it to the server.
7. The server decrypts the session key using its own private key.
8. The grid client sends a message to the server informing it that future messages from the grid client will be encrypted with the session key. The grid server then sends a message to the grid client informing it that future messages from the server will be encrypted with the session key.
9. An SSL-secured session is now established. SSL then uses symmetric encryption, which is much faster than asymmetric PKI encryption, to encrypt and decrypt messages within the SSL-secured pipeline.
10. Now that the first grid resource has authenticated, the second grid resource authenticates using the same process.
11. Once the session is complete, the session key is discarded.
As long as both grid resources have a valid digital certificate, the process of mutual authentication will succeed. This is a good example of how grid security uses both asymmetric and symmetric encryption to authenticate and secure data transfer between grid resources. A grid client uses asymmetric encryption to authenticate, and once it is authenticated, the two parties use symmetric encryption with a shared secret key to encrypt and decrypt all data traffic between them.
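The requirement that both sides present a certificate can be illustrated with Python's standard ssl module. This is a minimal sketch of mutual authentication in general, not of how the Globus Toolkit configures OpenSSL: the function names and the certificate, key, and CA file arguments are hypothetical, and no files are loaded unless paths are supplied.

```python
import ssl


def make_server_context(cert_file=None, key_file=None, ca_file=None):
    """Server side: present a certificate and demand one from the client."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # A server context does not ask for a client certificate by default;
    # CERT_REQUIRED makes the handshake fail if the client sends none.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


def make_client_context(cert_file=None, key_file=None, ca_file=None):
    """Client side: verify the server and offer the client's certificate."""
    # PROTOCOL_TLS_CLIENT enables certificate and host name checks by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

With both contexts configured this way, the session key exchange and the switch to symmetric encryption are handled inside the TLS library during the handshake.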
Other grid communication
If you cannot physically access your grid client or server, it may be necessary to gain remote access to the grid. While your operating system's default telnet program works for remote access, it transmits data in clear text. That means that the data transmission is vulnerable to someone listening to or sniffing the data on the network. Even if the likelihood of this is low, the vulnerability exists and needs to be dealt with.
To secure the remote communication between a client and grid server, Secure Shell (SSH) can be used. SSH establishes an encrypted session between your client and the grid server.
7.2.3 Grid security step-by-step
In order to better understand the process for accessing grid resources, we have outlined the basic process from start to finish.
Local delegation
Local delegation uses the grid-proxy-init program to obtain a session proxy certificate from your long-term certificate.
The proxy certificate is used to authenticate the user and user programs to resources on the grid. For example, the user can run jobs on the grid with the globusrun command. The globusrun command is authenticated with the proxy certificate. The proxy certificate is created with the grid-proxy-init command. A proxy certificate must be created before jobs can be run on the grid. The proxy certificate is a session certificate with a limited (short-lived) lifetime, which is signed by the user certificate. This is functionally equivalent to the Kerberos kinit program or DCE dce_login.
The motive behind this model is to provide single sign-on. The single sign-on is the grid-proxy-init step. Once the grid proxy certificate is created, it is used for authentication on the grid.
This model works because it creates a certificate trust hierarchy, as shown in Figure 7-7.
Figure 7-7 Authentication process
The hierarchy is as follows:
1. The remote grid resource trusts the CA. The remote grid resource trusts the CA because it placed the CA's certificate in /etc/grid-security/certificates.
2. The remote grid resource can authenticate the user certificate because it is digitally signed by the CA.
3. The remote grid resource can authenticate the user proxy certificate because it is digitally signed by the user certificate.
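The three-level hierarchy above amounts to walking a chain from the proxy certificate back to a trusted CA. The sketch below shows only that walk. The Cert type and validate_chain function are hypothetical, and checking that each issuer name matches the next subject stands in for real signature verification, which a production validator must also perform along with validity-date checks.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str  # subject of the certificate that signed this one


def validate_chain(chain: List[Cert], trusted_cas: Set[str]) -> bool:
    """Validate a chain ordered leaf first, for example [proxy, user].

    Each certificate must name the next one as its issuer, and the last
    certificate must be issued by a CA the resource already trusts.
    """
    if not chain:
        return False
    for cert, signer in zip(chain, chain[1:]):
        if cert.issuer != signer.subject:
            return False
    return chain[-1].issuer in trusted_cas
```

Here trusted_cas plays the role of the CA certificates that the remote grid resource has installed locally.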
It is analogous to meeting three people at a party: CA, Alice, and Proxy. Proxy hands you a card that is similar to Figure 7-8.
Figure 7-8 Certificate signed by Alice
You are not familiar with Alice's signature, so you take a card from Alice, which is similar to Figure 7-9.
Figure 7-9 Certificate signed by CA
You keep a copy of CA's signature in your wallet. You compare the CA signature on Alice's card to the copy you keep in your wallet, and they match. You now have an authenticated copy of Alice's signature, which you compare to the signature on Proxy's card. They match, and you now trust that you are talking to Proxy. You have authenticated that this person is Proxy.
The grid-proxy-init command uses the SSL library to create a proxy certificate that is stored in /tmp/<filename>, where <filename> is x509up_u<uid> and <uid> is the UID of the user running grid-proxy-init. The permissions of this file are read and write for the owner only (-rw-------), and it belongs to the owner and group of the user running the command.
This file is an X.509 certificate where the issuer is the user's primary certificate. Basically, the user's primary certificate acts like a CA to create this session or proxy certificate. The proxy certificate is considered a short-lived certificate. By default, it has a validity period of 12 hours, but this can be changed with the grid-proxy-init parameter -hours.
The proxy certificate, as with all X.509 certificates, contains a unique name and public key. The proxy certificate's unique subject name, or distinguished name, is the primary certificate's unique name plus /CN=proxy or /CN=limited proxy. This is best illustrated with the grid-cert-info and grid-proxy-info commands. If the grid-cert-info command is run with the file name of our primary certificate, the contents of the certificate are displayed:
grid-cert-info -f .globus/usercert.pem -subject
/C=US/O=IBM/OU=GridLPP/OU=austin.ibm.com/CN=griduser
The -subject flag displays the subject, or distinguished name (DN).
A complete description of X.509 certificates can be found in RFC 2459. The /tmp/x509up_u<uid> file created by grid-proxy-init contains two other components in addition to the proxy certificate: the private key of the proxy certificate and the user certificate.
The proxy certificate's private key is only protected by the file permissions of /tmp/x509up_u<uid>. Since the proxy certificate is short lived, a compromised or stolen certificate will become useless at the end of its life.
The user certificate's private key remains encrypted in the $HOME/.globus/userkey.pem file. It can only be accessed with the passphrase that is given when the user certificate is created with the grid-cert-request command.
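Because the proxy's private key is guarded only by file permissions and a limited lifetime, both are worth checking. The following sketch assumes the conventional location /tmp/x509up_u<uid> and the 12-hour default described above. The helper names are hypothetical and are not part of the Globus Toolkit.

```python
import os
import stat
from datetime import datetime, timedelta


def proxy_path(uid: int) -> str:
    """Build the assumed proxy file name for a given numeric UID."""
    return f"/tmp/x509up_u{uid}"


def owner_only(path: str) -> bool:
    """True if neither group nor others have any permission on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def is_expired(created: datetime, now: datetime, hours: int = 12) -> bool:
    """True once the proxy's validity period has run out."""
    return now >= created + timedelta(hours=hours)
```

On a POSIX system, owner_only(proxy_path(os.getuid())) reports whether the current user's proxy file is closed to other users.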
7.3 Grid infrastructure security
Apart from the different GSI components and technologies, there are many other infrastructure security components that are needed to secure the grid. As in other areas of grid design, the grid infrastructure security builds on other security principles. While these security components are optional, they are considered standard within many production networks. We explore some of these basic security concepts and see how they fit into a grid infrastructure.
7.3.1 Physical security
Once again, the security of grid infrastructure is based on other common security fundamentals. The basics involve solid physical security practices for all grid computers. The physical environment of a system is also considered a part of the infrastructure. If the servers are kept in an open room, then no matter how securely the applications are designed or how complex the cryptographic algorithms are, the server services can easily be interrupted (for example, by being powered off) or otherwise tampered with. Therefore, physical access should be controlled and is part of the security policies that need to be defined.
The CA server should be located in a robust, dedicated, and locked room. All accesses should be logged and controlled. The power supply to the servers should never be interrupted. This means an uninterruptible power supply (UPS) must be used. However, a UPS may still run out of electricity after a prolonged period. In such a case, the servers should be able to automatically back up the data and properly shut down.
For maximum security, the network segment where the PKI-sensitive server machines are installed should be physically and logically separated from the rest of the network. Ideally, the separation is done through a firewall that is transparent only for PKI-related traffic. Normally, PKI traffic is reduced to using only a few TCP/IP ports.
7.3.2 Operating system security
Review the configuration files for each operating system and middleware component within the scope of the project to determine how effectively each allows authorized users access based on your security policy, and how well it prevents and detects unauthorized access attempts at all times. You should:
Remove any unnecessary processes from the servers. If the grid server does not need sendmail or an FTP server running, these processes should be disabled.
Remove any unnecessary users or groups.
Use strong passwords for all users on the grid server.
Update your server with the latest updates and security FixPacks. This includes all software that has been installed as well.
Restrict access to directories that contain security-related information, such as the .globus directory in a Globus Toolkit environment.
Consider using host IDS to monitor important directories on the server.
Enable logging and auditing for the server.
Use a uniform operating system build whenever possible.
Enable file-level restrictions on important files within the server.
Make periodic reviews of the operating system every other month to ensure that nothing major has changed.
Enable antivirus protection.
7.3.3 Grid and firewalls
Firewalls can be used within a networked environment to logically separate different sets of computers that require additional security. In a grid environment, this is no different. The use of firewalls within a grid design helps restrict network access to computers. The firewall is an important piece of the security infrastructure, so it needs to be carefully analyzed and understood before it is implemented.
7.3.4 Host intrusion detection
A recommended option for further securing your grid computers is to invest in a host intrusion detection system (IDS) product. As with any software application that stores important files on the local workstation, host intrusion detection adds a further defense against anyone manipulating files on the workstation who should not be doing so. If the host IDS product detects a changed file on the server, it can send an alert to a central monitoring workstation to log the event and alert the necessary people.
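The changed-file detection described above can be approximated by comparing cryptographic digests against a stored baseline. This is a minimal sketch of the idea, not a substitute for a host IDS product. The function names are hypothetical, and the alerting step is left out.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def file_digest(path: str) -> Optional[str]:
    """Return the SHA-256 digest of a file, or None if it does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def snapshot(paths: Iterable[str]) -> Dict[str, Optional[str]]:
    """Record the current digest of every monitored file."""
    return {str(p): file_digest(str(p)) for p in paths}


def changed_files(baseline: Dict[str, Optional[str]],
                  paths: Iterable[str]) -> List[str]:
    """List monitored files whose digest differs from the baseline."""
    current = snapshot(paths)
    return sorted(p for p in current if baseline.get(p) != current[p])
```

A baseline taken with snapshot() right after installation can be compared periodically; any path returned by changed_files() was modified, created, or deleted since then.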
An intrusion detection system gathers and analyzes information from various areas within a computer or a network to identify possible security breaches, which include both intrusions (attacks from outside the organization) and misuse (attacks from within the organization). An intrusion detection system uses vulnerability assessment (sometimes referred to as scanning), which is a technology developed to assess the security of a computer system or network.
Intrusion detection functions include:
Monitoring and analyzing both user and system activities
Analyzing system configurations and vulnerabilities
Assessing system and file integrity
Recognizing typical patterns of attacks
Analysis of abnormal activity patterns
Tracking user policy violations
Network intrusion detection
A case can be made for network IDS within a grid environment, but some of the benefit is lost due to the encryption between grid servers. A network IDS can match signatures against standard network traffic, but because of the SSL/TLS encryption it cannot see the encrypted data payload of a packet. It can still respond to events based on the packet header, which is unencrypted. Network IDS is best suited for placement where it can analyze unencrypted traffic.
The use of any IDS is an optional component within an architecture, but is strongly recommended for good security practices.
7.4 PKI security policies and procedures
Good security policies and procedures are used to complement the variety of security components that make up a security infrastructure. This is no different in a grid environment, but may take on more importance since you may be dealing with networks outside your control. To help manage this risk, different policies and procedures should be used. These policies and procedures help establish a consistent way of managing the security controls.
One of the first steps an organization has to consider when comprehensive security solutions are to be introduced is to define a feasible set of security policies. In the first place, this has little to do with a PKI because security policies need to be in place for any kind of IT infrastructure. Only when the deployment of a PKI has been decided do some additional benefits and issues come up that need to be defined within security policies. The following subsections discuss security policies that primarily relate to a PKI.
7.4.1 Certificate Authority
A PKI must be operated in accordance with defined policies. The deployment of a PKI system in an organization requires the development of security policies and processes for that organization. The demo CA that is provided within the Globus Toolkit provides the software needed in order to build a CA, but unfortunately none of the policies. In this section, we examine some of the basic policies and expectations that a CA would normally be responsible for. For any type of production CA duties, it is suggested that you examine a commercial vendor to provide these services for you.
A standardization effort has been made to incorporate security policies into a PKI framework systematically, as outlined in RFC 2527, Internet X.509 Public Key Infrastructure Certificate Policy and Certification Practices Framework. According to X.509, a certificate policy is a named set of rules that indicates the applicability of a certificate to a particular community and/or class of application with common security requirements. A more detailed description of the practices followed by a CA in issuing and otherwise managing certificates may be contained in a Certification Practice Statement (CPS) published or referenced by a CA.
A certificate policies extension contains a sequence of one or more policy information terms, each of which consists of a registered object ID (OID) and optional qualifiers. Applications with specific policy requirements will have to recognize the OID meaning in at least the same security domain. If the required policy's OID is not contained in the certificate extension field, or if any existing critical OIDs are not understood by the application, the application has to reject the client's request.
Security policies also result in processes that have to be in place and subsequently enforced. Processes describe and/or mandate the way an infrastructure is utilized by its administrators and users. Processes may include elements, such as:
The certificate request, issuance, distribution, and revocation processes
The use of certificates for client authentication
The use of certificates for securing email communication
The use of certificates for interorganization communication
Procedures to follow when security violations are suspected
Handling guidelines for private keys and certificates
Application development guidelines for PKI exploitation (such as user authentication using certificates)
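The acceptance rule for certificate policy OIDs stated earlier in this section can be expressed as a short check. This is a sketch under simplifying assumptions: the PolicyInfo type and accept_certificate function are hypothetical, the OID values in the usage note are placeholders, and criticality is modeled per policy term to follow the wording above rather than any particular library's representation.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class PolicyInfo:
    oid: str
    critical: bool = False


def accept_certificate(policies: Iterable[PolicyInfo],
                       required_oid: str,
                       understood_oids: Set[str]) -> bool:
    """Apply the two rejection rules for certificate policy OIDs.

    Reject if the required policy OID is absent, or if any critical OID
    is one the application does not understand.
    """
    policies = list(policies)
    if required_oid not in {p.oid for p in policies}:
        return False
    return all(p.oid in understood_oids for p in policies if p.critical)
```

For example, an application that requires the placeholder policy 1.3.6.1.4.1.99999.1.1 rejects a certificate that lacks it, and also rejects one that carries an unrecognized critical OID alongside it.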
A PKI will alter many existing business processes and require many new ones to support it. These processes can cover technical, organizational, legal, and infrastructure elements of the whole workflow.
CA key generation
  Who is involved?
  How is the process secured?
CA key backup
  How is a backup of the CA private key accomplished?
CA key restore
  How is a key restored?
CA key compromise
  What happens if the key is broken?
User registration
  How does a user obtain a certificate?
Certificate revocation
  How is a certificate revoked?
CA implementation
If you are planning on implementing your own Certificate Authority, you will likely build on tools similar to those provided with the Globus Toolkit. The Globus Toolkit provides some of the basic tools for a demo CA within a lab or testing environment, but there is more to building a CA than installing a few scripts.
In order to manage and administer your own CA, you should be aware of some of the other resources and policies that are normally required. If you plan on managing a CA yourself, your plan for implementation must include:
Required resources and skills
Required PKI and security process additions and changes
Recommended implementation time line and dependencies
Required changes to the technical infrastructure
Adoption of the CPS, certificate, and security policies
Required PKI and security policy additions and changes
All required checkpoints and approvals
7.4.2 Security controls review
When building any new environment or implementing a new software application, it is always a good idea to perform a security health check. A security health check helps determine how the changes will affect the overall security of the environment and any other areas of change. It can provide guidance on the overall use of security controls and on how you are managing security within your environment. A review of your security controls can help you better understand how security works for your passwords, administration, toolsets, auditing, and monitoring within your environment. This provides an in-depth review of the site security controls in place and the related processes used within the organization.
7.5 Summary
This chapter has described in some detail the types of considerations involved in security related to grid environments. For specific examples, we have used the Globus Toolkit 4 environment and the PKI infrastructure that it delivers.
Chapter 8. Design
This chapter provides architectural design considerations for grid computing. Other design topics that will be discussed are different grid topologies, grid infrastructure design, and grid architecture models.
At a glance, the following topics are discussed:
Grid architecture design concepts
Different grid topologies
Grid architecture models
Building a grid architecture
Grid architecture conceptual model
8.1 Building a grid architecture
The foundation of a grid solution design is typically built upon an existing infrastructure investment. However, a grid solution does not come to fruition by simply installing software to allocate resources on demand. Given that grid solutions are adaptable to meet the needs of various business problems, differing types of grids are designed to meet specific usage requirements and constraints. Additionally, differing topologies are designed to meet varying geographical constraints and network connectivity requirements. The success of a grid solution is heavily dependent on the amount of thought the IT architect puts into the solution design.
Once the functional and nonfunctional requirements are known, the IT architect should readily be able to select the type of grid and the best topology required to satisfy the majority of the business requirements. When armed with this information, the high-level grid design will be easier to complete, and by leveraging the use of known grid types and topologies, articulating the solution design will require much less effort.
It is important to focus on starting small and to begin building the basic framework of the design. Rather than setting out to build the desired end state grid solution all at once, consider building the grid solution in a phased approach. The milestone for the initial phase is to provide an intragrid solution, which is essentially a grid sandbox that supports a basic set of Grid services. This solution would support a single location built upon the core grid components, such as a security model, information services, workload management, and the host devices. As long as this model supports the same protocols and standards, this design can be expanded as needed.
The first step of the design process is to build a graphical representation of the grid components. The subsequent phases of the design will be focused on the next level of architecture. This phase of the design is a starting point for architects, technical managers, and executives to understand the overall structure of the architecture.
At a glance, the grid architecture design should offer the following:
The blueprint for the detailed conceptual design
The use of open standards prescribed by the grid framework
A multidimensional tiered and layered view of the grid infrastructure, which demonstrates the ability to logically partition grid resources so that their service consumption does not impact other grid locations
The middleware components and subsystems for a grid infrastructure integration
A design for communication to both business and technical personnel, for budget and planning purposes, and to provide application development an illustration of how the shared grid infrastructure will impact the middleware solution design
The distribution of applications and subsystems
A means for identifying the necessary technical, infrastructural, and other middleware components and subsystems for a grid infrastructure
8.1.1 Solution objectives
The design objectives provide a basic framework for building the grid infrastructure. The advantage of using design solution objectives is that you start documenting the areas that can affect the overall design. Within your design, you need to make sure that the grid can provide a certain amount of security, availability, and performance. Documenting these objectives or requirements makes your design a lot easier to follow. You will also be able to justify some of your decisions during the course of the design by coming back to certain objectives and making sure they were met.
Once the design objectives have been defined, you can separate them into individual subsystems. This allows each design objective to be worked on in parallel, while at the same time providing a cohesiveness for the overall architecture. Once you have documented the core subsystems of the design, you can focus on the different requirements that your grid design will comprise.
When you start building the initial pieces of your design, you need to make sure that your solution objectives line up with the customer's requirements. For a grid design, this is especially important, as there are not only the standard infrastructure components to consider, but specialized middleware and application integration issues as well. Making sure that your solution objectives satisfy your stated requirements will allow you to design a working grid.
Security
Within any networked environment, there is going to be some risk and exposure involved with the security of your infrastructure. Unless the computers are unplugged in a locked room, there is the potential that someone may bypass the security and get access to protected resources. Whether the weaknesses are exploited in the infrastructure, application, configuration, or administration, there is some level of risk.
Security objectives are put in place to help reduce that risk to an acceptable level. While no design is 100 percent secure, the level of risk is reduced and controlled through the use of security controls. The goal of the security objectives is to examine the security requirements and implement the necessary tools and processes to reduce the risk involved.
The degree of security involved is based on the type of grid topology and the data the security will be protecting. The security requirements for a grid design within a bank will be completely different from those of an academic institution doing research. Whatever the security requirements may be, the security design objectives for the grid design need to be a central focus for the conceptual architecture.
Considering that the basic grid security model is based on PKI, it is imperative that the security components are designed and thought out carefully. While PKI has been around for a while, there are different components and necessary processes that should be identified. Rushing this process could lead to many problems in the future.
With the PKI architecture being the focus of the initial design, there are still other areas that need attention. The infrastructure components (firewalls, IDS, antivirus, and encryption) and the processes to manage these pieces are all part of the security objectives. Knowing which areas match up with your existing environment is the first step to robust security. The following bullet points are examples of security questions that will be answered during the course of the design. The first three assume that the enterprise will provide its own certificate authority, which is not usually recommended:
Where will my CA be deployed and how will we manage it?
Do I have the necessary processes in place to administer my own CA?
What are the responsibilities for managing my own CA?
How will I administer security on the local servers?
Are my servers of a uniform build or common operating environment?
Do I have a consistent software build across critical grid infrastructure systems?
Which processes are running on my servers?
Will any existing applications conflict with or further expose my grid to any vulnerabilities?
Availability
Availability in its simplest terms commonly refers to the percentage of time that a site is up and servicing job requests. Determining how much availability should be built into the design is part of the availability objectives. This leads down the path of discovering how many potential single points of failure exist and how much redundancy should be built into the design. It is inevitable that some components will fail during a lifetime of usage, but this can be managed by using redundant components where possible.
Whenever you review various availability scenarios, there are always discussions about the amount of availability that is required. In this respect, a grid design is no different from any other infrastructure. A good start is to list the potential components within the design that should be resilient to failure. Once these components have been identified, you can seek out the specific availability options for those components. In the following examples, some different infrastructure options are described.
An important point that needs to be discussed is the availability of dynamic resources within a grid environment. Grid is not like a standard environment where resources are fixed and do not change regularly. Within grid environments, resources are constantly changing according to the membership and participation in the grid. When grid resources are active, they can register with information services within the grid to alert the system of their state. It is important to make sure that when you design your grid, you keep this in mind.
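One way to picture resources announcing their state to an information service is a registry that keeps the time each resource last reported in and treats stale entries as unavailable. The sketch below is an illustration only. The ResourceRegistry class, its time-to-live rule, and the use of plain numeric timestamps are assumptions, not the interface of any grid information service.

```python
from typing import Dict, List


class ResourceRegistry:
    """Track which resources have reported in recently."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._last_seen: Dict[str, float] = {}

    def register(self, name: str, now: float) -> None:
        """Record a registration or heartbeat from a resource."""
        self._last_seen[name] = now

    def deregister(self, name: str) -> None:
        """Remove a resource that has left the grid."""
        self._last_seen.pop(name, None)

    def active(self, now: float) -> List[str]:
        """Return resources whose last report is within the time-to-live."""
        return sorted(name for name, seen in self._last_seen.items()
                      if now - seen <= self.ttl)
```

A scheduler built on such a registry would consult active() before placing work, so that resources that have silently dropped out are not assigned jobs.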
Besides the grid middleware components, the different infrastructure components will also require different levels of availability. Some components will be more critical than others, and it will be up to your design to make sure that you account for this. When going through the different availability requirements, make sure that you account for both the grid and infrastructure components. The following lists are some examples of availability resources that should be accounted for:
Grid middleware
  Workload management
  Grid directory and indexing service
  Security services
  Data storage
  Grid software clustering
Networks
  Load balancing
  High-availability routing protocols
  Redundant and diverse network paths
Security
  Redundant firewalls
Datastore
  Mirroring
  Data replication
  Parallel processing
Systems management
  Backup and recovery
  LDAP replicas
  Alerts and monitoring to signal a failure within the environment
From time to time, components necessary to the workflow process fail and disrupt the availability of the system. You can help mitigate the risk involved by eliminating the single points of failure within your environment through the use of redundant software or hardware components.
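The effect of redundancy on availability can be estimated with the standard series and parallel formulas. The sketch below assumes that component failures are independent, which real deployments only approximate, and the function names are illustrative.

```python
from typing import Iterable


def series_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel_availability(component: float, copies: int) -> float:
    """Availability of redundant copies where one working copy suffices."""
    return 1.0 - (1.0 - component) ** copies
```

Under these assumptions, two redundant components that are each 99 percent available give about 99.99 percent together, while two such components in series (a single point of failure each) give only about 98 percent.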
To give you a better idea of some different availability targets, the following list presents an example of the expected system availability in a whole year:
Normal commercial availability single node: 99 to 99.5 percent, 87.6 to 43.8 hours of system down
High availability: 99.9 percent, 8.8 hours of system down
Fault resilient: 99.99 percent, 53 minutes of system down
Fault tolerant: 99.999 percent, 5 minutes of system down
Continuous processing: 100 percent, 0 minutes of system down
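The downtime figures in this list follow from simple arithmetic on the availability percentage. The sketch below shows the calculation; the function name `annual_downtime_hours` is introduced here for illustration, and the year is assumed to have 8,760 hours (leap years ignored).

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years are ignored in this sketch


def annual_downtime_hours(availability_percent):
    """Return the hours of downtime per year implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (100 - availability_percent) / 100 * HOURS_PER_YEAR


for target in (99.0, 99.5, 99.9, 99.99, 99.999, 100.0):
    hours = annual_downtime_hours(target)
    print(f"{target:7.3f} percent -> {hours:8.3f} hours ({hours * 60:8.1f} minutes)")
```

Under that assumption, 99.9 percent corresponds to roughly 8.76 hours per year and 99.99 percent to roughly 52.6 minutes, which agrees with the rounded values listed above.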
Keep in mind, however, that the redundancy that is added to the grid infrastructure will normally increase the costs within the infrastructure. It is up to the business to help justify the costs that would bring an environment from 99.9 percent availability per year up to 99.99 percent per year. While the difference in time between those two numbers is about eight hours, the costs associated may be too much to justify the increased availability.
Performance
The performance objective for a grid environment is to most efficiently utilize the various resources within the grid. Whether that includes spare CPU cycles, access to federated databases, or application processing, it is up to you to match the performance goals of the business and design accordingly.
If your application can take advantage of multiple resources, you can design your grid to be broken up into smaller instances and have the work distributed throughout the grid. The goal is to take advantage of the grid as a whole in order to increase the performance of the application. Through intelligent workload management and scheduling, your application can take advantage of whatever resources within the grid are available. Part of the performance is based on the form of workload management to make sure that all resources within the grid are actively servicing jobs or requests within the grid.
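The idea of breaking work into smaller pieces and distributing them can be sketched as follows. This is a minimal illustration, not a grid scheduler: a local thread pool stands in for grid nodes, and the names `split_into_chunks`, `sub_job`, and `run_on_grid` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_count):
    """Split a list into at most chunk_count contiguous chunks of similar size."""
    size, extra = divmod(len(items), chunk_count)
    chunks, start = [], 0
    for i in range(chunk_count):
        end = start + size + (1 if i < extra else 0)
        if start < end:
            chunks.append(items[start:end])
        start = end
    return chunks


def sub_job(chunk):
    """Hypothetical unit of work; here it just sums the squares of its inputs."""
    return sum(x * x for x in chunk)


def run_on_grid(items, worker_count=4):
    """Distribute sub-jobs to workers and combine the partial results."""
    chunks = split_into_chunks(items, worker_count)
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        return sum(pool.map(sub_job, chunks))
```

The sketch only works because each sub-job is independent and the partial results can be combined afterwards; an application without that property gains little from being distributed this way.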
8.2 Grid architecture models
There are different types of grid architectures to fit different types of business problems. Some grids are designed to take advantage of extra processing resources, whereas some grid architectures are designed to support collaboration between various organizations.
The type of grid selected is based primarily on the business problem that is being solved. Taking the goals of the business into consideration will help you choose the proper type of grid framework. A business that wants to tap into unused resources for calculating risk analysis within their corporate data center will have a much different design than a company that wants to open their distributed network to create a federated database with one or two of their main suppliers. Such different types of grid applications will require different designs, based on their respective unique requirements.
The selection of a specific grid type will have a direct impact on the grid solution design. Additionally, it should be mentioned that grid technologies are still evolving and tactical modifications to a grid reference architecture may be required to satisfy a particular business requirement.
8.2.1 Computational grid
A computational grid aggregates the processing power from a distributed collection of systems. A well-known example of a computational grid is the SETI@home grid. This type of grid is composed primarily of low-powered computers with minimal application logic awareness and minimal storage capacity.
Rather than simply painting images of flying toasters, the idle cycles of the personal computers on the SETI@home grid are combined to create a computational grid used to analyze radio transmissions received from outer space in the Search for Extraterrestrial Intelligence.
Most businesses interested in computational grids will likely have similar IT initiatives. While they probably will not want to search for extraterrestrials, there will likely be a business initiative to expand abilities and maximize the utilization of existing computer resources through aggregation and sharing. The business may require more computer capacity than is available, and it may be interested in modifying specific vertical applications for parallel computing opportunities.
Additional uses for a computational grid include mathematical equations, derivatives, pricing, portfolio valuation, and simulation (especially risk measurement). Note that not all algorithms are able to leverage parallel processing, data-intensive and high-throughput computing, order and transaction processing, market information dissemination, and enterprise risk management. In many cases, the grid architecture model is not yet suitable for real-time applications.
Computational grids can be recognized by these primary characteristics:
Made up of clusters of clusters
Enables CPU scavenging to better utilize resources
Provides the computational power to process large-scale jobs
Satisfies the business requirement for instant access to resources on demand
The primary benefits of computational grids are a reduced total cost of ownership (TCO) and shorter deployment life cycles. Besides the SETI@home grid, the World Community Grid, the Distributed Terascale Facility (TeraGrid), and the UK and Netherlands grids are all examples of deployed computational grids. The next generation of computational grid computing will shift focus towards solving real-time computational problems.
8.2.2 Data grid
While computational grids are more suited for aggregating resources, data grids focus on providing secure access to distributed, heterogeneous pools of data. Through collaboration, data grids can also include resources such as a federated database. Within a federated database, as illustrated in Figure 8-1 on page 103, a data grid makes a group of databases available that function as a single virtual database. Through this single interface, the federated database provides a single query point, data modeling, and data consistency.
Data grids also harness data, storage, and network resources located in distinct administrative domains, respect local and global policies governing how data can be used, schedule resources efficiently (again subject to local and global constraints), and provide high-speed and reliable access to data. Businesses interested in data grids typically have IT initiatives to expand data-mining abilities while maximizing the utilization of an existing storage infrastructure investment, and to reduce the complexity of data management.
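The notion of a single query point over several databases can be sketched with a few lines of code. This is an illustration only, assuming in-memory SQLite databases as stand-ins for the member data sources; the `FederatedDatabase` class, the `parts` table, and the supplier names are hypothetical, and the sketch does not attempt the data modeling or consistency functions of a real federated DBMS.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory database standing in for one member data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (part_id TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO parts VALUES (?, ?)", rows)
    conn.commit()
    return conn


class FederatedDatabase:
    """Hypothetical single query point over several independent databases."""

    def __init__(self, sources):
        self.sources = sources  # mapping of source name -> sqlite3 connection

    def query(self, sql, params=()):
        """Run the same query on every source and tag each row with its origin."""
        results = []
        for name, conn in self.sources.items():
            for row in conn.execute(sql, params):
                results.append((name, *row))
        return results
```

A client of this sketch issues one query and receives rows from every source, without needing to know where each database lives.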
Figure 8-1 Federated DBMS architecture (diagram labels: grid client and client proxy; client firewall; public network carrying SOAP over HTTPS; grid provider firewalls 1 and 2; Web services portal; Web services gateway; grid services; federated DBMS; JDBC, ODBC, and similar access to pluggable, wrappered data sources such as DB2, Oracle, and Documentum; Storage Tank infrastructure)
8.3 Grid topologies
A topology view (see Figure 8-2 on page 104) covers the following spectrum of grids:
Intragrids
   Single organizations
   No partner integration
   A single cluster
Extragrids
   Multiple organizations
   Partner integration
   Multiple clusters
Intergrids
   Many organizations
   Multiple partners
   Many multiple clusters
Figure 8-2 Intragrids, extragrids, and intergrids
The simplest of the three topologies is the intragrid, which consists merely of a basic set of Grid services within a single organization. The complexity of the grid design is proportionate to the number of organizations that the grid is designed to support, and the geographical parameters and constraints. As more organizations join the grid, the nonfunctional or operational requirements for security, directory services, availability, and performance become more complex.
As more organizations require access to grid resources, the requirements for increased application layer security, directory services integration, higher availability, and capacity become more complicated.
The resource sharing alluded to is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering. This sharing is, necessarily, highly protected, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs.
8.3.1 Intragrid
A typical intragrid topology, as illustrated in Figure 8-3 on page 105, exists within a single organization, providing a basic set of Grid services. The single organization could be made up of a number of computers that share a common security domain, and share data internally on a private network. The primary characteristics of an intragrid are a single security provider, high and always-available bandwidth on the private network, and a single environment within a single network. Within an intragrid, it is easier to design and operate computational and data grids. An intragrid provides a relatively static set of computing resources and the ability to easily share data between grid systems. The business might deem an intragrid appropriate if the business has an initiative to gain economies of scale on internal job management, or wants to start exploring the use of a grid internally first by enabling vertical enterprise applications.
Figure 8-3 An intragrid
8.3.2 Extragrid
Where the intragrid is based on a single organization, the extragrid expands on the concept by bringing together two or more intragrids. An extragrid, as illustrated in Figure 8-4 on page 106, typically involves more than one security provider, and the level of management complexity increases. The primary characteristics of an extragrid are dispersed security, multiple organizations, and remote/WAN connectivity. Within an extragrid, the resources become more dynamic and your grid needs to be more reactive to failed resources and failed components. The design becomes more complicated and information services become relevant to ensure that grid resources have access to workload management at run time.
A business would benefit from an extragrid if there was a business initiative to integrate with external trusted business partners. An extragrid could also be used in a B2B capacity and/or to establish relationships of trust.
Figure 8-4 Extragrids can exist in several organizations and security providers
8.3.3 Intergrid
An intergrid requires the dynamic integration of applications, resources, and services with partners, customers, and any other authorized organizations that will obtain access to the grid via the Internet/WAN. An intergrid topology, as illustrated in Figure 8-5 on page 107, is primarily used by engineering firms, life science industries, manufacturers, and by businesses in the financial industry. The primary characteristics of an intergrid include dispersed security, multiple organizations, and remote/WAN connectivity. The data in an intergrid is global public data, and applications (both vertical and horizontal) must be modified for a global audience. A business may deem an intergrid necessary if there is a need for peer-to-peer computing, a collaborative computing community, or simplified end-to-end processes with the organizations that will use the intergrid.
Figure 8-5 Intergrid
8.3.4 e-Utilities
One other type of grid that we should discuss before closing out this section is what we will call e-utility computing. Instead of having to buy and maintain the latest and best hardware and software, with this type of grid, customers will have the flexibility of tapping into computing power and programs as needed, just as they do gas or electricity. But enterprises are coming more and more to see the e-sourcing trend as a continuum, reaching beyond commonplace IT resources on demand to the delivery of business process and management functions integral to the way the organization works.
The e-sourcing business model is based on providing the components of IT function that are largely standardized and delivered through a service provider model. The attributes of this model include a distributed and shared environment, and generally standardized non-core business processes. The e-utility is used by its consumers as building blocks for developing complex e-business solutions. The major properties of e-sourcing environments are a standard solution that requires minimal configuration; pooled resources used to serve multiple customers; capacity on demand; scalable, 24×7, always on, high availability, rapidly deployable, minimal operations overhead; shared systems management; and flexible pricing and billing based on either actual usage/consumption of resources, or a calculated flat rate subscription.
8.4 Phases and activities
Deciding which grid type and topology to choose is just the first step in the grid architecture design. A mature end-to-end design methodology comprises distinct phases and activities. The activities in the architecture design phase of the project include a review of the detailed architectural decisions and design documentation for the current infrastructure, conducting interviews and workshops, the modification of the initial high-level design based on new requirements and the results of the detailed assessment, the creation of a detailed modular architecture design, and the creation of the implementation and transition plan.
8.4.1 Basic methodology
For building a grid architecture, using a basic methodology allows the design to follow a consistent path from beginning to end. A methodology is not a cookbook for building a grid architecture, but a way to trace the progress of the design from the kickoff meeting to the final end state. The methodology follows a reproducible set of guidelines that can be used over again based on a set of successful guiding principles for architecture design. A methodology allows the architecture to follow a set of principles that can be documented from beginning to end throughout the design.
We define one such basic design methodology for developing the grid conceptual architecture in the next three sections.
Understanding the business drivers
The first step of any design is to identify and document the business drivers that are the foundation behind building the grid. The business drivers, or business strategy, outline the investment and what the end state will accomplish. Whether the goal is to build a federated database with your suppliers or to tie together a set of computers to harness their overall processing power, you should have an end goal in mind before the design begins.
Requirements gathering
The requirements gathering process will help drive the architecture process by helping the technical team work within a set of guidelines for the architecture. By following this process, all of your decisions can be tied back to the basic requirements and business drivers for the design. Along with your solution objectives, the requirements will offer a road map for you to follow as you work through the design phases.
Business requirements
The business requirements are a subset of the business drivers that are focused on solving a specific business need. The business requirements drive important areas within the design, such as the performance and availability of the environment. Understanding these key service levels is an important part of the design.
Infrastructure requirements
The infrastructure requirements provide the basic framework for how the infrastructure will be designed. There are many different variables in how the grid architecture can be designed, and the requirements will shape how the environment will look.
Application requirements
There are many factors that need to be accounted for during the design, and the application is one of them. Possibly one of the most important requirements that must be validated is to ensure that the application in question can be made grid-aware. Unless the application can take advantage of the grid resources or split the workload across multiple components, the power of the grid is wasted.
Validate requirements
During the course of some designs, the requirements can change at the last minute or may go undiscovered. Requirements also have a way of changing when you least expect them to, so it is always a good idea to validate them before you proceed. Validating the requirements one last time before the design phase begins is a good way to ensure that all parties agree with the direction of the design.
8.4.2 Recommended steps
The following sections deal with additional recommended methods for developing an optimal grid design. These methods include attending grid design workshops and building prototypes once the design has been completed.
Grid design workshops
The purpose of the grid design workshops is to help all of the parties involved to better understand the variables, options, and considerations that need to be taken into account when developing a grid infrastructure design. Many or most of the grid middleware, technologies, and system components are probably new to many people within the design team and it is always a good idea to hear firsthand from experienced IT professionals the means by which grid infrastructures can be implemented, as well as any pitfalls to watch out for when designing environments for grid computing.
Documentation
An extremely critical means of communicating the design of your grid infrastructure solution is an architecture or solution document. The solution document should start with a high-level overview of the environment and subsequently should drill down into the most detailed configuration diagrams and descriptions possible. You will want to include things like IP addresses, network routes, server names, server architectures, network hardware, and essentially everything you know about the infrastructure at the time your design is completed. In truth, architecture documents are often dynamic, changing as the needs of the system users change and as technologies mature, become obsolete, and are replaced by newer technologies. You should revise your architecture document upon further hardware and software updates so that it accurately reflects the state of the system. Without an accurate architecture document, the system implementation team may get easily confused and not produce the system that was originally designed. Additionally, anyone adding further design changes to the system after the original system architect has moved on will appreciate an up-to-date architecture document, as it will save him or her countless hours of information gathering that would be necessary without an architecture document.
Prototype
Building a prototype of a grid system can save significant time that would otherwise be spent debugging and retooling unforeseen system incompatibilities. Your goal in building a prototype should be to produce a small-scale, end-to-end backbone of what your production environment will look like. It should include all interoperating technologies and/or architectures, so that if any incompatibility exists, it will be apparent before the production system is implemented. When all of the kinks are ironed out of your prototype, you will be confident that all of your components will work together properly in your designed infrastructure, and, additionally, you will have some experience in the implementation of such a system. Lessons learned from building the prototype should be reflected in your architecture document and any other directions provided to the implementation team.
8.5 A conceptual architecture
The purpose of the grid conceptual architecture is to establish a common understanding between the business owners and the people architecting and designing the grid infrastructure by describing the grid architecture that will support the client business requirements.
This section highlights some of the common components that you can choose from within the Globus Toolkit. If you are designing a grid architecture using different grid middleware software from Platform, DataSynapse, Avaki, or any other grid software provider, this section should still give you a head start on grid architecture. You will still be faced with decisions on the basic components, such as the security models, workload management, information services, and data sharing.
The conceptual model is a high-level framework consisting of the grid system components and nodes within the design. The nodes represent the different system components and grid middleware that make up the design. Normally, the conceptual model is the first graphical view of the grid infrastructure and is used as a stepping stone to building a detailed configuration for the grid network. The graphic depiction of the grid environment will allow you to see how the requirements were gathered and how the many grid components will interact with one another.
8.5.1 Infrastructure
The infrastructure represents the physical hardware and software components used to interconnect different grid computers. These components help support the flow of information between grid systems and provide the basic set of services for connectivity, security, performance, availability, and management. While many of these infrastructure components supply basic functionality to the grid, many are optional. It will be up to you to decide on the requirements and how well these components match up to the needs of your design.
Security
Chapter 7, Security on page 63, provides details about considerations related to security in a grid environment. Please refer to that chapter for more details on security.
One issue not addressed in detail in the chapter referenced above is the use of firewalls. The use of firewalls can provide logical and secure segmentation between grid systems. You might want to use firewalls to protect your networks and grid servers by limiting the types of services and protocols that connect to your computers. By using firewalls within your grid design, you can help limit the network communication between grid systems and only use protocols that you specify that the firewall will support.
Firewalls are not the only answer to protecting your grid servers, but they do add an additional layer of defense from internal or external users trying to access your systems. Firewalls work by controlling access to network services that your grid computers will be running. Since the network offers a gateway to your grid systems, you want to make sure that you control exactly the services and protocols that can be used to access your systems, as well as who can initiate communications.
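The idea of controlling which services and protocols can be used, and who can initiate communications, amounts to a default-deny allowlist. The sketch below illustrates that logic only; it is not the configuration of any real firewall product, and the `Rule` and `Firewall` names, the network ranges, and the port numbers are hypothetical rather than values required by the Globus Toolkit.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One allowed flow: who may initiate, which protocol, which destination port."""
    source_network: str  # CIDR range allowed to initiate, e.g. "10.0.0.0/8"
    protocol: str        # e.g. "tcp"
    port: int            # destination port on the grid server


class Firewall:
    """Default-deny filter: a connection passes only if some rule allows it."""

    def __init__(self, rules):
        self.rules = list(rules)

    def allows(self, source_ip, protocol, port):
        address = ipaddress.ip_address(source_ip)
        return any(
            address in ipaddress.ip_network(rule.source_network)
            and protocol == rule.protocol
            and port == rule.port
            for rule in self.rules
        )
```

Because nothing is permitted unless a rule names it, adding a new grid service in this model means adding an explicit rule for it, which keeps the set of exposed services visible in one place.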
For the most up-to-date information regarding the Globus Toolkit and firewalls, you should check out the firewall section on the Globus Web site at:
http://www.globus.org/security
Some areas you may want to protect within your design are:
Certificate Authority/Registrant Authority
Globus Toolkit components, such as MDS, GRIS, and GIIS. For more information about these and other Globus Toolkit components, refer to 7.2, Components of Globus Toolkit on page 133.
Databases
All grid servers
Networks
The network design within the grid architecture can take on many different shapes. The networking components can represent the LAN or campus connectivity or even WAN communication between the grid networks. Whatever the case may be, the network's responsibility is to provide adequate bandwidth for any of the grid systems. Like many other components within the infrastructure, the networking can be customized to provide higher levels of availability, performance, or security.
Grid systems are for the most part network intensive due to security and other architectural limitations. For data grids in particular, which may have storage resources spread across the enterprise network, an infrastructure that is designed to handle a significant network load is critical to ensuring adequate performance.
Systems management
Any design will require a basic set of systems management tools to help determine availability and performance within the grid. A design without these tools is limited in how much support and information can be given about the health of the grid infrastructure. Some networks within a grid architecture can be dedicated to perform these functions so as not to hamper the performance of the grid.
Storage
The storage possibilities are endless within a grid design. How that storage will be secured, backed up, managed, and replicated are some of the questions that the grid design will try to answer. Within a grid design, you want to make sure that your data is always available to the resources that need it. Besides availability, you want to make sure that your data is properly secured, as you would not want unauthorized access to sensitive data. Lastly, you want more than decent performance for access to your data. Obviously, some of this relies on the bandwidth and distance to the data, but you will not want any I/O problems to slow down your grid applications. For applications that are more disk-intensive, or for a data grid, more emphasis can be placed on storage resources, such as those providing higher capacity, redundancy, or fault tolerance.
8.6 Summary
This chapter provided an overview of some of the key criteria and general methodologies that should be considered when designing a grid computing environment.
Chapter 9.
Web services resource framework
A grid computing environment consists of a set of resources that are being shared, possibly across organizations. A dynamic collection of individuals, institutions, and resources is also known as a virtual organization.
This concern for resource sharing sets a grid computing environment apart from a traditional distributed computing environment. Traditionally, object-oriented distributed systems do not deal with resource sharing and management issues. A grid computing environment is essentially a distributed computing environment that also deals with heterogeneous resource sharing and management.
The sharing of a resource could range from simple file transfers to complex and collaborative problem solving. A resource can potentially be any IT infrastructure component, such as a software application, database, cluster, network capacity, software license, storage, and so on.
The resource sharing is required to occur under the control of a well-defined set of conditions and policies. In this context the key issues associated with resource sharing include discovery, authentication, authorization, and access mechanisms.
The resource sharing is further complicated when a grid is introduced as a solution for utility computing, where commercial applications and resources become available as shareable and on-demand resources. However, issues such as metering, accounting and billing, quality of service compliance, and so on, are out of the scope of this book.
This chapter aims to introduce some of the fundamental concepts in resource state management as they are currently defined in the context of a grid computing environment.
9.1 Resource state management using Grid services
In the grid context a resource is assumed to represent some state or data and provides some useful capability via an interface.
An interface associated with a resource defines a logical grouping of operations that can be invoked by its clients.
In the recent past we have observed increasing popularity of service-oriented architecture frameworks. The emergence of service-oriented architecture (SOA) helps grid resources to advertise their capabilities through a standard service interface.
Web services are open standards-based mechanisms to make services available to whatever client program can take advantage of them. Web services are becoming a popular way to implement various components of a service-oriented architecture and many organizations are becoming very familiar with Web services technologies and capabilities.
However, as those familiar with Web services know, Web services are typically stateless. That is, there is no memory between separate transactions invoked on the same service instance. However, for grid computing, the state of a resource or service is often important and therefore may need to persist across transactions.
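The distinction between a stateless operation and a resource whose state persists across invocations can be illustrated with a small sketch. The names `stateless_add` and `CounterService` are hypothetical and stand in for a Web service operation and a stateful resource respectively; no Web services stack is involved.

```python
def stateless_add(a, b):
    """Stateless operation: the result depends only on this request's inputs."""
    return a + b


class CounterService:
    """Stateful service: each call changes state that later calls can observe."""

    def __init__(self):
        self._total = 0  # state that persists across invocations

    def add(self, amount):
        self._total += amount
        return self._total
```

Calling `stateless_add` twice with the same arguments always gives the same answer, whereas two identical calls to `CounterService.add` on the same instance return different totals because the first call is remembered.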
Other than this little (actually somewhat major) difference, there are many similarities between Grid services and Web services. It would be a shame not to find a way to take advantage of the standards and facilities already provided by Web services when defining and implementing Grid services. We explore this possibility in the discussion that follows and describe what the differences are between Grid services and Web services and how these differences can be addressed.
9.1.1 What a Grid service is
A service interface associated with a grid resource is known as a Grid service. A resource and its state are controlled and managed via Grid services in a Grid environment. A Grid service may require access to more than one resource or vice versa. It is also possible that multiple Grid services access the same resource or a Grid service can create a new instance of a resource every time it is invoked.
Various grid resources may need to interact and integrate with each other depending on business requirements. It is most likely that the resources are hosted in a technologically heterogeneous environment. Therefore, a framework is required that abstracts environment-specific resource implementation details from the actual inter-Grid service messaging. A service-oriented architecture (SOA) provides such a framework.
It follows that an open standards-compliant SOA would make it easier to integrate heterogeneous resources and various layers of the grid architecture. Such an architecture would help us achieve distributed resource sharing across heterogeneous and dynamic virtual organizations, that is, grid computing.
The Global Grid Forum (GGF) has adopted an Open Grid Services Architecture (OGSA), based on SOA principles, that provides a framework for implementing a Grid.
All of the resources (physical or logical) in an OGSA-compliant grid are modeled as Grid services. These Grid services are built on top of an SOA leveraging Web services technology. This enables a Grid service to use the capabilities of the Web services messaging model, service descriptions, and discovery. Various Web services standards have evolved to enable secure and reliable Web services transactions. The choice of Web services technology to implement the OGSA-compliant Grid services leverages investment in Web services architecture and its standards.
9.1.2 Grid services vs. Web services
Although Grid services are implemented using Web services technology, there is a fundamental difference between a Grid service and a Web service.
A Web service addresses the issue of discovery and invocation of persistent services. A Web Services Description Language (WSDL) compliant document points to a location that hosts the Web service.
A Grid service addresses the issue of a virtual resource and its state management. A grid is a dynamic environment. Hence, a Grid service can be transient rather than persistent. A Grid service can be dynamically created and destroyed, unlike a Web service, which is often presumed available if its corresponding WSDL file is accessible to its client. Web services also typically outlive all their clients.
This has significant implications for how Grid services are managed, named, discovered, and used. The OGSA model adopts a Factory design pattern to create transient Grid services. Thus, an OGSA Grid service is a potentially transient Web service based on grid protocols using WSDL.
9.1.3 OGSA Grid service requirements
From the OGSA perspective, a grid environment typically consists of a few persistent and potentially many transient Grid services. All Grid services must comply with the OGSA-required interface specifications to enable reliable and secure management of the distributed state of virtual resources.
The following are some of the key capabilities that the OGSA Service Model requires a compliant Grid service to provide:
Creation: Creating new instances of resources associated with a Grid service via an operation. An instance can be newly created or be initialized from a persistent state of a resource.
Global naming and references: Once an instance of a resource exists, a grid environment requires a unique, network-aware reference to that instance, with information about how to interact with it via the Grid service.
Lifetime management: The lifetime management operations define the lifespan of a resource, that is, whether the resource is destroyed immediately or expires after a certain time period.
Registration and discovery: This set of operations provides the ability to find Grid service instances and their associated deploy-time and run-time metadata.
Notification: Notifications are asynchronous messaging mechanisms that notify subscribing clients of certain events, such as resource lifetime events and property changes.
The OGSA Grid services also address authorization, concurrency control, and manageability aspects.
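The capabilities listed above can be pictured with a small in-memory sketch. It is not OGSA code and does not use any grid toolkit. The class name GridServiceFactory, the URL, and the method names are all hypothetical, chosen only to show creation, naming, lifetime management, discovery, and notification working together.

```python
import itertools
import time


class GridServiceFactory:
    """Toy factory: creates transient instances and tracks their lifetime."""

    def __init__(self, base_url):
        self.base_url = base_url
        self._ids = itertools.count(1)
        self.registry = {}     # handle -> {"state": ..., "expires": ...}
        self.subscribers = []  # callbacks invoked with (event, handle)

    def subscribe(self, callback):
        # Notification: clients register interest in lifetime events.
        self.subscribers.append(callback)

    def _notify(self, event, handle):
        for callback in self.subscribers:
            callback(event, handle)

    def create(self, state, lifetime_seconds, now=None):
        # Creation plus global naming: each instance gets a unique handle.
        now = time.time() if now is None else now
        handle = f"{self.base_url}/instances/{next(self._ids)}"
        self.registry[handle] = {"state": state, "expires": now + lifetime_seconds}
        self._notify("created", handle)
        return handle

    def find(self, predicate):
        # Registration and discovery: look instances up by their state.
        return [h for h, e in self.registry.items() if predicate(e["state"])]

    def sweep(self, now=None):
        # Lifetime management: destroy every instance whose time has passed.
        now = time.time() if now is None else now
        expired = [h for h, e in self.registry.items() if e["expires"] <= now]
        for handle in expired:
            del self.registry[handle]
            self._notify("destroyed", handle)
        return expired
```

In this sketch a client would call create to obtain a handle, find to rediscover it later, and rely on sweep (run periodically by the hosting environment) to remove instances whose lifetime has elapsed.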
There are two standards currently available to implement OGSA-compliant Grid services:
Open Grid Services Infrastructure (OGSI) Grid services
Web Services Resource Framework (WSRF) Grid services
Both frameworks provide mechanisms to implement OGSAcompliant Grid services in different ways.
Next we review and compare both approaches at a high level and discuss WSRF in greater detail for the rest of the chapter.
9.1.4 Open Grid Services Infrastructure (OGSI) Grid services
The Open Grid Services Infrastructure defines rules about how OGSA can be implemented using Grid services that are extensions of Web services.
The OGSI specification defines a Grid service instance as a Web service that conforms to a set of conventions expressed by WSDL as service interfaces, extensions, and behaviors.
The OGSI specification defines Grid services features that include:
Statefulness
Stateful interactions
The ability to create new instances
Service lifetime management
Notification of state changes and Grid service groups
The OGSI model requires Grid services to be specified via the Grid Web Services Definition Language (GWSDL), which is an extension of WSDL.
The OGSI 1.0 specification defines the following interfaces that should be implemented by a Grid service.
Table 9-1 OGSI interfaces for a Grid service

| Interface | Description |
| --- | --- |
| GridService | Encapsulates the root behavior of the service model. This interface is mandatory for an OGSA service based on OGSI 1.0. |
| HandleResolver | The OGSI method of creating an instance of a Grid service returns a handle. This handle is mapped to a reference, which then has enough information to enable client communication with the actual instance of a grid resource via a Grid service. This interface provides the functionality to map a Grid Service Handle (GSH) to a Grid Service Reference (GSR). |
| NotificationSource | Allows clients to subscribe to notification messages. |
| NotificationSubscription | Defines the relationship between a single NotificationSource and NotificationSink pair. |
| NotificationSink | Defines a single operation for delivering a notification message to the service instance that implements the operation. |
| Factory | This is the standard operation for creation of Grid service instances. |
| ServiceGroup | This allows Grid services to be added and removed from a ServiceGroup. A ServiceGroup is a collection of Grid service instances. |
| ServiceGroupRegistration | This allows Grid services to be added and removed from a ServiceGroup. |
| ServiceGroupEntry | This defines the relationship between a Grid service instance and its membership within a ServiceGroup. |
The portType construct of the WSDL grammar defines the functional interface implemented by a Web service. An OGSI-compliant Grid service component extends the GridService portType. The component may optionally extend other portTypes listed in the previous table, along with any application-specific portTypes, as required. The OGSI model also extends WSDL with mechanisms to specify additional state data descriptions.
The diagram below depicts the layering of various OGSI components.
Figure 9-1 OGSI components (diagram labels: Open Grid Services Infrastructure; LifeCycle, ServiceGroup, HandleMap, Factory, State Management, Notification; SOAP, Non-SOAP; XML Information Set; Transport Protocols; XML Schema, WSDL, GWSDL)
Please refer to the Open Grid Services Infrastructure 1.0 and the Open Grid Services Infrastructure Primer documents, available from the Global Grid Forum Web site (http://www.ggf.org), for more information about OGSI.
9.1.5 OGSI to WSRF refactoring
The Globus Toolkit 3 (GT3) contains a reference implementation of OGSI. However, because OGSI is implemented through extensions to some of the Web services standards, and because Web services have continued to evolve, it has been more difficult for Web services and Grid services to merge than originally hoped. The Web Services Resource Framework (WSRF) provides a promising solution that can address the needs of Grid services while still holding true to the Web services foundation.
The main issue of contention was the perceived divergence of the OGSI specification from the popular practices in the Web services community at large. The main objective behind the WSRF refactoring is to bring the Grid services and Web services communities closer together.
Please note that it is possible to build and deploy OGSA-compliant, Web services based Grid services using either the OGSI or the WSRF proposed specifications.
The following itemizes some of the key issues observed with the OGSI approach:
Too much in one specification: OGSI did not have a clean separation of functions to support incremental adoption. For example, Table 9-1 on page 120 lists the full range of interfaces that can be implemented by a Grid service. OGSI does not provide a way to partition these functions and adopt them incrementally.
The WSRF set of specifications partitions the equivalent functionality into separate specifications, which can be adopted incrementally.
Does not work well with existing Web services and XML tooling: The XML syntax used with OGSI 1.0 causes problems with JAX-RPC standard APIs.
The WSRF set of specifications uses standard XML Schema mechanisms that are familiar to developers and are supported by existing tooling. WSRF uses WSDL 1.1 compliant methods, instead of the Service Data Elements used by OGSI, to associate the XML information model of a resource with the resource's operations.
Too object oriented: OGSI 1.0 models a stateful resource as a Web service that encapsulates the resource's state, with the identity and life cycle of the service and the resource state coupled. From a purist Web services point of view, Web services do not have any state or instances.
The WSRF set of specifications distinguishes between the service and the stateful entities whose state that service manages. It uses the WS-Addressing standard to formalize the relationship between Web services and stateful resources.
Introduction of forthcoming WSDL 2.0 functionality as unsupported extensions to WSDL 1.1: OGSI exploited features of the WSDL 2.0 draft specification, making it difficult to support OGSI with existing Web services tooling and runtimes.
The WSRF set of specifications relies on WSDL 1.1 constructs to avoid incompatibility issues.
The paper A Grid Application Framework based on Web Services Specifications and Practices by Parastatidis et al. [6] provides further discussion of the issues encountered with the OGSI specifications.
At a high level, the OGSI-to-WSRF refactoring has resulted in the following:
The notion of a Grid service as a WS-Resource.
A better separation of the functions listed in Table 9-1 on page 120, achieved by splitting the functionality into separate specifications.
The WS-Notification specification, which can be used to build state change notifications using Web services.
The next section formally introduces the Web Services Resource Framework and Web Services Notification families of specifications.
9.2 WSRF fundamentals
In the previous discussion we described a Grid service as a service representation of a resource. A grid resource is normally assumed to represent some state. This section introduces the concept of a WS-Resource and the associated modelling concepts that underpin the WSRF and WS-Notification families of specifications.
9.2.1 What a WS-Resource is
The WS-Resource is a construct used to model stateful resources using a Web services architecture framework.
According to WSRF, a stateful resource:
Has its state data described as an XML document
Has a well-defined lifecycle
Is known to and accessed by one or more Web services
A stateful resource modelled using the WS-Resource construct can be implemented in a variety of different ways. It can be implemented as a file on a file system or a record in a database table, or it may reside in memory as an application-specific data structure.
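As an illustration of these three characteristics, the following minimal sketch keeps the state of a resource as an XML document behind a pluggable store. The names SceneResource and InMemoryStore, and the Scene element layout, are assumptions made for this example only. A file-based or database-backed store with the same three methods could be substituted without changing the resource class.

```python
import xml.etree.ElementTree as ET


class InMemoryStore:
    """One possible backing store; a file or database store could replace it."""

    def __init__(self):
        self._docs = {}

    def save(self, resource_id, xml_text):
        self._docs[resource_id] = xml_text

    def load(self, resource_id):
        return self._docs[resource_id]

    def delete(self, resource_id):
        del self._docs[resource_id]


class SceneResource:
    """A stateful resource whose state is described as an XML document."""

    def __init__(self, resource_id, store):
        self.resource_id = resource_id
        self.store = store

    def create(self, name, frames):
        # Start of the lifecycle: build and persist the state document.
        root = ET.Element("Scene")
        ET.SubElement(root, "name").text = name
        ET.SubElement(root, "frames").text = str(frames)
        self.store.save(self.resource_id, ET.tostring(root, encoding="unicode"))

    def get(self, prop):
        # A Web service would call this on behalf of its clients.
        root = ET.fromstring(self.store.load(self.resource_id))
        return root.findtext(prop)

    def destroy(self):
        # End of the lifecycle: the state document is removed.
        self.store.delete(self.resource_id)
```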
The diagram below depicts the relationships amongst a hypothetical movie scene rendering service and several stateful resources such as the actual scene data, special effects to be applied, and the rendering styles for television and widescreen display.
Figure 9-2 Example of a WS-Resource to WSDL relationship model (diagram labels: Movie Render Service facade, WSDL, WS-Resource; stateful resources: Scene, Special Effect, TV Style, Wide Screen Style)
In the diagram above, the resources are modelled as WS-Resources, and the movie-rendering service is exposed as a Web service via its WSDL interface file.
The operations available from the movie-rendering service and the attributes of the various resources are defined in the WSDL file.
It is important to note that service clients see the service and the resources as a single bundle via the WSDL file. The clients of the movie-rendering service in the above example never deal with a resource instance directly, but implicitly via interactions with the WSRF-compliant movie render service.
This implicit interaction with WS-Resource instances is known as the Implied Resource Pattern.
When a stateful resource is associated with a Web service, we refer to the component resulting from the composition of the Web service and the stateful resource as a WS-Resource.
It follows from the WS-Resource Framework discussion so far that a WS-Resource is an association of a Web service and at least one stateful resource. The general understanding of a Web service suggests that a Web service exposes an interface via a portType construct. The portType construct advertises one or more publicly available operations that can be invoked by a Web service client.
A Web service becomes a WS-Resource when a portType definition within a WSDL file is associated with an XML representation of the properties of a stateful resource in a WS-Resource Framework specific way.
Figure 9-3 is an example WSDL fragment that shows the association.

    <definitions … name="…" …
        xmlns:wsrp="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd"
        …>
      …
      <xsd:element name="atmosphere" type="xsd:boolean"/>
      <xsd:element name="water" type="xsd:boolean"/>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="GenericPlanetProperties">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element ref="atmosphere" minOccurs="1" maxOccurs="1"/>
            <xsd:element ref="water" minOccurs="1" maxOccurs="1"/>
            <xsd:element ref="name" minOccurs="1" maxOccurs="1"/>
            <xsd:any/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <wsdl:portType name="SolarSystem"
          wsrp:ResourceProperties="tns:GenericPlanetProperties">
        <operation name="…"/>
      </wsdl:portType>
      …
    </definitions>

Figure 9-3 Example WSDL fragment showing WS-Resource definition
In the above WSDL fragment, the GenericPlanetProperties are associated with a SolarSystem portType. This association makes the WSDL represent a WS-Resource.
The next section discusses the implied resource pattern.
9.2.2 Implied resource pattern for stateful resources
One of the stated criticisms of OGSI 1.0 was that it was too object oriented. The implied resource pattern aims to separate the actual service from the management of stateful resource instances.
Web services are stateless. Therefore, when Web service operations involve dynamic state, there are the following options:
The state is provided explicitly within the request message.
The state is maintained implicitly via subsystems with which a Web service interacts.
The implied resource pattern implements the second of these options. The actual state management and instance management of stateful resources is delegated to an external component. This is the approach selected for WSRF.
The WSRF implementation implicitly passes the resource identifier information when message interaction occurs between a client and a WS-Resource. By implicit, we mean that the client does not explicitly include a resource identifier in its request. Instead, the requisite identifier is implicitly associated with a message exchange. A resource identifier can be dynamically or statically associated with a message exchange.
The implied resource pattern, in WSRF parlance, utilizes a set of conventions on existing technologies such as XML, WSDL, and, in particular, WS-Addressing.
WS-Addressing plays an important role in implementing the implied resource pattern.
WS-Addressing standardizes the way Web service addresses are represented. Such a representation is known as an endpoint reference (EPR). Besides the Web service address, an EPR can also carry enough contextual information to enable client communication with a WS-Resource.
The EPR contains two pieces of information:
The Web service address information
The reference properties information, which may include an identifier of a resource instance besides other metadata about the service
In WSRF, an EPR with a resource identifier is also known as a WS-Resource-qualified endpoint reference.
The resource identifier points to a stateful resource used when the Web service is invoked. The Web service maps the identifier to a stateful resource based on its business requirements.
Creating a resource identifier is analogous to creating a new instance of a WS-Resource. A new instance of a WS-Resource can be created via a WS-Resource Factory or some other application. A WS-Resource Factory is a Web service that brings new instances of WS-Resources into existence.
Creating a new instance of a WS-Resource involves the following:
1. Creating a new instance of the resource
2. Assigning a new identifier to the new resource instance
3. Creating an association between the new resource instance and its corresponding Web service
The WS-Resource Factory operation that creates new instances of a WS-Resource may return a WS-Resource-qualified EPR, or it may save the equivalent information elsewhere, such as in a registry or a database, for later retrieval.
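The three steps above can be sketched as a toy factory that returns a WS-Resource-qualified EPR. This is only an illustration under stated assumptions: PlanetFactory, the APP namespace, the service address, and the MyResourceId element name are hypothetical, and a real WSRF container would build the EPR for the service developer.

```python
import uuid
import xml.etree.ElementTree as ET

WSA = "http://www.w3.org/2005/02/addressing"  # WS-Addressing namespace used in this chapter
APP = "http://example.org/gridintro"           # hypothetical application namespace


class PlanetFactory:
    """Toy WS-Resource Factory for planet resources."""

    def __init__(self, service_address):
        self.service_address = service_address
        self.resources = {}  # resource identifier -> resource state

    def create_planet(self, name):
        state = {"name": name}               # 1. create the resource instance
        resource_id = uuid.uuid4().hex       # 2. assign a new identifier
        self.resources[resource_id] = state  # 3. associate it with this service
        return self._qualified_epr(resource_id)

    def _qualified_epr(self, resource_id):
        # The identifier travels inside the EPR's reference properties.
        epr = ET.Element(f"{{{WSA}}}EndpointReference")
        ET.SubElement(epr, f"{{{WSA}}}Address").text = self.service_address
        props = ET.SubElement(epr, f"{{{WSA}}}ReferenceProperties")
        ET.SubElement(props, f"{{{APP}}}MyResourceId").text = resource_id
        return epr
```

The identifier generated here is a random hexadecimal string, which reflects the point made in the text that its meaning is private to the service: the client only stores and replays the EPR.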
Because the stateful resource's identifier is included in a WS-Resource-qualified EPR, the client is not required to have specific knowledge of either the location of the Web service or the resource identifier.
The actual semantic meaning of a resource identifier is specific to the Web service implementation. At the current time, no specification prescribes how a resource identifier is defined.
When a client application interacts with a WS-Resource compliant Web service, the XML representation of the relevant EPR is sent along with the request implicitly, in a way that is opaque to the client. If the EPR reference properties contain a resource identifier, it is sent along with the rest of the request in the Web service message.
From a client application perspective, an EPR represents a pointer to a WS-Resource. The EPR may contain a resource identifier to target a client's interaction with a specific instance of a WS-Resource via a Web service. The resource identifier must be unique enough to enable the Web service to identify a stateful resource instance unambiguously. It is not required to be unique outside the scope of the Web service concerned.
The diagram below depicts how a WS-Resource-qualified EPR is involved when a client interacts with a WS-Resource.
EPR held by the client:

    <wsa:EndpointReference
        xmlns:wsa="http://www.w3.org/2005/02/addressing"
        xmlns:ibmgrid="http://www.ibm.redbook.com/gridintro">
      <wsa:Address>http://www.ibm.redbook.com/IntroGrid</wsa:Address>
      <wsa:ReferenceProperties>
        <ibmgrid:MyResourceId>B</ibmgrid:MyResourceId>
      </wsa:ReferenceProperties>
    </wsa:EndpointReference>

SOAP message sent from the client to the WS-Resource:

    <SOAP-ENV:Envelope
        xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
        xmlns:wsa="http://www.w3.org/2005/02/addressing"
        xmlns:ibmgrid="http://www.ibm.redbook.com/gridintro">
      <SOAP-ENV:Header>
        <wsa:To SOAP-ENV:mustUnderstand="1">http://example.com/satellite</wsa:To>
        <wsa:Action>http://www.ibm.redbook.com/IntroGrid</wsa:Action>
        <ibmgrid:MyResourceId>B</ibmgrid:MyResourceId>
      </SOAP-ENV:Header>
      <SOAP-ENV:Body>
        <DoSomethingRequest xmlns="http://www.ibm.redbook.com/gridintro.xsd">
          <someparameter>ClientValue</someparameter>
        </DoSomethingRequest>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>

Figure 9-4 Using a WS-Resource-qualified endpoint reference (diagram labels: Client, EPR, WS-Resource, Web Service Implementation, Resource A, Resource B, Resource C)
The WS-Addressing specification mandates that the ReferenceProperties part of an EPR must be sent as part of any message that is directed towards a Web service identified by an EPR. How the information is actually sent depends on protocol-binding specifics.
In the above example, the client holds an EPR that points to a fictional Web service at location http://www.ibm.redbook.com/IntroGrid and identifies a stateful resource B. Because this EPR has a resource identifier in its ReferenceProperties stanza, recall that it is a WS-Resource-qualified endpoint reference.
When the client invokes the DoSomethingRequest operation on the Web service portType, the information contained within the ReferenceProperties stanza of the EPR XML document is sent as part of the SOAP header.
The Web service extracts the resource identifier value B, locates the corresponding resource to work with, and completes the DoSomethingRequest.
Inspecting the SOAP message's body element reveals that the client request for the DoSomethingRequest only passes the operation-specific parameter (that is, someparameter). The resource identifier is not passed explicitly by the client in the request. This is the key to the implied resource pattern.
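The exchange just described can be mimicked in a few lines. The sketch below is not a SOAP stack and does not reflect any particular WSRF implementation. The functions build_request and locate_resource, the APP namespace, and the example address are hypothetical. It only shows that the identifier moves from the EPR into the header by plumbing code, while the body carries nothing but the operation parameters.

```python
import copy
import xml.etree.ElementTree as ET

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
WSA = "http://www.w3.org/2005/02/addressing"
APP = "http://example.org/gridintro"  # hypothetical application namespace


def build_request(epr, operation, params):
    """Client-side plumbing: turn an EPR plus parameters into a SOAP message."""
    envelope = ET.Element(f"{{{SOAP}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP}}}Header")
    ET.SubElement(header, f"{{{WSA}}}To").text = epr.findtext(f"{{{WSA}}}Address")
    # Every reference property of the EPR is copied into the header.
    for prop in epr.find(f"{{{WSA}}}ReferenceProperties"):
        header.append(copy.deepcopy(prop))
    body = ET.SubElement(envelope, f"{{{SOAP}}}Body")
    request = ET.SubElement(body, f"{{{APP}}}{operation}")
    for name, value in params.items():
        ET.SubElement(request, f"{{{APP}}}{name}").text = value
    return envelope


def locate_resource(envelope, resources):
    """Service side: read the identifier from the header and pick the resource."""
    resource_id = envelope.findtext(f"{{{SOAP}}}Header/{{{APP}}}MyResourceId")
    return resources[resource_id]


# The application code supplies only the EPR it was given and its own parameter.
epr = ET.fromstring(
    f'<wsa:EndpointReference xmlns:wsa="{WSA}" xmlns:app="{APP}">'
    "<wsa:Address>http://example.org/IntroGrid</wsa:Address>"
    "<wsa:ReferenceProperties><app:MyResourceId>B</app:MyResourceId></wsa:ReferenceProperties>"
    "</wsa:EndpointReference>"
)
message = build_request(epr, "DoSomethingRequest", {"someparameter": "ClientValue"})
resource = locate_resource(message, {"A": "resource A", "B": "resource B"})
```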
In the next section we review the WSRF and WS-Notification sets of specifications and briefly discuss how they meet OGSA Grid service requirements.
9.3 WS-Resource Framework specifications
The OGSI-to-WSRF transition is a refactoring exercise, motivated by the reasons briefly discussed in 9.1.5, OGSI to WSRF refactoring on page 122. Being a refactoring, it implies that collectively these specifications retain the functionality present in OGSI.
The OGSI refactoring results in five WSRF specifications and three WS-Notification family specifications. The WS-Notification family of specifications addresses event notification subscription and delivery.
Each of the specifications targets a grouping of functionality. This facilitates the flexible composition of various functionality in an incremental or mix-and-match fashion.
This section gives an overview of the WSRF family of specifications.
The WS-Resource Framework paper by Czajkowski et al. [5] summarizes the various WS-Resource Framework specifications, as shown in Table 9-2.
Table 9-2 WS-Resource Framework specifications summary

| Specification name | Description |
| --- | --- |
| WS-ResourceProperties | Describes associating stateful resources and Web services to produce WS-Resources, and how elements of publicly visible properties of a WS-Resource are retrieved, changed, and deleted |
| WS-ResourceLifetime | Allows a requestor to destroy a WS-Resource either immediately or at a scheduled future point in time |
| WS-RenewableReferences | Annotates a WS-Addressing endpoint reference with policy information needed to retrieve a new endpoint reference when the current reference becomes invalid |
| WS-ServiceGroup | Creates and uses heterogeneous by-reference collections of Web services |
| WS-BaseFaults | Describes a base fault type used for reporting errors |
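To make the two destruction modes of WS-ResourceLifetime concrete, here is a minimal sketch of immediate and scheduled destruction. It is loosely modelled on the idea described in the table, not on the specification's message formats. LifetimeManager and its method names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


class LifetimeManager:
    """Tracks when each resource is due to be destroyed."""

    def __init__(self):
        self.termination = {}  # resource id -> datetime, or None if unscheduled

    def register(self, resource_id):
        self.termination[resource_id] = None

    def destroy(self, resource_id):
        # Immediate destruction requested by a client.
        del self.termination[resource_id]

    def set_termination_time(self, resource_id, when):
        # Scheduled destruction at a future point in time.
        self.termination[resource_id] = when

    def expire(self, now):
        # Called periodically by the hosting environment.
        due = [r for r, t in self.termination.items() if t is not None and t <= now]
        for resource_id in due:
            del self.termination[resource_id]
        return due


start = datetime(2005, 12, 1, tzinfo=timezone.utc)
manager = LifetimeManager()
for rid in ("A", "B", "C"):
    manager.register(rid)
manager.destroy("A")                                            # gone at once
manager.set_termination_time("B", start + timedelta(minutes=5))  # gone later
```

Resource C has no termination time in this example, so it stays until a client destroys it or schedules its termination.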
The following table summarizes the WS-Notification family of specifications.
Table 9-3 WS-Notification specifications summary

| Specification name | Description |
| --- | --- |
| WS-BaseNotification | Defines Web service operations for the roles of notification producers and notification consumers. |
| WS-BrokeredNotification | Defines Web service operations for a notification broker. A notification broker is an intermediary which, among other things, allows publication of messages from entities that are not themselves service providers. It includes standard message exchanges to be implemented by notification broker service providers, along with operational requirements expected of service providers and requestors that participate in brokered notifications. |
| WS-Topics | Defines a mechanism to organize and categorize topics. It defines three topic expression dialects that can be used as subscription expressions in subscribe request messages and other parts of the WS-Notification system. It further specifies an XML model for describing metadata associated with topics. |
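The producer and consumer roles, and the idea of organizing topics into a hierarchy, can be sketched as follows. This is a deliberate simplification: the slash-separated topic paths and the rule that a subscription to a topic also receives messages for its descendants are assumptions of this example, not the semantics of the WS-Topics expression dialects. NotificationProducer is a hypothetical name.

```python
from collections import defaultdict


class NotificationProducer:
    """Keeps subscriptions per topic and delivers messages to consumers."""

    def __init__(self):
        self._subscriptions = defaultdict(list)  # topic path -> consumer callbacks

    def subscribe(self, topic, consumer):
        # A notification consumer registers interest in a topic.
        self._subscriptions[topic].append(consumer)

    def notify(self, topic, message):
        # Deliver to subscribers of the topic and of each of its ancestors.
        parts = topic.split("/")
        delivered = 0
        for depth in range(1, len(parts) + 1):
            path = "/".join(parts[:depth])
            for consumer in self._subscriptions.get(path, []):
                consumer(topic, message)
                delivered += 1
        return delivered
```

A consumer subscribed to planet would, in this sketch, receive a message published on planet/earth, while a consumer subscribed to planet/earth would not receive one published on planet/mars.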
Figure 9-5 on page 132 provides an overview of the WS-Resource Framework and how it relates to other Web service specifications.
Figure 9-5 WS-Resource Framework with Web service specifications (diagram labels: WS-BaseFaults, WS-ServiceGroup, WS-RenewableReferences, WS-ResourceProperties, WS-ResourceLifetime, WS-Notification, WS-Security, WS-Addressing, SOAP, XML Information Set, Transport Protocols, WS-MetadataExchange, WSDL, XML Schema)
The diagram above is comparable with Figure 9-1 on page 122.
The paper From Open Grid Services Infrastructure to WS-Resource Framework: Refactoring and Evolution by Czajkowski et al. [3] maps primary OGSI constructs to WS-Resource Framework and WS-Notification constructs, as shown in Table 9-4.
Table 9-4 OGSI to WS-Resource Framework and WS-Notification map

| OGSI | WS-Resource Framework |
| --- | --- |
| Grid Service Reference | WS-Addressing Endpoint Reference. |
| Grid Service Handle | WS-Addressing Endpoint Reference and WS-RenewableReferences. |
| HandleResolver portType | WS-RenewableReferences. |
| Service Data Definition | Resource properties definition. |
| GridService portType service data access | WS-ResourceProperties. |
| GridService portType lifetime management | WS-ResourceLifetime. |
| Notification portTypes | WS-Notification. |
| Factory portType | Now treated as a WS-Resource Factory concept. Please refer to 9.2.2, Implied resource pattern for stateful resources on page 126. |
| ServiceGroup portTypes | WS-ServiceGroup. |
| Base fault type | WS-BaseFaults. |
| GWSDL | Copy and paste. Uses existing WSDL 1.1 interface composition approaches (that is, copy and paste) rather than using WSDL 2.0 constructs. |
The following are a few observations based on the OGSI-to-WSRF comparison table above:
The implied resource pattern and the concept of a WS-Resource replace the GridService interface as defined by the OGSI 1.0 specification.
The Grid Service Handle (GSH) and Grid Service Reference (GSR) concepts are replaced by the WS-Addressing standard. The EPR introduced by WS-Addressing is equivalent to the GSH and GSR.
WSRF introduces a standard notification framework for Web services, enabling Grid services and Web services to share the notification patterns defined by the WS-Notification specifications.
Please refer to the paper From Open Grid Services Infrastructure to WS-Resource Framework: Refactoring and Evolution by Czajkowski et al. [4] for a detailed discussion about each of the items in the table above.
Figure 9-6 on page 134 gives a high-level view of a SolarSystem WS-Resource that implements WS-ResourceProperties interface functions.
Figure 9-6 A SolarSystem WS-Resource with WS-ResourceProperties interfaces (diagram labels: WS-Resource Client, WSDL, Solar System Web service Implementation, WS-Resource, Planet Resource, Planet Earth Instance Properties, WS-Notification; portTypes: GetResourceProperty, GetMultipleResourceProperties, SetResourceProperties, QueryResourceProperties)
Please note in the diagram above that the SolarSystem Web service delegates the actual planet instance management to a separate Planet resource component.
The WS-Resource client invokes the WS-ResourceProperties interface functions via the information provided in the WSDL file.
The SolarSystem WS-Resource generates a notification when a property of a Planet resource instance changes. The notification message format is also declared in the WSDL file.
Figure 9-7 on page 135 is an example GetResourceProperty request and response with our SolarSystem example.
    <wsrp:GetResourcePropertyRequest xmlns:tns="http://test.org/computersystem">
      tns:name <!-- name of the property to retrieve -->
    </wsrp:GetResourcePropertyRequest>

    <wsrp:GetResourcePropertyResponse xmlns:tns="http://test.org/computersystem">
      <tns:name> <!-- an XML view of the resource property -->
        Earth
      </tns:name>
    </wsrp:GetResourcePropertyResponse>

Figure 9-7 Example WS-ResourceProperties request and response for GetResourceProperty operation
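On the service side, answering such a request amounts to looking up a qualified name in the resource properties document. The following sketch shows that lookup in isolation. The function get_resource_property and the sample property values are hypothetical, and a real container would also handle the SOAP envelope, faults, and the implied resource lookup.

```python
import xml.etree.ElementTree as ET

TNS = "http://test.org/computersystem"  # namespace used in Figure 9-7

# Resource properties document of one planet instance (values are illustrative).
PLANET_PROPERTIES = ET.fromstring(
    f'<tns:GenericPlanetProperties xmlns:tns="{TNS}">'
    "<tns:atmosphere>true</tns:atmosphere>"
    "<tns:water>true</tns:water>"
    "<tns:name>Earth</tns:name>"
    "</tns:GenericPlanetProperties>"
)


def get_resource_property(properties, qname, prefixes):
    """Return the property elements matching a prefixed name such as 'tns:name'."""
    prefix, local = qname.split(":")
    return properties.findall(f"{{{prefixes[prefix]}}}{local}")


matches = get_resource_property(PLANET_PROPERTIES, "tns:name", {"tns": TNS})
```

The function returns a list because a properties document may contain several elements with the same name. An unknown property simply yields an empty list in this sketch.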
The WS-Resource Framework Interop Workshop 1 Scenarios v0.13, available from the following Web site, has numerous example messages for the rest of the WS-ResourceProperties and WS-Notification operations.
http://www.ibm.com/developerworks/offers/WS-Specworkshops/ws-rf200404.html
The Understanding WSRF series of tutorials on IBM developerWorks also provides numerous WS-Resource examples and further discussion about the fundamentals of the WS-Resource Framework.
The next section discusses the role of the WS-Resource Framework within the Globus Toolkit 4 (GT4).
9.3.1 WS-Resource Framework and Globus Toolkit 4
The WS-Resource Framework introduces the notion of a WS-Resource. We have seen earlier, in Table 9-4 on page 132 and the subsequent discussion, that the notion of a WS-Resource replaces the Grid service as it was defined with OGSI 1.0.
When a WS-Resource is packaged as a Grid Archive (GAR) and deployed in a GT4 container, it is recognized by the GT4 container as a valid GT4 WSRF-compliant Web service. This is synonymous with a Grid service.
The above also implies that it is possible to implement a non-OGSA SOA environment using WS-Resource Framework compliant Web services.
Figure 9-8 on page 136, from A Globus Primer by Ian Foster [7], depicts various Web service deployment scenarios within a GT4 container.
Figure 9-8 GT4 container services (diagram labels: User Applications; GT4 Container; Custom Web Services; Custom WSRF Web Services; GT4 WSRF Web Services; Registry, Administration; WSDL, SOAP, WS-Security; WS-Addressing, WSRF, WS-Notification)
The shaded areas of the above diagram represent the GT4 container's infrastructure components, which allow it to host different services.
There are several components that are implemented as WSRF Web services within the GT4 container. A discussion of various GT4 components can be found in Chapter 10, Globus Toolkit 4 components on page 141.
At a high level, the steps to implement a WSRF-compliant Web service for deployment in a GT4 container are as follows [7]:
1. Define the service interface. This refers to preparing a WSDL file that defines our WSRF service operations and may include resource properties definitions.
2. Implement the service. This refers to developing Java code for the WSRF service operations and associated properties, if any.
3. Define deployment parameters. This refers to preparing a Web Services Deployment Descriptor (WSDD) file for our service that defines various aspects of service configuration.
4. Compile and generate a GAR file. The compilation and GAR file creation involves creating appropriate stub files for handling SOAP messaging and packing the service in a format required by a GT4 container.
5. Deploy the service.
9.4 WSRF references
There are numerous tutorials available in the public domain to assist with WSRF services development and implementation. Some of the useful references are as follows:
The Globus Toolkit 4 Programmer's Tutorial by Borja Sotomayor, http://gdp.globus.org/gt4-tutorial
Understanding WSRF, Parts 1 to 4, by Babu Sundaram, http://www.ibm.com/developerworks
Using Eclipse to develop Grid services, http://www.ibm.com/developerworks/edu/gr-dw-gr-eclipseide-i.html
Apache WSRF tutorial, http://ws.apache.org/ws-fx/wsrf/tutorial
WSRF.NET Developer Tutorial by Mark Morgan and Glenn Wasson, http://www.cs.virginia.edu/gsw2c/WSRFdotNet/WSRF.NETDeveloperTutorial.pdf
9.5 Summary
This chapter provided an overview of the WS-Resource Framework and how it enables the handling of state information within a Web services context. We also provided information about how it relates to the Open Grid Services Infrastructure standard.
Part 3. Creating a grid environment with the Globus Toolkit 4
Chapter 10. Globus Toolkit 4 components
The Globus Alliance is made up of organizations and individuals that develop and make available various technologies applicable to grid computing.
The Globus Toolkit, the primary delivery vehicle for technologies developed by the Globus Alliance, is an open source software toolkit used for building grid systems and applications. Many companies and organizations are using the Globus Toolkit as the basis for grid implementations of various types.
To learn more about the Globus Alliance, visit their Web site at:
http://www.globus.org
At the time of writing, the Globus Toolkit is at Version 4. As its name implies, the toolkit consists of many components that can be used as the basis to implement a grid computing environment. It is not a complete grid solution, but provides the tools and facilities to address many of the requirements of grid computing. This chapter briefly describes the major components of Globus Toolkit 4.
10.1 Overview of Globus Toolkit 4
Globus Toolkit 4 is a collection of open-source components. Many of these are based on existing standards, while others are based on, and in some cases are driving, evolving standards. Version 4 of the toolkit is the first version to support Web service based implementations of many of its components. Version 3 included an OGSI implementation of some components, and Version 2 was not service based at all.
Though many components have Web service based implementations, some do not, and for compatibility and migration reasons, some have both implementations.
Globus Toolkit 4 provides components in the following five categories:
Common runtime components
Security
Data management
Information services
Execution management
Table 10-1 shows a list of components in Globus Toolkit 4, and identifies those that are Web service based and those that are not. In the sections that follow, we describe each of these in more detail.
Table 10-1 List of components in Globus Toolkit 4

| Category | Web service based components | Non Web service based components |
| --- | --- | --- |
| Common runtime components | Java WS Core; C WS Core; Python WS Core | C Common Libraries; eXtensible IO (XIO) |
| Security components | WS authentication and authorization; Community Authorization Service (CAS); Delegation service | Pre-WS authentication and authorization; Credential Management |
| Data management components | Reliable File Transfer (RFT); OGSA-DAI; Data Replication Service (DRS) | GridFTP; Replica Location Service (RLS) |
| Monitoring and Discovery Services | Index service; Trigger service; Aggregator Framework; WebMDS | MDS2 |
| Execution management | WS GRAM; Community Scheduler Framework 4 (CSF4); Globus Teleoperations Control Protocol (GTCP); Workspace Management Service (WMS) | Pre-WS GRAM |
10.2 Common runtime components
The common runtime components consist of libraries and tools that are needed by both the Web service based and the non Web service based implementations, and that are used by most of the other components.
10.2.1 Java WS Core
Java WS Core consists of Java APIs and tools that implement the WSRF and WS-Notification standards. These components act as the base for the various default services that Globus Toolkit 4 supplies. Java WS Core also provides the base libraries and development tools for custom WSRF based services. Figure 10-1 on page 144 shows the relation between Java WS Core and other services.
[Figure 10-1 shows a layered diagram. At the top are the GRAM, RFT, MDS, CAS, and Delegation services, together with other custom WSRF services and standard Web services. Below them are the Java WS Core components, which run in a Web application server with a SOAP engine (the simple Java container by Globus, or Jakarta Tomcat).]
Figure 10-1 Relation between Java WS Core and Globus Toolkit 4 supplied services
For more information about WSRF, refer to Chapter 9, "Web services resource framework" on page 115. Also, the following link should be of interest:
http://www.globus.org/toolkit/docs/4.0/common/javawscore
10.2.2 C WS Core
C WS Core consists of APIs and tools that implement the WSRF and WS-Notification standards in C. For more information about C WS Core, look at the following link:
http://www.globus.org/toolkit/docs/4.0/common/cwscore
10.2.3 Python WS Core
Python WS Core consists of APIs and tools that implement the WSRF and WS-Notification standards in Python. This component is also known as pyGridWare, contributed by Lawrence Berkeley National Laboratory. For more information about Python WS Core, look at the following links:
http://dsd.lbl.gov/gtg/projects/pyGridWare
http://www.globus.org/toolkit/docs/4.0/contributions/pythonwscore
10.3 Security components
Because security is one of the most important issues in grid environments, Globus Toolkit 4 includes various types of security components.
10.3.1 WS authentication and authorization
Globus Toolkit 4 enables message-level security and transport-level security for SOAP communication of Web services. It also provides an Authorization Framework for container-level authorization. For more information, refer to Chapter 7, "Security" on page 63. Also look at the following link for more information about those components:
http://www.globus.org/toolkit/docs/4.0/security/message
10.3.2 Pre-WS authentication and authorization
Pre-WS authentication and authorization consists of APIs and tools for authentication, authorization, and certificate management. For more information, refer to Chapter 7, "Security" on page 63. Also look at the following link for more information about those components:
http://www.globus.org/toolkit/docs/4.0/security/prewsaa
10.3.3 Community Authorization Service (CAS)
CAS provides access control to virtual organizations. The CAS server grants fine-grained permissions on subsets of resources to members of the community. CAS authorization is currently not available for Web services, but it supports the GridFTP server. For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/security/cas
10.3.4 Delegation service
The Delegation service enables delegation of credentials between various services on one host, and it allows a single delegated credential to be used by many services. The service also has a credential renewal interface, which makes it possible to extend the validity period of credentials. For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/security/delegation
Also, Figure 10-5 on page 153 provides an example of how this service is used by other services.
10.3.5 SimpleCA
SimpleCA is a simplified Certificate Authority. This package has fully functioning CA features for a PKI environment. In Chapter 11, "Globus Toolkit 4 installation and configuration" on page 155, we use SimpleCA as a Certificate Authority for our grid environment. For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/security/simpleca
Important: SimpleCA is recommended only for testing or demonstration purposes. For any type of production grid, we recommend that you evaluate commercial PKI solutions, which may better suit your needs and remove the responsibility of managing your own CA.
10.3.6 MyProxy
MyProxy is responsible for storing X.509 proxy credentials, protecting them with a pass phrase, and providing an interface for retrieving the proxy credential. MyProxy acts as a repository of credentials, and is often used by Web portal applications. For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/security/myproxy
10.3.7 GSI-OpenSSH
GSI-OpenSSH is a modified version of the OpenSSH client and server that adds support for GSI authentication. GSI-OpenSSH can be used to open a shell on a remote system, either to run shell scripts or to issue shell commands interactively. It also permits the transfer of files between systems without prompting for a user ID and a password. A valid proxy must first be created with the grid-proxy-init command, however. For more information about GSI-OpenSSH, look at the following link:
http://www.globus.org/toolkit/docs/4.0/security/openssh
10.4 Data management components
Globus Toolkit 4 provides various tools that enable data management in a grid environment.
10.4.1 GridFTP
The GridFTP facility provides secure and reliable data transfer between grid hosts. Its protocol extends the well-known FTP standard to provide additional features, including support for authentication through GSI. One of the major features of GridFTP is that it enables third-party transfer. Third-party transfer is suitable for an environment where there is a large file in remote storage and the client wants to copy it to another remote server, as illustrated in Figure 10-2.
[Figure 10-2 shows a GridFTP client (globus-url-copy) that holds a control connection to each of two GridFTP hosts. Each host runs a GridFTP server daemon and holds a file, and the file transfer itself flows directly between the two hosts.]
Figure 10-2 GridFTP third-party transfer
For more information about GridFTP, look at the following link:
http://www.globus.org/toolkit/docs/4.0/data/gridftp
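To make the idea of a third-party transfer more concrete, the following Python sketch assembles the command line that a client might pass to globus-url-copy when both the source and the destination are GridFTP servers. It is only an illustration: the helper names, the host names (taken from the environment in Chapter 11), and the file path are assumptions, and the sketch builds the argument list without running anything.

```python
GRIDFTP_PORT = 2811  # GridFTP port as listed in Table 11-8


def gsiftp_url(host, path, port=GRIDFTP_PORT):
    """Build a gsiftp:// URL for a file on a GridFTP server (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    return f"gsiftp://{host}:{port}{path}"


def third_party_copy_command(src_host, src_path, dst_host, dst_path):
    """Return the argument list for a server-to-server copy; nothing is executed."""
    return [
        "globus-url-copy",
        gsiftp_url(src_host, src_path),  # source: a remote GridFTP server
        gsiftp_url(dst_host, dst_path),  # destination: another remote server
    ]


if __name__ == "__main__":
    # Host names and path are illustrative assumptions.
    print(" ".join(third_party_copy_command(
        "hosta.redbook.ibm.com", "/tmp/big.dat",
        "hostb.redbook.ibm.com", "/tmp/big.dat")))
```

Because both URLs name remote servers, the client only coordinates the transfer; the data itself does not need to pass through the client machine.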
10.4.2 Reliable File Transfer (RFT)
Reliable File Transfer provides a Web service interface for the transfer and deletion of files. RFT receives requests as SOAP messages over HTTP and uses GridFTP to carry them out. RFT also stores the list of file transfers and their states in a database, so it can recover a transfer request that was interrupted. Figure 10-3 shows how RFT and GridFTP work together.
Figure 10-3 How RFT and GridFTP work
For more information about RFT, look at the following link:
http://www.globus.org/toolkit/docs/4.0/data/rft
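The recovery behavior described above depends on keeping transfer state in a database rather than in memory. The following Python sketch illustrates that idea only. The table layout, the state names, and the function names are assumptions made for this example, and SQLite stands in for the database; this is not the schema that RFT itself uses.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store that records each transfer and its state."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS transfers ("
        " id INTEGER PRIMARY KEY, source TEXT, dest TEXT, state TEXT)"
    )
    return db


def submit(db, source, dest):
    """Record a new transfer request before any data is moved."""
    cur = db.execute(
        "INSERT INTO transfers (source, dest, state) VALUES (?, ?, 'pending')",
        (source, dest),
    )
    db.commit()
    return cur.lastrowid


def mark(db, transfer_id, state):
    """Update the recorded state, for example to 'active' or 'done'."""
    db.execute("UPDATE transfers SET state = ? WHERE id = ?", (state, transfer_id))
    db.commit()


def recover(db):
    """After a restart, return every transfer that had not finished."""
    rows = db.execute(
        "SELECT id, source, dest FROM transfers WHERE state != 'done' ORDER BY id"
    )
    return rows.fetchall()
```

A service built this way can call recover() at startup and resubmit whatever it returns, which is the essence of recovering an interrupted request.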
10.4.3 Replica Location Service (RLS)
The Replica Location Service maintains and provides access to information about the physical locations of replicated data. This component can map multiple physical replicas to one single logical file, and enables data redundancy in a grid environment. For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/data/rls
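The mapping from one logical file to several physical replicas can be pictured as a simple catalog. The following Python sketch is a conceptual model only; the class name, method names, and URLs are hypothetical and do not correspond to the RLS interfaces.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog that maps a logical file name to its physical replicas."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_url):
        self._replicas[logical_name].add(physical_url)

    def unregister(self, logical_name, physical_url):
        self._replicas[logical_name].discard(physical_url)

    def lookup(self, logical_name):
        """Return every known physical location, in sorted order."""
        return sorted(self._replicas.get(logical_name, ()))
```

A client that needs the data asks for the logical name and is free to choose any of the returned locations, which is what makes redundancy useful.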
[Figure 10-3 shows a client exchanging SOAP messages and notifications with the RFT service. The RFT service keeps file transfer information in a database and sends GridFTP third-party transfer requests to GridFTP Server A and GridFTP Server B, which then carry out the third-party transfer.]
10.4.4 OGSA-DAI
OGSA-DAI provides a general grid interface for accessing grid data sources, such as relational database management systems and XML repositories, through query languages such as SQL, XPath, and XQuery. Currently, OGSA-DAI is a technical preview component. That is, the implementation is functional but not necessarily complete, and its implementation and interfaces may change in the future. For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/techpreview/ogsadai
10.4.5 Data Replication Service (DRS)
Data Replication Service provides a system for making replicas of files in the grid environment and registering them with RLS. DRS uses RFT and GridFTP to transfer the files, and it uses RLS to locate and register the replicas. Currently, DRS is a technical preview component. For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/techpreview/datarep
10.5 Monitoring and Discovery Services
The Monitoring and Discovery Services (MDS) are mainly concerned with collecting, distributing, indexing, archiving, and otherwise processing information about the state of various resources, services, and system configurations. The information collected is used either to discover new services or resources, or to monitor system status.
GT4 provides a WSRF and WS-Notification compliant version of MDS, also known as MDS4.
The resource properties provided by a WSRF compliant resource can be registered with MDS4 services for information collection purposes. GT4 WSRF compliant services such as GRAM and RFT provide such properties. When a GT4 container starts, these services are registered with the MDS4 services.
MDS4 consists of two higher-level services, an Index service and a Trigger service, which are based on the Aggregator Framework (see 10.5.3).
10.5.1 Index service
The Index service is the central component of the GT4 MDS implementation. Every instance of a GT4 container has a default indexing service (DefaultIndexService) exposed as a WSRF service. The Index service interacts with data sources through the standard WSRF resource property and subscription/notification interfaces (WS-ResourceProperties and WS-BaseNotification). A WSRF-based service can make information available as resource properties. An Index service can collect information from many sources and publish it in one place. The Index service maintains the various WSRF registrations as Service Group Entries, and its contents can be queried with XPath queries.
As noted earlier, each GT4 container has a default Index service instance registered with it. Therefore, a grid computing site with multiple nodes can have multiple Index service instances available for use. Virtual organizations often configure an Index service instance to keep track of all relevant resources, containers, and services within their domain.
The following are some of the key features of an Index service:
Index services can be configured in hierarchies, but there is no single global index that provides information about every resource on the Grid.
The presence of a resource in an Index service makes no guarantee about the availability of the resource for users of that Index.
Information published with MDS is recent but not the absolute latest.
Each registration into an Index service has a lifetime and requires periodic renewal of registrations to indicate the continued existence of a resource or a service.
For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/info/index
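The last feature in the list, registrations that expire unless they are renewed, is sometimes easier to follow in code. The following Python sketch models only that behavior. The class name, the resource names, and the use of plain numbers of seconds as time values are assumptions for illustration, not part of the Index service interface.

```python
class SoftStateIndex:
    """Registrations expire unless they are renewed before their lifetime ends."""

    def __init__(self):
        self._expiry = {}  # resource name -> absolute expiry time, in seconds

    def register(self, name, now, lifetime):
        self._expiry[name] = now + lifetime

    renew = register  # renewing simply pushes the expiry time forward

    def live_entries(self, now):
        # Drop entries whose lifetime has passed, then list what remains.
        self._expiry = {n: t for n, t in self._expiry.items() if t > now}
        return sorted(self._expiry)


if __name__ == "__main__":
    index = SoftStateIndex()
    index.register("gram@hosta", now=0, lifetime=600)
    index.register("rft@hostb", now=0, lifetime=600)
    index.renew("gram@hosta", now=500, lifetime=600)
    print(index.live_entries(now=700))  # only the renewed entry is still listed
```

This also shows why an entry in an index is not a guarantee of availability: an entry can stay listed until its lifetime runs out, even if the resource stopped responding shortly after its last renewal.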
10.5.2 Trigger service
The MDS Trigger service collects information and compares that data against a set of conditions defined in a configuration file. When a condition is met, an action is executed. The condition is specified as an XPath expression that may, for example, compare the value of a property to a threshold. The action might then send an alert e-mail to an administrator by executing a script. The name and location of the script can be configured with the MDS Trigger service. For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/info/trigger
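The condition-and-action pattern can be sketched in a few lines of Python. This is a simplified model, not the Trigger service configuration format: the XML document, the element names, and the function name are invented for the example. Because the XPath subset in the Python standard library cannot express numeric comparisons, the sketch uses the path only to select values and performs the threshold comparison in Python.

```python
import xml.etree.ElementTree as ET


def evaluate_trigger(xml_text, xpath, threshold, action):
    """Call action(value) for each selected element whose number exceeds threshold."""
    root = ET.fromstring(xml_text)
    fired = []
    for element in root.findall(xpath):
        value = float(element.text)
        if value > threshold:
            fired.append(action(value))
    return fired


if __name__ == "__main__":
    # Hypothetical monitoring data; a real deployment would collect this
    # from resource properties.
    document = "<Host name='hosta'><Load>3.7</Load><Load>0.4</Load></Host>"
    alerts = evaluate_trigger(
        document, ".//Load", 2.0, lambda v: f"alert: load is {v}"
    )
    print(alerts)
```

In the real service, the action would be the configured script, for example one that sends e-mail to an administrator.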
10.5.3 Aggregator Framework
The MDS-Index service and the MDS-Trigger service are specializations of a general Aggregator Framework. The Aggregator Framework is a software framework for building software services that collect and aggregate data. These services are also known as aggregator services.
An aggregator service collects information from one of three types of aggregator source: a query source, which uses WS-ResourceProperties mechanisms to collect data; a subscription source, which uses a WS-Notification subscription/notification mechanism to collect data; or an execution source, which executes an administrator-provided application to collect information in XML format.
[Figure 10-4 shows the Aggregator Framework collecting data through three kinds of source: a query aggregator source (WSRF resource property requests), a subscription aggregator source (WSRF subscription/notification), and an execution source (program execution, which can gather data from anything). The Index service, the Trigger service, and an Archive service (in development) are built on the framework. Clients reach these services through resource property requests, subscription/notification, and Archive service requests.]
Figure 10-4 MDS4 Aggregator Framework
An aggregator source retrieves information from an external component called an information provider. In the case of a query source or a subscription source, the information provider is a WSRF-compliant service. For an execution source, the information provider is an executable program that obtains data through some application-specific mechanism.
For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/info/aggregator
10.5.4 WebMDS
WebMDS is a Web-based interface to WSRF resource property information that can be used as a user-friendly front end to the Index service. WebMDS uses standard resource property requests to query resource property data and transforms the data for a user-friendly display. Web site administrators can customize their own WebMDS deployments by using HTML form options and creating their own XSLT transformations. For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/info/Webmds
10.6 Execution management
Globus Toolkit 4 provides various tools that enable execution management in a grid environment.
10.6.1 WS GRAM
WS GRAM is the Grid service that provides remote execution and status management of jobs. When a client submits a job, the request is sent to the remote host as a SOAP message and handled by the WS GRAM service on that host. The WS GRAM service can submit those requests to local job schedulers such as Platform LSF or Altair PBS. The WS GRAM service returns status information about the job by using WS-Notification.
The WS GRAM service can collaborate with the RFT service to stage the files required by jobs. To enable staging with RFT, valid credentials must be delegated to the RFT service by the Delegation service. Figure 10-5 on page 153 shows how job staging works.
[Figure 10-5 shows two hosts, host A and host B. The labeled elements are a WS GRAM client, the WS GRAM service, the Delegation service, the RFT service with its database, a GRAM adapter, sudo, a local resource manager (Fork, LSF, PBS), and two GridFTP servers. The labeled flows are SOAP messages and notifications, delegated credentials, an RFT request, transfer information, GridFTP control, and the GridFTP transfer. A note in the figure states that the GridFTP server does not need to be on the WS GRAM client host.]
Figure 10-5 Execution of staging job
For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/execution/wsgram
10.6.2 Community Scheduler Framework 4 (CSF4)
The Community Scheduler Framework 4 (CSF4) provides an intelligent, policy-based meta-scheduling facility for building grids where multiple types of job schedulers are involved. It provides a single interface to different resource managers, such as Platform LSF and Altair PBS. Currently, CSF4 is a technical preview component. For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/contributions/csf
10.6.3 Globus Teleoperations Control Protocol (GTCP)
Globus Teleoperations Control Protocol is the WSRF version of the NEESgrid Teleoperations Control Protocol (NTCP). Currently, GTCP is a technical preview component. For more information, look at the following links:
http://www.globus.org/toolkit/docs/4.0/techpreview/gtcp
http://it.nees.org
10.6.4 Workspace Management Service (WMS)
The Workspace Management Service enables a grid client to dynamically create, manage, and delete user accounts at a remote site. Currently, WMS is a technical preview component, and it supports only the management of UNIX accounts. For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/techpreview/wms
10.7 Summary
This chapter provided a brief overview of some of the components of the Globus Toolkit Version 4. Please refer to the Globus Web site for more information about the toolkit and the details of its components.
In the next chapter, we describe how to install and configure a simple Globus environment suitable for testing various components or creating a demonstration of some of the grid technologies and capabilities.
Chapter 11.
Globus Toolkit 4 installation and configuration
This chapter presents the necessary steps to install and configure Globus Toolkit 4 in a simple environment. You can follow these steps to set up a demo environment suitable for your own testing and to gain experience with some of the components of the Globus Toolkit.
The following topics are discussed:
How to obtain Globus Toolkit 4
Packages of Globus Toolkit 4
Grid environment
Installation
Configuration and testing of grid environment
Uninstallation
11.1 How to obtain Globus Toolkit 4
Globus Toolkit 4 is supported on a variety of operating systems. Binary packages are available for Linux environments (SuSE Linux 9 and 8, Red Hat Linux 9, Fedora Core Linux 2 and 3, and Debian 3.1) and for Solaris 9. By compiling the source packages, you can use Globus Toolkit 4 on other operating systems, such as AIX and Mac OS X. The Java-based components, including the WSRF-compliant WS Java container, run on most operating systems that support Java, including Windows.
For the purpose of this book, we use both the binary and source packages of Globus Toolkit 4.0 running on Red Hat Linux 9.
This version of Globus Toolkit may be obtained at the official Globus Project site:
http://www.globus.org/toolkit/downloads/4.0.0
Note: Though our test environment was built using Globus Toolkit 4.0, by the time this book is published, 4.0.1 or later may be available. If you use a release later than 4.0.0, some of the information documented here may be slightly out of date.
Important: Globus Toolkit is distributed under the Globus Toolkit Public License (GTPL) Version 3, a liberal open source license. You may use every tool and the source code in the Globus Toolkit as you like, with no restriction, but the toolkit is provided as is, with no warranty. You can find out more about the GTPL at:
http://www.globus.org/toolkit/legal/4.0/license-v3.html
For bug tracking, the Globus Project provides the following Web site:
http://bugzilla.globus.org/bugzilla
For platform-specific system requirements for Globus Toolkit 4, please refer to the following Web site:
http://www.globus.org/toolkit/docs/4.0/admin/docbook/ch03.html#s-platform
11.2 Packages of Globus Toolkit 4
Globus Toolkit 4 is available in three ways:
Download the full binary package from the Globus site.
Download the full source package from the Globus site.
Get the source code from a CVS server.
Choose the method that best suits your environment.
11.2.1 Binary packages
Table 11-1 shows the list of Globus Toolkit 4 binary packages that are available. If you install Globus Toolkit 4 on one of the operating systems listed in Table 11-1, you can use a binary package. Otherwise, you need to obtain the source packages and build the binaries.
Table 11-1 List of Globus Toolkit 4 binary packages
Binary package name                               Operating system    Version  Platform
gt4.0.0-ia32_redhat_9-binary-installer.tar.gz     Red Hat Linux       9        ia32
gt4.0.0-ia32_fedora_2-binary-installer.tar.gz     Fedora Core Linux   2        ia32
gt4.0.0-ia32_fedora_3-binary-installer.tar.gz     Fedora Core Linux   3        ia32
gt4.0.0-ia32_debian-binary-installer.tar.gz       Debian Linux        3.1      ia32
gt4.0.0-sun4u_solaris_9-binary-installer.tar.gz   Solaris             9        sun4u
gt4.0.0-x86_64_sles_9-binary-installer.tar.gz     SuSE Linux          9        x86_64 (Opteron)
gt4.0.0-ia64_sles_8-binary-installer.tar.gz       SuSE Linux          8        ia64 (Itanium)
Java WS Core components are also available as a separate package. This package includes only the WSRF-compliant WS Java container and base components. All packages in Table 11-1 include the Java WS Core components, so you do not need to install both packages. Table 11-2 shows a list of Java WS Core installation packages.
Table 11-2 List of Java WS Core packages
Binary packages                                     Operating system, version, and platform
ws-core-4.0.0-bin.zip, ws-core-4.0.0-bin.tar.gz     All Java VM-enabled operating systems
You can obtain those packages from the following page:
http://www.globus.org/toolkit/downloads/4.0.0
11.2.2 Source packages
Table 11-3 shows the list of Globus Toolkit 4 source packages that are available.
Table 11-3 List of Globus Toolkit 4 source packages
Source package name                                                            Description
gt4.0.0-all-source-installer.tar.bz2, gt4.0.0-all-source-installer.tar.gz      Source packages with all components
ws-core-4.0.0-src.zip, ws-core-4.0.0-src.tar.gz                                Source packages with only Java WS Core components
You can obtain those packages from the following page:
http://www.globus.org/toolkit/downloads/4.0.0
You can also obtain individual packages from a CVS repository. Table 11-4 shows the list of major packages available in CVS.
Table 11-4 Major packages available in CVS repository
Source package name   Description
wsrf                  WS Core packages
ws-transfer           RFT packages
ws-mds                WS MDS packages
ws-gram               WS GRAM packages
To obtain the packages from CVS, type the following command:
cvs -d :pserver:anonymous@cvs.globus.org:/home/globdev/CVS/globus-packages checkout packagename
11.3 Grid environment
Figure 11-1 on page 159 introduces a conceptual grid environment after a Globus Toolkit installation. In this chapter we take you through the steps required to install and configure this environment. There are three servers:
CA
This is a Certificate Authority host. We use SimpleCA, which is included in the Globus Toolkit 4 package, as a Certificate Authority.
Host A, host B
These are the grid nodes. We install Globus Toolkit 4 packages on those hosts.
The user names are different on host A (auser1) and host B (buser1), but they share the same grid user ID, which is known as the Distinguished Name:
/O=Grid/O=Globus/OU=redbook.ibm.com/CN=grid user 1
Note: In a grid environment, users use X.509 certificates to distinguish themselves from other users. Each grid user has one X.509 certificate, and the subject of that certificate is the user's Distinguished Name.
[Figure 11-1 shows three hosts connected by a TCP/IP network, together with the grid user name (the Distinguished Name of the user certificate), /O=Grid/O=Globus/OU=redbook.ibm.com/CN=grid user 1.]
CA      Hostname: ca.redbook.ibm.com     Role: Certificate Authority   OS: Red Hat 9   Package: Globus Toolkit 4 Binary
Host A  Hostname: hosta.redbook.ibm.com  Role: Grid Node               OS: Red Hat 9   Package: Globus Toolkit 4 Binary   Local user: auser1 (uid:511, gid:511)
Host B  Hostname: hostb.redbook.ibm.com  Role: Grid Node               OS: Red Hat 9   Package: Globus Toolkit 4 Source   Local user: buser1 (uid:521, gid:521)
Figure 11-1 System overview after installation
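The relationship between one Distinguished Name and different local user names on each host can be pictured as a lookup table. The following Python sketch is a conceptual model only. It assumes the Distinguished Name is written in slash-separated form, and the function names and the in-memory table are hypothetical; they are not the mechanism that Globus Toolkit uses to store this mapping.

```python
# Distinguished Name of the grid user, written in slash-separated form.
GRID_USER_DN = "/O=Grid/O=Globus/OU=redbook.ibm.com/CN=grid user 1"

# Hypothetical per-host table: the same DN maps to a different local account.
LOCAL_ACCOUNTS = {
    "hosta.redbook.ibm.com": {GRID_USER_DN: "auser1"},
    "hostb.redbook.ibm.com": {GRID_USER_DN: "buser1"},
}


def parse_dn(dn):
    """Split a slash-separated DN into (attribute, value) pairs, in order.

    This simple parser assumes that no value contains a slash.
    """
    if not dn.startswith("/"):
        raise ValueError("expected a DN of the form /K=V/K=V")
    pairs = []
    for part in dn[1:].split("/"):
        key, _, value = part.partition("=")
        pairs.append((key, value))
    return pairs


def local_user(host, dn):
    """Return the local account for this DN on the given host, or None."""
    return LOCAL_ACCOUNTS.get(host, {}).get(dn)
```

For example, local_user("hosta.redbook.ibm.com", GRID_USER_DN) yields auser1, while the same Distinguished Name on host B yields buser1.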
We install software with the versions shown in Table 11-5, and this book uses these versions during the installation process. The installation directory of each software component is also listed in Table 11-5. If you want to install software with different versions or directories, make sure you specify your own version and directory each time you submit a command.
Table 11-5 Version and directory of each Globus software component
Name of software    Version   Directory
Globus Toolkit 4    4.0.0     /usr/local/globus-4.0.0
IBM Java SDK        1.4.2     /opt/IBMJava2-142
Apache Ant          1.6.3     /usr/local/apache-ant-1.6.3
First, we explain how to install the servers using both binary and source packages. Then we show you how to configure CA, host A, and host B.
11.4 Installation
Before installing Globus Toolkit 4, we need to install some tools that the installation depends on. After installing those tools, we use them to install Globus Toolkit 4.
11.4.1 Installing required software for Globus Toolkit 4 installation
Table 11-6 shows a list of software we need for Globus Toolkit 4 installation.
Table 11-6 List of required software for Globus Toolkit 4 installation
Software name                                    Recommended version
Java SDK (IBM, Sun, BEA)                         1.4.2 or later
Apache Ant                                       1.5.1 or later
gcc                                              3.2.1 and 2.95.x are tested (avoid 3.2)
GNU tar
GNU sed
zlib                                             1.1.4 or later
GNU Make
sudo
PostgreSQL (or other JDBC compliant database)    7.1 or later (if using PostgreSQL)
Most of the packages in Table 11-6 are installed by the Red Hat Linux 9 installation. Therefore, we only show how to install the IBM Java SDK and Apache Ant, which are not installed during the Red Hat Linux 9 installation. Installation and configuration of PostgreSQL are described in "Configuration and testing of RFT" on page 180.
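One quick way to see which of the command-line tools in Table 11-6 are already present on a host is to look for them on the PATH. The following Python sketch does only that. The function name and the list of command names are assumptions for illustration, and the sketch does not check versions, libraries such as zlib, or the database.

```python
import shutil

# Command names assumed for the tools listed in Table 11-6.
REQUIRED_TOOLS = ["java", "ant", "gcc", "tar", "sed", "make", "sudo"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

On a freshly installed system, we would expect java and ant to be reported until the steps in the following sections have been completed.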
IBM Java SDK installation
To install IBM Java SDK:
1. Obtain IBM Java SDK from the following URL:
http://www.ibm.com/developerworks/java/jdk/linux140
2. Install IBM Java SDK. Example 11-1 shows the installation procedure of IBM Java SDK.
Example 11-1 Installation of IBM Java SDK
[root@hosta]# rpm -ivh IBMJava2-142-ia32-SDK-1.4.2-2.0.i386.rpm
Preparing...                ########################################### [100%]
   1:IBMJava2-142-ia32-SDK  ########################################### [100%]
3. Add environment variables for IBM Java SDK. Example 11-2 shows an example of /etc/profile.
Example 11-2 Example of /etc/profile
...(unrelated information omitted)
export JAVA_HOME=/opt/IBMJava2-142
export PATH=$JAVA_HOME/bin:$PATH
4. Log out and log in again. Alternatively, type source /etc/profile to make sure the variables are set and available.
5. To test the IBM Java SDK installation, type java -version. If you see the version, IBM Java SDK is properly installed (see Example 11-3).
Example 11-3 Test IBM Java SDK installation
[root@hosta]# java -version
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2)
Classic VM (build 1.4.2, J2RE 1.4.2 IBM build cxia32142-20050609 (JIT enabled: jitc))
Note: You may alternatively use the Sun Java SDK. The installation procedure for Sun Java SDK is similar to that for IBM Java SDK. You may obtain Sun Java SDK from the following URL:
http://java.sun.com/j2se/1.4.2/download.html
Note: We set up environment variables in /etc/profile to make them available to all users on the same host. If you do not want to share those variables between users, you can put them into the .bash_profile file in the user's home directory instead.
Apache Ant installation
To install Apache Ant:
1. Obtain Apache Ant from the following URL:
http://ant.apache.org
2. Extract the Apache Ant archive. Example 11-4 shows the extraction procedure for Apache Ant.
Example 11-4 Extraction of Apache Ant
[root@hosta]# tar xvzf apache-ant-1.6.3-bin.tar.gz -C /usr/local
apache-ant-1.6.3/bin/ant
apache-ant-1.6.3/bin/antRun
...(unrelated information omitted)
3. Add environment variables for Apache Ant. Example 11-5 shows an example of /etc/profile.
Example 11-5 Example of /etc/profile
...(unrelated information omitted)
export ANT_HOME=/usr/local/apache-ant-1.6.3
export PATH=$ANT_HOME/bin:$PATH
4. Log out and log in again. Alternatively, type source /etc/profile to set the variables and make them available.
5. To test the Apache Ant installation, type the ant command (see Example 11-6). The command fails at this point because there is no build.xml file, but this output shows that Apache Ant is working.
Example 11-6 Test Apache Ant installation
[root@hosta]# ant
Buildfile: build.xml does not exist!
Build failed
11.4.2 Preparing the OS for Globus Toolkit 4 installation
Before you install Globus Toolkit 4, there are a few things that need to be prepared.
Users in each host
Add the users in Table 11-7 for Globus Toolkit 4 installation and configuration.
In a Linux environment, users can be added with a command such as the one shown in Example 11-7, where the password is provided with the -p parameter. On Red Hat Linux, the -p parameter expects an already encrypted password, so if you supply plain text here, set the password afterward with the passwd command.
Example 11-7 Adding a user with the adduser command
adduser buser1 -p buserpw
Table 11-7 Users for our Globus Toolkit 4 installation and configuration
Host name   User names
ca          globus
hosta       globus, auser1
hostb       globus, buser1
Time settings
You should make sure to synchronize the system time of all the machines in your environment. GSI certificates use timestamps and are very sensitive to the time. If the system time of your grid environment is not set correctly, errors might occur when you use GSI certificates. For this reason, it is strongly recommended that you set up a time server, such as NTP, in your grid environment, and set the time correctly on all of your systems. Use of a time server is especially important in a distributed environment where a single administrator cannot easily ensure the correct setting of system clocks.
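To see why a clock that is only a few minutes off can cause errors, consider how a host decides whether a credential is valid: it compares its own clock with the validity period recorded in the credential. The following Python sketch models that comparison. The function name, the dates, and the 12-hour lifetime are illustrative assumptions, not values taken from this environment.

```python
from datetime import datetime, timedelta


def is_valid_at(not_before, not_after, local_time):
    """Return True if local_time falls inside the credential's validity period."""
    return not_before <= local_time <= not_after


# A credential issued at noon on the issuing host, with an assumed lifetime.
issued = datetime(2005, 6, 1, 12, 0, 0)
not_before, not_after = issued, issued + timedelta(hours=12)

if __name__ == "__main__":
    # A host whose clock runs 10 minutes slow sees the credential as not yet valid.
    slow_clock = issued - timedelta(minutes=10)
    print(is_valid_at(not_before, not_after, slow_clock))

    # A synchronized host accepts the same credential.
    good_clock = issued + timedelta(minutes=1)
    print(is_valid_at(not_before, not_after, good_clock))
```

The credential itself is identical in both cases; only the local clock differs, which is why synchronizing time across the grid matters.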
To configure NTP, follow these steps:
1. On the machine that is designated to be the time server (in our case, the CA host), edit the /etc/ntp.conf file as the root user. Leave the two lines shown in Example 11-8 as the only uncommented ones, commenting out all of the other lines with a leading # character.
Example 11-8 /etc/ntp.conf of ntp server
server 127.127.1.0 # local clock
driftfile /etc/ntp/drift
2. On the machines that are designated to be ntp clients (in our case, host A and host B), edit the /etc/ntp.conf file as the root user. Leave the two lines shown in Example 11-9 as the only uncommented ones, commenting out all of the other lines with a leading # character.
Example 11-9 /etc/ntp.conf of ntp client
server <ip address of ntp server> # time server
driftfile /etc/ntp/drift
3. On all hosts, as the root user, configure the ntp daemon to run on the next boot (see Example 11-10).
Example 11-10 Configure ntp server to run on the next boot
[root@hosta]# chkconfig ntpd on
4. On all hosts, as the root user, start the ntp service (see Example 11-11).
Example 11-11 Starting ntp server
[root@hosta]# service ntpd start
5. Check whether the time is synchronized with the ntp server by using the ntpq command. If an asterisk appears before the time server name, your ntp service is properly configured (see Example 11-12).
Example 11-12 Check the time setting with the ntpq command
[root@hosta]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ca.redbook.ib   LOCAL(0)         6 u  516 1024  377    0.931    2.258   0.262
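The asterisk check in step 5 can also be automated. The following Python sketch scans text in the layout of the ntpq output for a line that begins with an asterisk and reports the peer name that follows it. The function name is an assumption for illustration, and the sketch expects the output to be supplied as a string; it does not run ntpq itself.

```python
def synchronized_peer(ntpq_output):
    """Return the peer name marked with '*', or None if no peer is selected."""
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            return line[1:].split()[0]
    return None


if __name__ == "__main__":
    sample = (
        "     remote           refid      st t when poll reach   delay   offset  jitter\n"
        "==============================================================================\n"
        "*ca.redbook.ib   LOCAL(0)         6 u  516 1024  377    0.931    2.258   0.262\n"
    )
    print(synchronized_peer(sample))
```

If the function returns None shortly after the service is started, waiting a few minutes and checking again is usually the first thing to try.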
Note: You may have to wait a few minutes before the ntp service synchronizes the time between systems.
Firewall settings
If you have a firewall in your environment, open the TCP ports listed in Table 11-8 so that you can use the services and components of Globus Toolkit 4. Check your firewall settings and make sure those ports are open.
Table 11-8 TCP port numbers used by Globus Toolkit 4
TCP port number    Application
2811               GridFTP
8080               Globus container (nonsecure mode)
8443               Globus container (secure mode)
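As an illustration only, the ports in Table 11-8 could be opened with rules like the ones printed by the sketch below. It assumes that the firewall is iptables on the grid node itself and that inserting rules into the INPUT chain suits your policy; neither assumption comes from this book. The script only prints the commands so that they can be reviewed before the root user runs them.

```shell
# TCP ports from Table 11-8: GridFTP, container (nonsecure), container (secure).
GT4_TCP_PORTS="2811 8080 8443"

# Print (do not execute) one iptables rule per port.
# Assumption: iptables is the firewall in use and INPUT is the right chain.
print_gt4_firewall_rules() {
    for port in $GT4_TCP_PORTS; do
        echo "iptables -I INPUT -p tcp --dport $port -j ACCEPT"
    done
}

print_gt4_firewall_rules
```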
11.4.3 Installing Globus Toolkit 4
You may install Globus Toolkit 4 in many ways.
In this section, we introduce both binary and source package installation. Installation of the binary package is extremely fast, while installation using the source package will take longer, as would be expected. As a guideline, we have provided the approximate time it took for us to install the packages in our environment in Table 119. Your experience may vary.
Table 11-9 Approximate time for installation with Intel Pentium 4 3GHz, memory
Package type    Approximate time
Binary          1 min.
Source          83 min.
Installation from binary package
To install from a binary package:
1. Obtain the Globus Toolkit 4 binary package from the Globus site. For more information, see 11.2.1, "Binary packages" on page 157.
2. Extract the binary package as the globus user (see Example 11-13).
Example 11-13 Extracting binary package
[globus@hosta]$ tar xvzf gt4.0.0-ia32_redhat_9-binary-installer.tar.gz -C /tmp
3. Set the environmental variable for the Globus location. Example 11-14 shows how to set it.
Example 11-14 Set up the environmental variable for Globus
[globus@hosta]$ export GLOBUS_LOCATION=/usr/local/globus-4.0.0
4. Create the installation directory and change its ownership to the user and group globus (see Example 11-15).
Example 11-15 Create and change the ownership of directory
[globus@hosta]$ su
Password:
[root@hosta]# mkdir $GLOBUS_LOCATION
[root@hosta]# chown globus:globus $GLOBUS_LOCATION
[root@hosta]# exit
exit
[globus@hosta]$
5. Configure and install Globus Toolkit 4 (see Example 11-16).
Example 11-16 Configure and install Globus Toolkit 4
[globus@hosta]$ cd /tmp/gt4.0.0-ia32_redhat_9-binary-installer
[globus@hosta]$ ./configure --prefix=$GLOBUS_LOCATION
checking for javac... /usr/java/j2sdk1.4.2_08/bin/javac
checking for ant... /usr/local/apache-ant-1.6.3/bin/ant
configure: creating ./config.status
config.status: creating Makefile
[globus@hosta]$ make 2>&1 | tee build.log
cd gpt-3.2autotools2004 && OBJECT_MODE=32 ./build_gpt
build_gpt ====> installing GPT into /usr/local/globus-4.0.0
...(unrelated information omitted)
[globus@hosta]$ make install
ln -s /usr/local/globus-4.0.0/etc/gpt/packages /usr/local/globus-4.0.0/etc/globus_packages
/usr/local/globus-4.0.0/sbin/gpt-postinstall
...(unrelated information omitted)
config.status: creating fork.pm
..Done
Installation from source package
To install from a source package:
1. Obtain the Globus Toolkit 4 source package from the Globus site. For more information, see 11.2.2, "Source packages" on page 158.
2. Extract the source package as the globus user (see Example 11-17).
Example 11-17 Extracting source package
[globus@hostb]$ tar xvzf gt4.0.0-all-source-installer.tar.gz -C /tmp
3. Set the environmental variable for the Globus location. Example 11-18 shows how to set it.
Example 11-18 Set up GLOBUS_LOCATION environmental variable for Globus
[globus@hostb]$ export GLOBUS_LOCATION=/usr/local/globus-4.0.0
4. Create the installation directory and change its ownership to the user and group globus (see Example 11-19).
Example 11-19 Create and change the ownership of directory
[globus@hostb]$ su
Password:
[root@hostb]# mkdir $GLOBUS_LOCATION
[root@hostb]# chown globus:globus $GLOBUS_LOCATION
[root@hostb]# exit
exit
5. Configure and install Globus Toolkit 4 (see Example 11-20).
Example 11-20 Configure and install Globus Toolkit 4
[globus@hostb]$ cd /tmp/gt4.0.0-all-source-installer
[globus@hostb]$ ./configure --prefix=$GLOBUS_LOCATION
checking build system type... i686-pc-linux-gnu
checking for javac... /usr/java/j2sdk1.4.2_08/bin/javac
checking for ant... /usr/local/apache-ant-1.6.3/bin/ant
configure: creating ./config.status
config.status: creating Makefile
[globus@hostb]$ make 2>&1 | tee build.log
cd gpt-3.2autotools2004 && OBJECT_MODE=32 ./build_gpt
build_gpt ====> installing GPT into /usr/local/globus-4.0.0
...(unrelated information omitted)
[globus@hostb]$ make install
/usr/local/globus-4.0.0/sbin/gpt-postinstall
running /usr/local/globus-4.0.0/setup/globus/setup-globus-common..[ Changing to /usr/local/globus-4.0.0/setup/globus ]
...(unrelated information omitted)
config.status: creating fork.pm
..Done
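Both installation paths run a configure step that looks for javac and ant. As a hedged convenience, a check along the following lines could be run before configure to catch a missing tool early. The function name require_tools is hypothetical, and the tool list in the usage comment is only an assumption based on the commands that appear in the examples above.

```shell
# Report every required tool that cannot be found on PATH.
# Returns non-zero if at least one tool is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return $missing
}

# Tools that appear in the installation examples (assumed list):
#   require_tools javac ant make tar
```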
11.5 Configuration and testing of grid environment
After the installation of the Globus Toolkit, each element of your grid environment must be configured.
11.5.1 Configuring environmental variables
Before starting the configuration process, it is useful to set up the GLOBUS_LOCATION environmental variable in either /etc/profile or <user home>/.bash_profile. To save time upon subsequent logins from different user IDs, we specified GLOBUS_LOCATION in /etc/profile (see Example 11-21).
Also, Globus Toolkit provides shell scripts to set up these environmental variables. They can be sourced as follows:
source $GLOBUS_LOCATION/etc/globus-user-env.sh (sh)
source $GLOBUS_LOCATION/etc/globus-user-env.csh (csh)
The Globus Toolkit also provides shell scripts for developers to set up Java CLASSPATH environmental variables. They can be sourced as follows:
source $GLOBUS_LOCATION/etc/globus-devel-env.sh (sh)
source $GLOBUS_LOCATION/etc/globus-devel-env.csh (csh)
In this book, to save time upon subsequent logins, we specify globus-user-env.sh and globus-devel-env.sh in /etc/profile so that all users can use the grid environment.
Example 11-21 Example of /etc/profile
...(unrelated information omitted)
export GLOBUS_LOCATION=/usr/local/globus-4.0.0
source $GLOBUS_LOCATION/etc/globus-user-env.sh
source $GLOBUS_LOCATION/etc/globus-devel-env.sh
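If /etc/profile is shared with machines where the toolkit may not be installed, a guarded variant avoids errors at login. This is a sketch under that assumption, not the configuration used in this book; the function name load_globus_env is hypothetical, and the default path is the installation directory used in the examples above.

```shell
# Source the Globus user environment script only if the installation exists.
# Returns non-zero and leaves the rest of the environment alone otherwise.
load_globus_env() {
    GLOBUS_LOCATION="${GLOBUS_LOCATION:-/usr/local/globus-4.0.0}"
    export GLOBUS_LOCATION
    if [ -f "$GLOBUS_LOCATION/etc/globus-user-env.sh" ]; then
        . "$GLOBUS_LOCATION/etc/globus-user-env.sh"
    else
        echo "Globus Toolkit not found under $GLOBUS_LOCATION" >&2
        return 1
    fi
}
```

The same pattern applies to globus-devel-env.sh for developer logins.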
11.5.2 Security set up
In this book, we use SimpleCA, which is a wrapper of OpenSSL CA functionality.
Important: Before setting up a certificate authority (CA), make sure to synchronize the system time of all the machines in your environment. For more information, refer to 11.4.2, "Preparing the OS for Globus Toolkit 4 installation" on page 163.
Note: Make sure Globus Toolkit 4 is also installed on the CA host, because the SimpleCA package is provided in the Globus Toolkit. See "Installing Globus Toolkit 4" on page 165 for installation procedures.
Installation of CA packages
To install CA packages:
1. Log in to the CA host as the globus user.
2. Invoke the setup-simple-ca script, and answer the prompts as appropriate (see Example 11-22). This script initializes the files that are necessary for SimpleCA.
Example 11-22 Setting up SimpleCA
[globus@ca]$ $GLOBUS_LOCATION/setup/globus/setup-simple-ca
WARNING: GPT_LOCATION not set, assuming:
GPT_LOCATION=/usr/local/globus-4.0.0
C e r t i f i c a t e   A u t h o r i t y   S e t u p
This script will setup a Certificate Authority for signing Globus users
certificates. It will also generate a simple CA package that can be
distributed to the users of the CA.
The CA information about the certificates it distributes will be kept in:
/home/globus/.globus/simpleCA/
/usr/local/globus-4.0.0/setup/globus/setup-simple-ca: line 250: test: res:
integer expression expected
The unique subject name for this CA is:
cn=Globus Simple CA, ou=simpleCA-ca.redbook.ibm.com, ou=GlobusTest, o=Grid
Do you want to keep this as the CA subject (y/n) [y]: y
Enter the email of the CA (this is the email where certificate requests will be sent to be signed by the CA): <type mail address> globus@ca.redbook.ibm.com
The CA certificate has an expiration date. Keep in mind that once the CA certificate has expired, all the certificates signed by that CA become invalid. A CA should regenerate the CA certificate and start re-issuing ca-setup packages before the actual CA certificate expires. This can be done by re-running this setup script. Enter the number of DAYS the CA certificate should last before it expires.
[default: 5 years (1825 days)]: <type the number of days> 1825
Enter PEM pass phrase: <type ca certificate pass phrase>
Verifying - Enter PEM pass phrase: <type ca certificate pass phrase>
...(unrelated information omitted)
setup-ssl-utils: Complete
Setting up security in each grid node
After you perform the steps above, a CA setup package exists on the CA host. To use certificates from this CA on other grid nodes, copy the CA setup package to each grid node and install it there.
Note: A CA setup package is generated when you run the setup-simple-ca command in Example 11-22. Keep in mind that the name of the CA setup package includes a unique CA hash.
1. Log in to a grid node as the globus user and obtain the CA setup package from the CA host. Then run the setup commands for configuration (see Example 11-23).
Example 11-23 Set up CA in each grid node
[globus@hosta]$ scp globus@ca:/home/globus/.globus/simpleCA/globus_simple_ca_(ca_hash)_setup-0.18.tar.gz .
[globus@hosta]$ $GLOBUS_LOCATION/sbin/gpt-build globus_simple_ca_(ca_hash)_setup-0.18.tar.gz gcc32dbg
[globus@hosta]$ $GLOBUS_LOCATION/sbin/gpt-postinstall
2. As the root user, submit the command in Example 11-24 to configure the CA settings on each grid node. This script creates the /etc/grid-security directory, which contains the configuration files for security.
Example 11-24 Configure CA in each grid node
[root@hosta]# $GLOBUS_LOCATION/setup/globus_simple_ca_(ca_hash)_setup/setup-gsi -default
Note: For the setup of the CA host, you do not need to run the setup-gsi script. This script creates a directory that contains the configuration files for security. The CA host does not need this directory, because these configuration files are for the servers and users who use the CA.
Obtain and sign a host certificate
To use some of the services provided by Globus Toolkit 4, such as GridFTP, you need a CA-signed host certificate and a host key in the appropriate directory.
1. As the root user, request a host certificate with the command in Example 11-25.
Example 11-25 Request a host certificate
[root@hosta]# grid-cert-request -host `hostname`
2. Copy or send the /etc/grid-security/hostcert_request.pem file to the CA host.
3. On the CA host, as the globus user, sign the host certificate by using the grid-ca-sign command (see Example 11-26).
Example 11-26 Sign a host certificate
[globus@ca]$ grid-ca-sign -in hostcert_request.pem -out hostcert.pem
To sign the request
please enter the password for the CA key: <type ca passphrase>
The new signed certificate is at:
/home/globus/.globus/simpleCA/newcerts/01.pem
4. Copy hostcert.pem back to the /etc/grid-security directory on the grid node.
Obtain and sign a user certificate
To use the grid environment, a grid user needs a CA-signed user certificate and a user key in the user's directory.
1. As a user (auser1 on hosta), request a user certificate with the command in Example 11-27.
Example 11-27 Request a user certificate
[auser1@hosta]$ grid-cert-request
Enter your name, e.g., John Smith: grid user 1 <type grid user name>
A certificate request and private key is being created.
You will be asked to enter a PEM pass phrase.
This pass phrase is akin to your account password, and is used to protect your key file.
If you forget your pass phrase, you will need to obtain a new certificate.
Generating a 1024 bit RSA private key
..................................
...
writing new private key to '/home/auser1/.globus/userkey.pem'
Enter PEM pass phrase: <type pass phrase for grid user>
Verifying - Enter PEM pass phrase: <retype pass phrase for grid user>
...(unrelated information omitted)
2. Copy or send the <user home>/.globus/usercert_request.pem file to the CA host.
3. On the CA host, as the globus user, sign the user certificate by using the grid-ca-sign command (see Example 11-28).
Example 11-28 Sign a user certificate
[globus@ca]$ grid-ca-sign -in usercert_request.pem -out usercert.pem
To sign the request
please enter the password for the CA key:
The new signed certificate is at:
/home/globus/.globus/simpleCA/newcerts/02.pem
4. Copy the created usercert.pem to the <user home>/.globus directory on the grid node.
5. Test the user certificate by typing grid-proxy-init -debug -verify as the auser1 user (see Example 11-29). This command shows the location of the user certificate and key, the CA certificate directory, the distinguished name of the user, and the expiration time. After grid-proxy-init completes successfully, you have been authenticated and are ready to use the grid environment.
Example 11-29 Testing user certificate installation
[auser1@hosta]$ grid-proxy-init -debug -verify
User Cert File: /home/auser1/.globus/usercert.pem
User Key File: /home/auser1/.globus/userkey.pem
Trusted CA Cert Dir: /etc/grid-security/certificates
Output File: /tmp/x509up_u511
Your identity: /O=Grid/OU=GlobusTest/OU=simpleCA-ca.redbook.ibm.com/OU=redbook.ibm.com/CN=grid user 1
Enter GRID pass phrase for this identity:
Creating proxy .........................
Done
Proxy Verify OK
Your proxy is valid until: Thu Jun 9 22:16:28 200
Note: You may copy a user certificate and user key to other grid nodes in order to access each grid node as a single grid user. However, you may not copy a host certificate and host key. A host certificate must be created for each grid node.
Set mapping information between a grid user and a local user
Globus Toolkit 4 requires a mapping between an authenticated grid user and a local user. In order to map a user, you need to get the distinguished name of the grid user, and map it to a local user.
1. Get the distinguished name by invoking the grid-cert-info command (see Example 11-30).
Example 11-30 Obtaining distinguished name
[auser1@hosta]$ grid-cert-info -subject -f /home/auser1/.globus/usercert.pem
/O=Grid/OU=GlobusTest/OU=simpleCA-ca.redbook.ibm.com/OU=redbook.ibm.com/CN=grid user 1
2. As the root user, map the local user name to the distinguished name by using the grid-mapfile-add-entry command, as seen in Example 11-31.
Example 11-31 Map a grid user and local user
[root@hosta]# grid-mapfile-add-entry -dn "/O=Grid/OU=GlobusTest/OU=simpleCA-ca.redbook.ibm.com/OU=redbook.ibm.com/CN=grid user 1" -ln auser1
Modifying /etc/grid-security/grid-mapfile ...
/etc/grid-security/grid-mapfile does not exist... Attempting to create /etc/grid-security/grid-mapfile
New entry:
"/O=Grid/OU=GlobusTest/OU=simpleCA-ca.redbook.ibm.com/OU=redbook.ibm.com/CN=grid user 1" auser1
(1) entry added
3. To see the mapping information, look at /etc/grid-security/grid-mapfile (see Example 11-32).
Example 11-32 Example of /etc/grid-security/grid-mapfile
"/O=Grid/OU=GlobusTest/OU=simpleCA-ca.redbook.ibm.com/OU=redbook.ibm.com/CN=grid user 1" auser1
4. To check the consistency of the map file, submit grid-mapfile-check-consistency (see Example 11-33). If you get no response from this command, the grid-mapfile is consistent.
Example 11-33 Check the consistency of grid-mapfile
[root@hosta]# grid-mapfile-check-consistency
Note: The grid-mapfile-add-entry command creates /etc/grid-security/grid-mapfile if necessary and adds an entry to it. You can also add an entry manually by adding a line to this file.
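To confirm which local user a distinguished name maps to without reading the file by eye, a small lookup can help. The sketch below assumes the entry format shown in Example 11-32, that is, a double-quoted distinguished name followed by the local user name; the function name gridmap_lookup is hypothetical and is not a Globus command.

```shell
# Print the local user mapped to a distinguished name in a grid-mapfile.
# Assumes each entry has the form:  "<distinguished name>" <local user>
# Usage: gridmap_lookup "<distinguished name>" [path to grid-mapfile]
gridmap_lookup() {
    awk -v dn="$1" -F'"' '
        $2 == dn { gsub(/^[ \t]+/, "", $3); print $3 }
    ' "${2:-/etc/grid-security/grid-mapfile}"
}

# Example:
#   gridmap_lookup "/O=Grid/OU=GlobusTest/OU=simpleCA-ca.redbook.ibm.com/OU=redbook.ibm.com/CN=grid user 1"
```

An empty result means that the distinguished name has no entry, which is one of the causes of the container start failure described in the troubleshooting notes for Java WS Core.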
11.5.3 Configuration of Java WS Core
The Java WS Core container is installed as a part of the default Globus Toolkit 4 installation. There are a few things you need to configure before you start Java WS Core.
Setting up Java WS Core environment
The Java WS Core container uses a copy of the host certificate and a host key. You need to copy and change the owner of those files before you start the Java WS Core container.
As the root user, in the /etc/grid-security directory, copy hostcert.pem and hostkey.pem to containercert.pem and containerkey.pem. Then change the owner of the new files to the globus user (see Example 11-34).
Example 11-34 Copying host certificate and key to container certificate and key
[root@hosta]# cp hostcert.pem containercert.pem
[root@hosta]# cp hostkey.pem containerkey.pem
[root@hosta]# chown globus.globus containercert.pem containerkey.pem
Verifying the installation and configuration of Java WS Core
To verify that the Java WS Core has been installed successfully and that grid security has been implemented correctly, complete the following procedure:
1. As the globus user, run the following command to start the container:
globus-start-container
If you do not use a secured container, type the following command instead:
globus-start-container -nosec
Note: globus-start-container may take some time to complete.
2. When the process is complete, a message indicates that the container is open for Grid services, as shown in Example 11-35.
Example 11-35 Starting the Java WS Core container
[globus@hosta]$ globus-start-container -nosec
2005-06-09 11:31:41,192 ERROR service.ReliableFileTransferImpl [main,<init>:73] Unable to setup data base driver with pooling.Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
2005-06-09 11:31:41,848 WARN service.ReliableFileTransferHome [main,initialize:97] All RFT requests will fail and all GRAM jobs that require file staging will fail.Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
Starting SOAP server at: http://192.168.1.103:8080/wsrf/services/
With the following services:
[1]: http://192.168.1.103:8080/wsrf/services/TriggerFactoryService
[2]: http://192.168.1.103:8080/wsrf/services/DelegationTestService
...(unrelated information omitted)
[51]: http://192.168.1.103:8080/wsrf/services/ManagedJobFactoryService
2005-06-09 11:32:10,359 INFO impl.DefaultIndexService [Thread-9,processConfigFile:99] Reading default registration configuration from file: /usr/local/globus-4.0.0/etc/globus_wsrf_mds_index/hierarchy.xml
2005-06-09 11:32:11,398 ERROR impl.QueryAggregatorSource [Thread-11,pollGetMultiple:149] Exception Getting Multiple Resource Properties from http://192.168.1.103:8080/wsrf/services/ReliableFileTransferFactoryService: java.rmi.RemoteException: Failed to serialize resource property org.globus.transfer.reliable.service.factory.TotalNumberOfBytesTransferred@1fd10fa; nested exception is: org.apache.commons.dbcp.DbcpException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
Note: With the globus-start-container command, you will see many exceptions regarding RFT. This is because we have not configured RFT yet, and therefore these messages are normal. If you do not want these messages, go to "Configuration and testing of RFT" on page 180 and configure RFT first.
Executing Counter Sample program
Globus Toolkit 4 includes sample programs. Counter Sample is one of the samples in Globus Toolkit 4. Counter Sample contains a CounterService and counter client. CounterService has two key operations:
createCounter    Creates a new counter resource and returns the end point reference of the resource.
add              Adds a value to the specified counter resource.
CounterService is deployed into the container during the installation process by default, so you only need to use the client program to try Counter Sample. To try the sample, follow these procedures:
1. If your Java WS Core container is not running, start your container by typing the following command:
globus-start-container [-nosec]
If you do not want to run the container in secure mode, use the -nosec option.
Make sure the CounterService entry is shown when you start your container (see Example 11-36).
Example 11-36 Part of globus-start-container output
...(unrelated information omitted)
[15]: https://192.168.1.103:8443/wsrf/services/CounterService
...(unrelated information omitted)
2. Log in to your grid node as a user that has grid user certificates.
3. Type the grid-proxy-init command to authenticate and create the proxy certificate (see Example 11-37).
Note: If you are using a nonsecure container, you do not need this step.
Example 11-37 Submitting grid-proxy-init command
[auser1@hosta]$ grid-proxy-init
Your identity: /O=Grid/OU=GlobusTest/OU=simpleCA-ca.redbook.ibm.com/OU=redbook.ibm.com/CN=grid user 1
Enter GRID pass phrase for this identity:
Creating proxy .................................. Done
Your proxy is valid until: Tue Jun 14 21:41:25 2005
4. Create a counter resource by typing the following command:
counter-create -s <URI of CounterService> > <epr file name>
The output of this command includes an end point reference string, so you need to redirect the output to a file (see Example 11-38).
Example 11-38 Create counter resource
[auser1@hosta]$ counter-create -s https://192.168.1.103:8443/wsrf/services/CounterService > test.epr
5. Add a value to the counter resource by typing the following command:
counter-add -e <epr file name> <value to add>
The output of this command shows the result after the addition. You may run it several times to see how the counter accumulates (see Example 11-39).
Example 11-39 Add values to counter resource
[auser1@hosta]$ counter-add -e test.epr 3
3
[auser1@hosta]$ counter-add -e test.epr 4
7
Troubleshooting
The following are a few common errors that may occur, along with what you can do to correct them.
The following message appears during the globus-start-container command:
Failed to start container: Failed to initialize ManagedJobFactoryService
service Caused by: SEC Service credentials not configured and was not
able to obtain container credentials.;
This may be because the container certificates were not created properly. This error also appears when you do not have a grid-mapfile. Make sure you follow the steps in 11.5.2, "Security set up" on page 168.
The following message appears during the globus-start-container command:
Failed to start container: Container failed to initialize Caused by:
Address already in use
This is because another container or program is already using the port. Stop that container or program, and then run the command again.
The following message appears during the counter-create command:
Error: ; nested exception is:
GSSException: Defective credential detected Caused by: Proxy file
/tmp/x509up_u511 not found.
This is because you have tried to access a secured container without an activated proxy certificate. Run the grid-proxy-init command first, and then run the command again.
11.5.4 Configuration and testing of GridFTP
You need to configure GridFTP before RFT, because GridFTP is required by RFT. GridFTP is already installed during the default installation process. You only need to configure GridFTP as a service daemon so that you can transfer data between two hosts with GridFTP.
Setting up GridFTP environment
To set up GridFTP as a service, follow the procedures below.
1. Assign the service name gsiftp to TCP port 2811 in /etc/services, as you see in Example 11-40.
Example 11-40 Example of /etc/services file
...(unrelated information omitted)
gsiftp 2811/tcp # GridFTP
2. Create the /etc/xinetd.d/gsiftp file with the entry in Example 11-41.
Example 11-41 Example of /etc/xinetd.d/gsiftp
service gsiftp
{
instances       = 100
socket_type     = stream
wait            = no
user            = root
env             += GLOBUS_LOCATION=/usr/local/globus-4.0.0
env             += LD_LIBRARY_PATH=/usr/local/globus-4.0.0/lib
server          = /usr/local/globus-4.0.0/sbin/globus-gridftp-server
server_args     = -i
log_on_success  += DURATION
nice            = 10
disable         = no
}
3. Restart the xinetd daemon (see Example 11-42).
Example 11-42 Restarting xinetd daemon
[root@hosta]# service xinetd restart
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
Note: You may also start your GridFTP server with the command below:
globus-gridftp-server -S
For more information, see the following link:
http://www.globus.org/toolkit/docs/4.0/data/gridftp/admin-index.html
Verifying the installation and configuration of GridFTP
To verify that GridFTP has been installed successfully, complete the following procedure:
1. Log in to your grid node as a user who has grid user certificates.
2. Type the grid-proxy-init command to authenticate and create the proxy certificate.
3. Type the following GridFTP client command to make sure your GridFTP is configured properly (see Example 11-43).
globus-url-copy <sourceURL> <destURL>
Example 11-43 Using GridFTP with globus-url-copy command
[auser1@hosta]$ echo "GridFTP Test" > /tmp/gridftptest.txt
[auser1@hosta]$ globus-url-copy gsiftp://hosta/tmp/gridftptest.txt file:///tmp/gridftptestcopied.txt
[auser1@hosta]$ cat /tmp/gridftptestcopied.txt
GridFTP Test
[auser1@hosta]$ globus-url-copy file:///tmp/gridftptestcopied.txt gsiftp://hosta/tmp/gridftptestcopied2.txt
[auser1@hosta]$ cat /tmp/gridftptestcopied2.txt
GridFTP Test
4. Try a third-party transfer with the globus-url-copy command (see Example 11-44).
Example 11-44 Third-party transfer with globus-url-copy command
[auser1@hosta]$ echo "ThirdParty GridFTP Test" > /tmp/thirdparty.txt
[auser1@hosta]$ globus-url-copy gsiftp://hosta/tmp/thirdparty.txt gsiftp://hostb/tmp/thirdparty.txt
[auser1@hosta]$ ssh buser1@hostb
buser1@hostb's password:
Last login: Thu Jun 9 19:36:31 2005 from hosta.redbook.ibm.com
[buser1@hostb]$ cat /tmp/thirdparty.txt
ThirdParty GridFTP Test
[buser1@hostb]$ ll /tmp/thirdparty.txt
-rw-r--r-- 1 buser1 buser1 24 Jun 9 19:36 /tmp/thirdparty.txt
Important: In Example 11-44, the owner of the created file is buser1, not root. This is because GridFTP uses GSI for authentication, and /etc/grid-security/grid-mapfile on hostb was used to map the grid user to the local user. Take a look at /etc/grid-security/grid-mapfile. Refer to "Security set up" on page 168 and Chapter 7, "Security" on page 63, for more information.
Note: To enable third-party GridFTP transfer, you need to install and configure the other hosts, such as hostb, with the same steps. Refer to the previous sections for installation and configuration procedures.
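The examples above verify each transfer by displaying the copied file with cat. For larger or binary files, a byte-for-byte comparison is more dependable. The following sketch assumes that both files are reachable from the local file system (for example, after copying a file out and back as in Example 11-43); the function name verify_copy is hypothetical.

```shell
# Compare a transferred file with its source byte for byte.
# cmp -s prints nothing and returns 0 only when the files are identical.
verify_copy() {
    if cmp -s "$1" "$2"; then
        echo "OK: $2 matches $1"
    else
        echo "MISMATCH: $2 differs from $1" >&2
        return 1
    fi
}

# Example, using the file names from Example 11-43:
#   verify_copy /tmp/gridftptest.txt /tmp/gridftptestcopied2.txt
```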
Troubleshooting
The following are some possible error conditions or symptoms that may come up in your testing, along with possible resolutions.
It takes a long time to transfer a small data file using globus-url-copy.
Make sure your name server is configured properly. Look at /etc/resolv.conf to make sure name resolution for your grid node is configured properly.
The following message appears during the globus-url-copy command:
globus_gsi_gssapi: Error with gss credential handle
globus_credential: Valid credentials could not be found in any of the
possible locations specified by the credential search order.
Valid credentials could not be found in any of the possible locations
specified by the credential search order.
This is because you have tried to access the GridFTP server without an activated proxy certificate. Run the grid-proxy-init command first, and then run the command again.
11.5.5 Configuration and testing of RFT
After you configure GridFTP, you may configure RFT. RFT is used by WS GRAM during stage-in and stage-out.
Attention: Before configuring RFT, make sure you follow the instructions in "Configuration and testing of GridFTP" on page 177.
Setting up PostgreSQL
To use RFT, you need to configure a JDBC-compliant database. In this book, we install PostgreSQL as the database. Follow the procedures below to install and set up PostgreSQL.
1. If you do not have PostgreSQL on your node, install PostgreSQL with rpm commands as the root user (see Example 11-45).
Example 11-45 Installing postgres
[root@hosta]# rpm -ivh /mnt/cdrom/RedHat/RPMS/postgresql-libs-7.3.2-3.i386.rpm
[root@hosta]# rpm -ivh /mnt/cdrom/RedHat/RPMS/postgresql-7.3.2-3.i386.rpm
[root@hosta]# rpm -ivh /mnt/cdrom/RedHat/RPMS/postgresql-server-7.3.2-3.i386.rpm
2. Start the PostgreSQL server. Also, set postgresql to start after reboot (see Example 11-46).
Example 11-46 Starting PostgreSQL
[root@hosta]# service postgresql start
Initializing database: [ OK ]
Starting postgresql service: [ OK ]
[root@hosta]# chkconfig postgresql on
3. RFT requires PostgreSQL to accept connections from the network. Edit the PostgreSQL settings file /var/lib/pgsql/data/postgresql.conf to allow connections from the network (see Example 11-47).
Example 11-47 Example of /var/lib/pgsql/data/postgresql.conf
...(unrelated information omitted)
# Connection Parameters
tcpip_socket = true # change the value from false to true
...(unrelated information omitted)
4. Add the following line to /var/lib/pgsql/data/pg_hba.conf to allow access from your host (see Example 11-48).
Example 11-48 Example of /var/lib/pgsql/data/pg_hba.conf
...(unrelated information omitted)
local all all ident sameuser
host all all <ip address of RFT host> 255.255.255.255 trust
5. Restart the PostgreSQL server to activate the new settings (see Example 11-49).
Example 11-49 Restarting PostgreSQL
[root@hosta]# service postgresql restart
Stopping postgresql service: [ OK ]
Starting postgresql service: [ OK ]
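A missing pg_hba.conf entry is a common cause of RFT startup errors, so it can be worth checking the file after editing it in step 4. The sketch below assumes the column layout shown in Example 11-48 (type, database, user, IP address, mask, method); the function name pghba_has_host is hypothetical.

```shell
# Succeed if pg_hba.conf contains an uncommented "host" entry whose
# address field (fourth column) equals the given IP address.
# Usage: pghba_has_host <ip address> [path to pg_hba.conf]
pghba_has_host() {
    awk -v ip="$1" '
        $1 == "host" && $4 == ip { found = 1 }
        END { exit found ? 0 : 1 }
    ' "${2:-/var/lib/pgsql/data/pg_hba.conf}"
}

# Example (the address is the container host used in this chapter):
#   pghba_has_host 192.168.1.103 || echo "no pg_hba.conf entry for this host"
```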
Adding a new database for RFT to PostgreSQL
You need to create a database that RFT uses in PostgreSQL. Follow the procedures below:
1. As the postgres user, submit the command in Example 11-50 to create the RFT database.
Note: The postgres user is automatically generated during the PostgreSQL package installation shown in Example 11-45 on page 180.
Example 11-50 Creating RFT database
[postgres@hosta]$ createdb rftDatabase
CREATE DATABASE
2. Create tables by using the SQL script that is included in the Globus Toolkit 4 package (see Example 11-51).
Example 11-51 Creating tables in RFT database
[postgres@hosta]$ psql -d rftDatabase -f $GLOBUS_LOCATION/share/globus_wsrf_rft/rft_schema.sql
CREATE SEQUENCE
CREATE SEQUENCE
psql:/usr/local/globus-4.0.0/share/globus_wsrf_rft/rft_schema.sql:22: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'request_pkey' for table 'request'
CREATE TABLE
psql:/usr/local/globus-4.0.0/share/globus_wsrf_rft/rft_schema.sql:57: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'transfer_pkey' for table 'transfer'
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE INDEX
3. Register the globus user with the PostgreSQL server (see Example 11-52).
Example 11-52 Registering Globus user
[postgres@hosta]$ createuser globus
Shall the new user be allowed to create databases? (y/n) y
Shall the new user be allowed to create more new users? (y/n) y
CREATE USER
Verifying the installation and configuration of RFT
To verify that RFT has been installed successfully, complete the following procedure:
1. Log in to your grid node as the globus user.
2. Start the secure container (see Example 11-53).
Example 11-53 Starting the Java WS Core container in secure mode
[globus@hosta]$ globus-start-container
Starting SOAP server at: https://192.168.1.103:8443/wsrf/services/
With the following services:
[1]: https://192.168.1.103:8443/wsrf/services/TriggerFactoryService
[2]: https://192.168.1.103:8443/wsrf/services/DelegationTestService
...(unrelated information omitted)
[51]: https://192.168.1.103:8443/wsrf/services/ManagedJobFactoryService
2005-06-09 21:46:49,805 INFO impl.DefaultIndexService [Thread-9,processConfigFile:99] Reading default registration configuration from file: /usr/local/globus-4.0.0/etc/globus_wsrf_mds_index/hierarchy.xml
Attention: Make sure you do not have an RFT error after you start your container. If you still have an RFT error like the one in Example 11-35 on page 174, check the settings again. See "Troubleshooting" on page 184 for more information.
3. In another console, log in to your grid node as a user who has grid user certificates.
4. Create an RFT description file. A sample RFT description file is shown in Example 11-54.
Example 11-54 Sample of RFT description file rfttest.xfr
#true=binary false=ascii
true
#Block size in bytes
16000
#TCP Buffer size in bytes
16000
#Notpt (No thirdPartyTransfer)
false
#Number of parallel streams
1
#Data Channel Authentication (DCAU)
true
#Concurrency of the request
1
#Grid Subject name of the source gridftp server
/O=Grid/OU=GlobusTest/OU=simpleCA-ca.redbook.ibm.com/CN=host/hosta.redbook.ibm.com
#Grid Subject name of the destination gridftp server
/O=Grid/OU=GlobusTest/OU=simpleCA-ca.redbook.ibm.com/CN=host/hostb.redbook.ibm.com
#Transfer all or none of the transfers
false
#Maximum number of retries
10
#Source/Dest URL Pairs
gsiftp://hosta.redbook.ibm.com/tmp/fileInHostA.txt
gsiftp://hostb.redbook.ibm.com/tmp/fileFromHostA.txt
Note: A template for rfttest.xfr in Example 11-54 is located in the file below:
$GLOBUS_LOCATION/share/globus_wsrf_rft_client/transfer.xfr
For more information, look at the following site:
http://www.globus.org/toolkit/docs/4.0/data/rft/rn01re01.html
5. Run the RFT job by invoking the rft command, as shown in Example 11-55.
Example 11-55 Executing RFT file transfer with rft command
[auser1@hosta]$ echo "TestFromHostA" > /tmp/fileInHostA.txt
[auser1@hosta]$ rft -h hosta.redbook.ibm.com -r 8443 -f rfttest.xfr
Number of transfers in this request: 1
Subscribed for overall status
Termination time to set: 60 minutes
Overall status of transfer:
Finished/Active/Failed/Retrying/Pending
0/1/0/0/0
Overall status of transfer:
Finished/Active/Failed/Retrying/Pending
1/0/0/0/0
All Transfers are completed
[auser1@hosta]$ ssh buser1@hostb
buser1@hostb's password:
[buser1@hostb]$ cat /tmp/fileFromHostA.txt
TestFromHostA
Troubleshooting
To troubleshoot:
The following message appears during the globus-start-container command.
2005-06-09 21:41:12,135 ERROR service.ReliableFileTransferImpl
[main,<init>:73] Unable to setup database driver with pooling.A connection
error has occurred: FATAL: No pg_hba.conf entry for host
XXX.XXX.XXX.XXX, user globus, database rftDatabase
This message appears because PostgreSQL is not configured properly. Look at /var/lib/pgsql/data/pg_hba.conf and check whether there is an entry for your host IP address.
The following message appears during the globus-start-container command.
2005-06-13 16:10:55,374 ERROR service.ReliableFileTransferImpl
[main,<init>:73] Unable to setup database driver with pooling.Connection
refused. Check that the hostname and port are correct and that the
postmaster is accepting TCP/IP connections.
This message appears because the container cannot connect to PostgreSQL. Check whether PostgreSQL is running. Check the configuration as described in "Setting up PostgreSQL" on page 180.
One of the following messages appears during rft command execution.
Exception in thread "main" Error during startup processing. Caused by
java.lang.NumberFormatException: For input string:
Or:
Number of transfers in this request: 0
Exception in thread "main" Error during startup processing. Caused by
AxisFault
faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
faultSubcode:
faultString: java.lang.NullPointerException
Or:
Exception in thread "main" Error during startup processing. Caused by
java.lang.ArrayIndexOutOfBoundsException: 10 >= 10
These messages appear because the RFT description file rfttest.xfr is inconsistent, for example because a line is missing or a value is empty. Check the RFT description file.
11.5.6 Configuration and testing of WS GRAM
Most of the settings for WS GRAM are completed automatically during the Globus Toolkit 4 installation process. However, you still need to configure the sudo command.
Setting up WS GRAM environment
To configure the sudo command for WS GRAM, follow the procedure below:
1. As a root user, type in the visudo command (see Example 11-56).
Example 11-56 Type visudo command
[root@hosta]# visudo
2. Add the lines in Example 11-57 into the file.
Example 11-57 Example of visudo entry
...(unrelated information omitted)
# Globus GRAM entries
globus ALL=(auser1,auser2) NOPASSWD:
/usr/local/globus-4.0.0/libexec/globus-gridmap-and-execute -g
/etc/grid-security/grid-mapfile
/usr/local/globus-4.0.0/libexec/globus-job-manager-script.pl *
globus ALL=(auser1,auser2) NOPASSWD:
/usr/local/globus-4.0.0/libexec/globus-gridmap-and-execute -g
/etc/grid-security/grid-mapfile
/usr/local/globus-4.0.0/libexec/globus-gram-local-proxy-tool *
Note: Each of the two entries in Example 11-57 must be entered on a single line. They are wrapped here only for display.
Note: Do not forget to put a list of your local user names in the ALL clause, as shown in Example 11-57.
Verifying the installation and configuration of WS GRAM
To verify WS GRAM installation, complete the following procedure:
1. Log in to your grid node as the Globus user.
2. Start the secure container (see Example 11-53 on page 182).
3. In another console, log in to your grid node as the user who has grid user certificates.
4. Run the commands in Example 11-58. If the job creates the file and ls lists it, then WS GRAM is configured properly.
Example 11-58 Running simple WS GRAM command
[auser1@hosta]$ globusrun-ws -submit -c /bin/touch /tmp/createdfile
Submitting job...Done.
Job ID: uuid:1b4e3cb2-d966-11d9-9f76-0011250d31d9
Termination time: 06/11/2005 04:14 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
[auser1@hosta]$ ls /tmp/createdfile
/tmp/createdfile
Submitting a WS GRAM job using a job definition file
You may define a WS GRAM job with a job definition file.
Simple echo job definition file
Example 11-59 shows a simple echo job definition file. To submit the WS GRAM job, type the commands shown in Example 11-60.
Example 11-59 Example of echojob.xml
<?xml version="1.0" encoding="UTF-8"?>
<job>
    <executable>/bin/echo</executable>
    <argument>This file is written by WS GRAM job with job definition file.</argument>
    <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
    <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
</job>
Example 11-60 Submitting simple echo job with globusrun-ws command
[auser1@hosta]$ globusrun-ws -submit -f echojob.xml
Submitting job...Done.
Job ID: uuid:2139fdca-d9e6-11d9-afb5-0011250d31d9
Termination time: 06/11/2005 19:30 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
[auser1@hosta]$ cat stdout
This file is written by WS GRAM job with job definition file.
Note: GLOBUS_USER_HOME is a Globus-supplied variable. For more information, look at the following link:
http://www.globus.org/toolkit/docs/4.0/execution/wsgram/schemas/gram_job_description.html
WS GRAM multiple job
Example 11-61 shows a multiple job definition that echoes a string on both host A and host B. To submit this WS GRAM job, type the commands shown in Example 11-62.
Example 11-61 Example of multijob.xml
<?xml version="1.0" encoding="UTF-8"?>
<multiJob xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
          xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <job>
        <factoryEndpoint>
            <wsa:Address>https://hosta.redbook.ibm.com:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
            <wsa:ReferenceProperties>
                <gram:ResourceID>Fork</gram:ResourceID>
            </wsa:ReferenceProperties>
        </factoryEndpoint>
        <executable>/bin/echo</executable>
        <argument>This file is the first file written by WS GRAM job with multiple job definition file.</argument>
        <stdout>${GLOBUS_USER_HOME}/stdoutmulti1</stdout>
        <stderr>${GLOBUS_USER_HOME}/stderrmulti1</stderr>
        <count>2</count>
    </job>
    <job>
        <factoryEndpoint>
            <wsa:Address>https://hostb.redbook.ibm.com:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
            <wsa:ReferenceProperties>
                <gram:ResourceID>Fork</gram:ResourceID>
            </wsa:ReferenceProperties>
        </factoryEndpoint>
        <executable>/bin/echo</executable>
        <argument>This file is the second file written by WS GRAM job with multiple job definition file.</argument>
        <stdout>${GLOBUS_USER_HOME}/stdoutmulti2</stdout>
        <stderr>${GLOBUS_USER_HOME}/stderrmulti2</stderr>
        <count>1</count>
    </job>
</multiJob>
Example 11-62 Submitting multiple jobs with globusrun-ws command
[auser1@hosta]$ globusrun-ws -submit -f multijob.xml -J
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:08d97932-dc1f-11d9-9f2c-0011250d31d9
Termination time: 06/14/2005 15:23 GMT
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
[auser1@hosta]$ cat /home/auser1/stdoutmulti1
This file is the first file written by WS GRAM job with multiple job definition file.
This file is the first file written by WS GRAM job with multiple job definition file.
[auser1@hosta]$ ssh buser1@hostb
buser1@hostb's password:
[buser1@hostb]$ cat /home/buser1/stdoutmulti2
This file is the second file written by WS GRAM job with multiple job definition file.
Note: The -J option in Example 11-62 is used to delegate the credentials to each GRAM host. The first message appears twice because the first job specifies a count of 2.
WS GRAM job with stage in and stage out
Example 11-63 shows a job definition with file stage in and file stage out. This job copies the /bin/echo binary from host A to host B, executes the copied binary on host B, copies the output files from host B to host A, and then cleans up the work files on host B. See Figure 11-2 for more details.
Figure 11-2 Overview of file staging GRAM job. The user is on host A with the WS GRAM job description file. The steps shown are: 1. Submit (SOAP/https), 2. File transfer request, 3. File transfer of /bin/echo (GridFTP), copied to /tmp/copiedecho, 4. Execute /tmp/copiedecho on host B, 5. File transfer of results (GridFTP), 6. CleanUp.
Example 11-63 Example of filestaging.xml
<?xml version="1.0" encoding="UTF-8"?>
<job xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
     xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <factoryEndpoint>
        <wsa:Address>https://hostb.redbook.ibm.com:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
        <wsa:ReferenceProperties>
            <gram:ResourceID>Fork</gram:ResourceID>
        </wsa:ReferenceProperties>
    </factoryEndpoint>
    <executable>/tmp/copiedecho</executable>
    <argument>Staging sample executed in hostb.</argument>
    <stdout>${GLOBUS_USER_HOME}/stdoutfilestaging</stdout>
    <stderr>${GLOBUS_USER_HOME}/stderrfilestaging</stderr>
    <fileStageIn>
        <transfer>
            <sourceUrl>gsiftp://hosta.redbook.ibm.com/bin/echo</sourceUrl>
            <destinationUrl>gsiftp://hostb.redbook.ibm.com/tmp/copiedecho</destinationUrl>
        </transfer>
    </fileStageIn>
    <fileStageOut>
        <transfer>
            <sourceUrl>gsiftp://hostb.redbook.ibm.com/${GLOBUS_USER_HOME}/stdoutfilestaging</sourceUrl>
            <destinationUrl>gsiftp://hosta.redbook.ibm.com/tmp/stdoutfromhostb</destinationUrl>
        </transfer>
        <transfer>
            <sourceUrl>gsiftp://hostb.redbook.ibm.com/${GLOBUS_USER_HOME}/stderrfilestaging</sourceUrl>
            <destinationUrl>gsiftp://hosta.redbook.ibm.com/tmp/stderrfromhostb</destinationUrl>
        </transfer>
    </fileStageOut>
    <fileCleanUp>
        <deletion>
            <file>gsiftp://hostb.redbook.ibm.com/tmp/copiedecho</file>
        </deletion>
        <deletion>
            <file>gsiftp://hostb.redbook.ibm.com/${GLOBUS_USER_HOME}/stdoutfilestaging</file>
        </deletion>
        <deletion>
            <file>gsiftp://hostb.redbook.ibm.com/${GLOBUS_USER_HOME}/stderrfilestaging</file>
        </deletion>
    </fileCleanUp>
</job>
To submit the WS GRAM job, type the commands shown in Example 11-64.
Example 11-64 Submitting file staging job with globusrun-ws command
[auser1@hosta]$ globusrun-ws -submit -f filestaging.xml -S
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:dd7e461a-dc3b-11d9-bc0b-0011250d31d9
Termination time: 06/14/2005 18:49 GMT
Current job state: StageIn
Current job state: Active
Current job state: StageOut
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
[auser1@hosta]$ cat /tmp/stdoutfromhostb
Staging sample executed in hostb.
[auser1@hosta]$ cat /tmp/stderrfromhostb
[auser1@hosta]$
Note: The -S option shown in Example 11-64 is used to delegate the credentials to each host during staging.
11.5.7 Testing of MDS4
MDS4 is configured automatically during the Globus Toolkit 4 installation process. To test the MDS4 function, follow the procedure below:
1. Log in to your grid node as the Globus user.
2. Start a secure container (see Example 11-53 on page 182).
3. In another console, log in to your grid node as the user who has grid user certificates.
4. Run the command shown in Example 11-65. If you receive a list of services, then MDS4 is properly configured.
Example 11-65 Using wsrf-query to obtain information from MDS4
[auser1@hosta]$ wsrf-query -s https://hosta.redbook.ibm.com:8443/wsrf/services/DefaultIndexService
<ns0:IndexRP xmlns:glue="http://mds.globus.org/glue/ce/1.1"
xmlns:ns0="http://mds.globus.org/index"
xmlns:ns1="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ServiceGroup-1.2-draft-01.xsd"
xmlns:ns10="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceLifetime-1.2-draft-01.xsd"
xmlns:ns2="http://schemas
...(unrelated information omitted)
11.6 Uninstallation
To uninstall Globus Toolkit 4, follow the procedure below.
1. If you have running WS Core containers, stop them.
2. As a root user, delete the directories listed below. See Example 11-66 for the commands.
GLOBUS_LOCATION (/usr/local/globus-4.0.0)
/etc/grid-security
If you do not need Apache Ant: ANT_HOME (/usr/apache-ant-1.6.3)
Example 11-66 Removing Globus directories
[root@hosta]# rm -rf /usr/local/globus-4.0.0
[root@hosta]# rm -rf /etc/grid-security
[root@hosta]# rm -rf /usr/apache-ant-1.6.3
3. If you have changed /etc/profile in 11.5.1, "Configuring environmental variables" on page 168, remove the following lines:
export GLOBUS_LOCATION=/usr/local/globus-4.0.0
source $GLOBUS_LOCATION/etc/globus-user-env.sh
source $GLOBUS_LOCATION/etc/globus-devel-env.sh
4. Remove the GridFTP service settings by removing the gsiftp 2811/tcp line in /etc/services.
5. Remove the GridFTP daemon settings. See Example 11-67 for the commands.
Example 11-67 Remove GridFTP settings
[root@hosta]# rm /etc/xinetd.d/gsiftp
[root@hosta]# service xinetd restart
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
6. Remove the Globus user. See Example 11-68 for the commands.
Example 11-68 Removing Globus user
[root@hosta]# userdel -r globus
7. Remove the two entries for the Globus user in /etc/sudoers by typing visudo and editing the file.
8. If you do not need PostgreSQL, uninstall the following rpm packages (see Example 11-69):
postgresql-libs
postgresql
postgresql-server
Example 11-69 Removing postgres rpm files
[root@zeta]# rpm -e postgresql-server
[root@zeta]# rpm -e postgresql
[root@zeta]# rpm -e postgresql-libs
Note: If you get dependency errors with Example 11-69, first remove the packages that depend on the postgresql packages.
9. If you do not need the IBM Java SDK, uninstall the rpm package by issuing the command shown in Example 11-70.
Example 11-70 Removing IBM Java SDK
[root@hosta]# rpm -e IBMJava2-142-ia32-SDK
11.7 Summary
In this chapter we provided step-by-step instructions for setting up a grid in an environment based on Globus Toolkit 4. This environment is relatively basic and does not include all of the Globus Toolkit 4 components. However, it does provide a representative environment that can be used for self-education, testing, and creating a demonstration of certain grid capabilities.
In the next chapter, we describe a sample grid application that can be executed in the environment that we have just installed and configured.
Part 4. Grid demonstration application
Chapter 12. Demonstration application
This chapter describes a demonstration application built to explore some of the functionality provided by Globus Toolkit 4.
The application is a system that takes Scalable Vector Graphics (SVG) files (see http://www.w3.org/TR/SVG) and uses nodes on a grid to render a set of JPEG files representing subimages of the complete image. As it is a demonstration system, certain design decisions and assumptions have been made to accelerate development.
Important: The application as described below was built and tested in the environment described in Chapter 11, "Globus Toolkit 4 installation and configuration" on page 155. This application is provided as is and is intended as a learning tool for the reader. For information related to obtaining the sample application's source code and building the application, refer to Appendix B, "Additional material" on page 231.
The three components of the system are:
RenderClient: This is a Java application with a graphical user interface that drives the rendering work on the grid and assembles the resulting subimages into a final large image. There is only one running in the grid.
RenderWorker: This is a Java application with no graphical user interface that converts one subimage of the SVG file into a JPEG file. There are one or more running on each node in the grid. Due to the strong parallelism inherent in rendering an SVG file to multiple JPEG subimages, the more nodes in the grid, the faster the SVG file will be fully rendered. You can run one or more RenderWorkers on each node, but depending on the available cycles and networking capabilities, you will reach a point of diminishing returns.
RenderSourceService: This is a Globus Toolkit 4 grid service, deployed into a Globus Toolkit 4 container. It is initialized by the RenderClient and hands out work instructions to RenderWorker processes on the grid. There is only one running in the grid.
Figure 12-1 Demonstration application architecture
We use the following Globus Toolkit 4 features in this demonstration application:
Grid service: A stateful Java class with methods using complex parameter passing and return objects
MDS: Registration and query of nodes participating in the virtual organization
Security: Grid proxies and certificates for secure execution of tasks and file transfers
RFT/GridFTP: High-performance file transfers
GRAM: Staging all files required for the RenderWorker to the node, executing the RenderWorker, and staging back the resulting JPEG file
The following sections detail the design and implementation of each component.
12.1 RenderClient
The RenderClient is a large Java application using Swing classes to present the user interface of the SVG rendering system to the user.
12.1.1 The Graphical User Interface (GUI)
The GUI consists of several parts. Most of the text fields are pre-filled with reasonable defaults to make the application easier to use. Changing the default values requires editing and rebuilding the source code of the RenderClient.
Figure 12-2 shows the overall screen layout. We examine each section in detail below.
Figure 12-2 The Full RenderClient screen
The left side of the screen holds a number of grouped text fields that allow the user to customize the work to be done and buttons to initialize and launch the rendering process. The right side displays the rendered JPEG files, tiled into a single full image.
Figure 12-3 SVG File Parameters
The user provides the host and path to the SVG file to be rendered. Currently the demonstration assumes the SVG file is on the same machine running the RenderClient. The user also gives an indication of the natural height and width of the file. You can enter estimated values or examine the SVG file contents for a line near the top of the file that looks similar to:
<svg viewBox="0 0 600 420" width="600" height="420">
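As a convenience, the natural width and height could also be read from the file programmatically instead of by eye. The following is a minimal Python sketch, not part of the demonstration application; the function name natural_size is hypothetical, and it assumes the width and height attributes (when the viewBox is absent) are plain numbers without units.

```python
import xml.etree.ElementTree as ET


def natural_size(svg_text):
    """Return (width, height) of an SVG document as floats.

    Prefers the viewBox attribute ("min-x min-y width height") and falls
    back to the width/height attributes, assumed here to be unitless.
    """
    root = ET.fromstring(svg_text)
    view_box = root.get("viewBox")
    if view_box:
        # viewBox values may be separated by spaces and/or commas
        parts = view_box.replace(",", " ").split()
        if len(parts) == 4:
            return float(parts[2]), float(parts[3])
    return float(root.get("width")), float(root.get("height"))


sample = ('<svg xmlns="http://www.w3.org/2000/svg" '
          'viewBox="0 0 600 420" width="600" height="420"></svg>')
print(natural_size(sample))
```

For the sample line shown above, this reports a natural size of 600 by 420, which are the values to enter in the SVG File Parameters fields.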
Figure 12-4 RenderSource Service Parameters
The user provides the full URL where the RenderSourceService is running in a Globus Toolkit 4 container. A suggested URL is pre-filled, as the format of the URL is very specific. The user also gives the desired width of the resulting full JPEG file. The RenderWorker scales the image to this width and preserves the aspect ratio of the SVG file in the JPEG file. Finally, the user specifies how many boxes wide and high the SVG will be broken into for rendering. The number of RenderWorkers launched will be boxes wide × boxes high. A rule of thumb is to specify boxes high and boxes wide in proportion to the number of nodes in the virtual organization.
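The arithmetic behind these parameters can be sketched as follows. This is an illustrative Python sketch only, not the RenderClient's actual code: the function name plan_tiles, the dictionary keys, and the row-by-row numbering of subimages are all assumptions made for the example.

```python
def plan_tiles(svg_width, svg_height, jpeg_width, boxes_wide, boxes_high):
    """Split an SVG into boxes_wide x boxes_high subimages.

    Returns the height of the full JPEG (aspect ratio preserved) and one
    entry per subimage, i.e. one entry per RenderWorker to launch.
    """
    scale = jpeg_width / svg_width
    jpeg_height = round(svg_height * scale)
    tile_w = svg_width / boxes_wide
    tile_h = svg_height / boxes_high
    tiles = []
    for row in range(boxes_high):
        for col in range(boxes_wide):
            tiles.append({
                "index": row * boxes_wide + col,  # assumed numbering scheme
                # rectangle of interest in SVG coordinates: x, y, width, height
                "svg_rect": (col * tile_w, row * tile_h, tile_w, tile_h),
                # size of the JPEG rendered for this subimage
                "jpeg_size": (jpeg_width / boxes_wide, jpeg_height / boxes_high),
            })
    return jpeg_height, tiles


height, tiles = plan_tiles(600, 420, 1200, boxes_wide=3, boxes_high=2)
print(height, len(tiles))
print(tiles[5])
```

With a 600 by 420 SVG file, a requested width of 1200, and a grid 3 boxes wide by 2 boxes high, the sketch yields a full image 840 high and six subimages, so six RenderWorkers would be launched.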
Figure 12-5 RenderWorker Application File Locations
The user provides the host and path to the location of all files required to run RenderWorker on the nodes. Currently, the demonstration application assumes that the host holding the RenderWorker files is running a Globus Toolkit 4 container, or at least the RFT service.
Figure 12-6 Rendered JPEG File Parameters
The user provides the host and desired path where the rendered JPEG files will be placed. The demonstration application currently assumes that the host will always be the host running the RenderClient.
Figure 12-7 Virtual Organization
The user provides the full URL to a DefaultIndexService running in a GT4 container that is defined as the root node of the virtual organization. The URL is queried to get a list of all nodes in the VO.
Figure 12-8 Prepare Grid button
When all of the above fields are filled in, clicking Prepare Grid initializes the RenderSourceService, queries the virtual organization, and builds a list of nodes in the virtual organization in the following section of the graphical interface.
Figure 12-9 Grid nodes before clicking Prepare Grid
The Copy files to remote nodes box can be checked to force the staging of all files required by the RenderWorker to each node. Since this takes a nontrivial amount of time and network bandwidth, the user can uncheck this box if each node has already had the latest version of all files staged at least once.
Figure 12-10 Grid nodes after clicking Prepare Grid
After the Prepare Grid button is clicked, this section of the GUI shows the list of all nodes in the virtual organization. Individual subimages can be rendered by clicking the Go button to the right of a host name. To render all of the remaining subimages, the user selects or deselects nodes for work and then clicks Go Selected.
After the user clicks Go Selected, the RenderClient provides a visual indication of the current state of the process on each worker node. The full image area is drawn with lines showing how the SVG file will be broken into subimages and rendered by worker nodes on the grid. Text and color coding are used to track each job's progress.
There are six possible states that a job can be in on this demonstration grid:
Preparing
Pushing
Rendering
Retrieving
Complete
Failed
Each state and its visual representation is described below.
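The relationship between GRAM's job states and the labels shown by the RenderClient, as described in the subsections that follow, can be summarized in a small lookup table. The Python sketch below is illustrative and is not taken from the RenderClient source. The GRAM state names are the ones printed by globusrun-ws; treating CleanUp as "keep the current label" is an assumption made for this example.

```python
# GRAM job state -> label shown by the RenderClient
GRAM_TO_LABEL = {
    "StageIn": "Pushing",
    "Active": "Rendering",
    "StageOut": "Retrieving",
    "Done": "Complete",
    "Failed": "Failed",
}


def label_for(gram_state, current_label="Preparing"):
    """Translate a GRAM state notification into a display label.

    States without a label of their own (for example CleanUp) leave the
    current label unchanged; this is an assumption of the sketch.
    """
    return GRAM_TO_LABEL.get(gram_state, current_label)


def replay(notifications):
    """Return the sequence of distinct labels a job passes through."""
    label = "Preparing"  # shown before GRAM reports any state
    history = [label]
    for state in notifications:
        new_label = label_for(state, label)
        if new_label != label:
            label = new_label
            history.append(label)
    return history


print(replay(["StageIn", "Active", "StageOut", "CleanUp", "Done"]))
```

A listener that receives GRAM state notifications could call a function like label_for on each notification and repaint the corresponding subimage area.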
Preparing
Figure 12-11 Preparing job state
Preparing is the first job state entered after the user clicks Go Selected. The application creates and prepares a Globus JobDescriptionType object. This object defines everything required to prepare and execute a program on a remote system. The client then creates and initializes a GramJob object, which is used to pass the JobDescriptionType object to the Globus GRAM subsystem. Calling GRAM's submit method launches the request to run the program on the remote system. A listener is added to the GRAM job to catch the notifications for changes in status, which drive the state changes shown on the screen.
Note that in this demonstration application each GRAM job is launched in its own thread, allowing multiple simultaneous GRAM jobs to be launched and monitored for state changes.
From a problem determination standpoint, several things can go wrong during this stage. Below are a few of the problems we encountered while developing and testing this application:
Cannot communicate with the ManagedJobFactoryService running in the Globus container on the worker node. We found that some systems were silently running iptables, which blocked communications. In our environment, iptables had to be turned off permanently on all systems, or stopped with /etc/init.d/iptables stop after each reboot.
Incorrect parameters defined in the JobDescriptionType and GramJob objects.
To enable debugging and tracing, edit /usr/local/globus-4.0.0/container-log4j.properties. Comment or uncomment lines at the bottom of the file to get more information for GRAM and RFT/GridFTP.
Pushing
Figure 12-12 Pushing files to worker node state
Pushing is the term used by the RenderClient for GRAM's StageIn state. StageIn means copying the list of requested files from their source on the network to the destination worker node. Note that any number of files can be designated for staging in, and they can be sourced from different servers on the network. The only caveat is that GRAM must be able to find each file either on attached storage via the file: protocol, or on a Globus container or RFT/GridFTP server via the gsiftp: protocol.
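Because a mistyped staging URL only fails once the job is already running, it can help to check the URLs before submitting. The following Python sketch illustrates such a pre-flight check under the protocol restriction described above; the function name check_staging_url and the error strings are hypothetical, and the check only inspects the URL text, not whether the file exists.

```python
from urllib.parse import urlsplit

# Protocols through which staged files can be reached, per the text above
ALLOWED_SCHEMES = {"file", "gsiftp"}


def check_staging_url(url):
    """Return None if the URL looks usable for staging, else a reason."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return "unsupported scheme: " + (parts.scheme or "(none)")
    if parts.scheme == "gsiftp" and not parts.hostname:
        return "gsiftp URL has no host"
    if not parts.path:
        return "URL has no path"
    return None


for candidate in ("gsiftp://hosta.redbook.ibm.com/tmp/fileInHostA.txt",
                  "file:///tmp/fileInHostA.txt",
                  "http://hosta.redbook.ibm.com/tmp/fileInHostA.txt",
                  "/tmp/fileInHostA.txt"):
    print(candidate, "->", check_staging_url(candidate) or "ok")
```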
Several things can potentially go wrong during this stage, including:
Not being able to locate the files to be staged in.
The destination directory on the worker node does not exist. GRAM will not automatically create a directory path for you.
Insufficient privilege to write the file to its destination location on the worker node.
When running multiple RenderWorkers on a single worker node, GRAM is not smart enough to only copy the files once. We have seen the JVM crash when a JAR file was in the process of being overwritten by one job while being used by another job.
Rendering
Figure 12-13 Rendering Image on worker node state
Rendering is the term used by the RenderClient for GRAM's Active state. Active means the program defined in the JobDescriptionType object is about to be launched or is currently executing.
Several things can potentially go wrong during this stage, including:
The command line defined in the JobDescriptionType failed.
The program's environment is not fully set up, including JAVA_HOME, GLOBUS_LOCATION, CLASSPATH, LIBPATH, and LD_LIBRARY_PATH.
Globus security fails due to nonexistent or expired proxy credentials.
Insufficient privilege to create the program's output files.
Program-specific failures. In the case of this demo, the Batik library requires communication with the machine's X Windows server, so we had to set the DISPLAY environment variable. Also, an administrator must run the xhost command on each worker node or Batik cannot open the X Windows DISPLAY to run its code. An alternative is to run xhost on a single node and modify the shell script's DISPLAY variable to point to that host.
It is a good idea to build your target application such that it can be run and debugged by hand on a worker node, minimizing the amount of effort once you get to the GRAM part of your development and testing.
The most important thing we discovered for this state is that even though the job runs under a certain user ID on the worker node (for example, globus), the program does not inherit the environment that the user would normally see when logged in to the system with that user ID. In the end we had to provide a complete environment setup for the program to run, so we made a shell script, called runrenderworker, that performs all environment setup and then, as the last step, launches the target Java application, RenderWorker. The benefit is that RenderWorker runs successfully on the worker nodes. The drawback is that the directory paths to all required software components (Java runtime, Globus installation, system libraries, and so on) had to be hard-coded in the shell script. Therefore, all worker nodes had to be identically configured to allow for successful execution across the grid.
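The idea of giving the job a complete, explicit environment rather than relying on an inherited one can be sketched as below. This Python sketch is not the runrenderworker script itself, which is a shell script shipped with the sample code; the function name, the Java path, the jar file names, and the default DISPLAY value are hypothetical, and the library directory under GLOBUS_LOCATION is an assumption.

```python
def build_worker_env(java_home, globus_location, jar_files, display=":0"):
    """Build the full environment a remote job needs, from scratch.

    Nothing is copied from the caller's environment, which mirrors the
    situation of a program launched by GRAM on a worker node.
    """
    return {
        "JAVA_HOME": java_home,
        "GLOBUS_LOCATION": globus_location,
        "CLASSPATH": ":".join(jar_files),
        "LD_LIBRARY_PATH": globus_location + "/lib",  # assumed location
        "DISPLAY": display,  # needed because Batik opens the X display
        "PATH": java_home + "/bin:" + globus_location + "/bin:/usr/bin:/bin",
    }


env = build_worker_env(
    "/opt/java",                    # hypothetical Java location
    "/usr/local/globus-4.0.0",
    ["RenderWorker.jar", "batik.jar"],  # hypothetical jar names
)
for name in sorted(env):
    print(name + "=" + env[name])
```

A launcher written this way would pass the dictionary to the child process, for example with subprocess.run(command, env=env), so that the child sees exactly these variables and nothing else. As with the shell script, the paths are hard-coded per installation, so the same trade-off applies.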
Retrieving
Figure 12-14 Retrieving image from worker node state
Retrieving is the term used by the RenderClient for GRAM's StageOut state. StageOut occurs after the program finishes execution and pulls any requested files back from the worker node to another location on the network. For our demonstration, the generated JPEG file is pulled back to the machine running the RenderClient application.
Several things can potentially go wrong during this stage, including:
Files expected to exist for StageOut do not exist due to an incorrect definition in the JobDescriptionType object or failure of the program.
Insufficient privilege to copy the program's output files to the destination system.
Complete
Figure 12-15 Job complete state
Complete is the term used by the RenderClient for GRAM's Done state. Complete occurs after files are staged out to the destination system. Our demo application locates the rendered JPEG file by its expected file name and displays it at the proper coordinates in the Image Results part of the screen.
Several things can potentially go wrong during this stage, including:
The expected JPEG file does not exist, due to a failure described above.
The JPEG file is actually not in JPEG format due to an error with the RenderWorker.
Failed
Figure 12-16 Job failed state
Failed is the term used by the RenderClient for GRAM's Failed state. A job can fail for any of the reasons described above, at any stage of the process. In the demo application we try to catch as many failure modes as possible to give the user a hint as to what failed. Failures are unusual once a system goes into production, but during development GRAM can fail in several ways. There are several places to look for help, both information messages and error messages, when something goes awry:
The output of the Globus container where the RenderClient is running. You may see security failure-related errors, file StageIn and StageOut transfer errors, and so on.
The terminal window where you executed the RenderClient. The demo application emits many messages when an error is detected. You can also uncomment desired System.out.println lines and rebuild the RenderClient to see trace information.
The output of the Globus container where the RenderSourceService is running. You may see security failure-related errors and grid service API errors. You can also uncomment desired System.out.println lines and rebuild the RenderSourceService to see trace information.
The contents of the StageIn target directory on the worker node will show if all files are being successfully staged in.
The stdout and stderr files created by GRAM for your remote application. Messages from both the shell script and RenderWorker are shown. You can also uncomment desired System.out.println lines and rebuild the RenderWorker to see trace information.
The contents of the StageOut target directory on the system running RenderClient will show if all JPEG files are being successfully staged out.
View of successful completion of job
Figure 12-17 Resulting image area
The entire right side of the GUI window is reserved for the resulting subimage JPEGs.
12.1.2 RenderClient source code
Attention: This demonstration application is available to be downloaded as described in Appendix B, Additional material on page 231. Detailed descriptions of the source code and how to develop grid applications using the Globus Toolkit are beyond the scope of this book. However, some information is provided below for those interested in modifying or adapting this code for their unique environments.
The source code to the RenderClient application is defined in two classes, RenderClient.java and GRAMLocator.java.
We highlight some interesting aspects of the RenderClient application below:
A special inner class called SVGImage is used to maintain the state of each of the subimages.
When the Prepare Grid button is clicked, the GRAMLocator object is used to query all nodes in the virtual organization and populate the list on the screen. Then the internal state of image dispatching is reset and the area for the resulting JPEG images is cleared and segmented. Finally, the createRenderSourceInstance method uses Globus facilities to locate the RenderSourceService and call its reset, setSVGParams, and setRenderParams methods.
When the Go Selected button is clicked, a check is made to make sure at least one node is selected for processing. Then the subimages are dispatched to the selected worker nodes in round-robin fashion via threads.
If one of the Go buttons is clicked, the next subimage that has not yet been dispatched is dispatched to the corresponding worker node via a thread.
The WorkerDispatch class handles the actual dispatching of each subimage rendering job. Doing this in a thread allows the parallel execution and asynchronous status update of each subimage. The run method is used to set up the job via the Globus JobDescriptionType class. JobDescriptionType is a complex class that is used to set up all aspects of a task to be executed on a remote node. Normally, setting the following subset of fields gets the job done, but you may need to set more fields, depending on your project's requirements:
Environment variables
Working directory
Executable name
Executable parameters
Location of stdout and stderr files created by the remote task
The source and destination of all files to be transferred to the remote node are also defined in the JobDescriptionType object. Normally, the file description is in URL format starting with gsiftp:… signifying the Globus RFTGridFTP facilities are to be used to perform the transfers. Note that our demonstration application requires 18 files to be staged to the remote node:
A shell script, called runrenderworker, that sets up the full environment Java and Globus locations, CLASSPATH, LIBPATH, and others, runs gridproxyinit, and then launches the RenderWorker java application
A Jar file containing the RenderWorker application
A Jar file containing the stub classes required to communicate with the RenderSourceService
210 Introduction to Grid Computing
Thirteen Jar files from the Apache Batik project, which contains the classes that actually convert SVG files to JPEG files
The SVG file whose subimages are to be rendered into a JPEG file
The source and destination of the JPEG file to be transferred back to the system running the RenderClient (called staging out) are defined in the same object.
When the JobDescriptionType object is fully set up, a GramJob object is created and used to submit the job to the Globus GRAM facility.
A listener is added to the GramJob that catches all changes in the job's state, as reported by GRAM. At each state change, the graphical interface is updated with color coding and status messages. When GramJob returns, the job has completed, either succeeding or failing, so the code attempts to load and display the JPEG file that was expected to be returned.
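The listener pattern described above can be illustrated with a small, self-contained sketch. It deliberately does not use the Globus GramJob API: the JobStateSketch class, its State enum, the StateListener interface, and the color names are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a job that notifies listeners on each state change.
// It does not use the Globus classes; states and colors are assumptions.
final class JobStateSketch {

    enum State { PENDING, ACTIVE, DONE, FAILED }

    interface StateListener {
        void stateChanged(State newState);
    }

    private final List<StateListener> listeners = new ArrayList<StateListener>();
    private State state = State.PENDING;

    void addListener(StateListener listener) {
        listeners.add(listener);
    }

    State getState() {
        return state;
    }

    // Records the new state and notifies every registered listener.
    void setState(State newState) {
        state = newState;
        for (StateListener listener : listeners) {
            listener.stateChanged(newState);
        }
    }

    // Maps a state to the color a GUI might use for the subimage tile.
    static String colorFor(State s) {
        switch (s) {
            case PENDING: return "yellow";
            case ACTIVE:  return "blue";
            case DONE:    return "green";
            default:      return "red";
        }
    }

    // A job is finished once it has either succeeded or failed.
    static boolean isTerminal(State s) {
        return s == State.DONE || s == State.FAILED;
    }

    public static void main(String[] args) {
        JobStateSketch job = new JobStateSketch();
        job.addListener(new StateListener() {
            public void stateChanged(State s) {
                System.out.println(s + " -> " + colorFor(s)
                        + (isTerminal(s) ? " (try to load the JPEG now)" : ""));
            }
        });
        job.setState(State.ACTIVE);
        job.setState(State.DONE);
    }
}
```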
A special inner class called SVGImagePanel is used to force the proper repainting of each subimage on the screen.
A helper class called GRAMLocator is used to query the root node of the virtual organization and return a list of nodes registered in that virtual organization.
12.2 RenderWorker
The RenderWorker is a standalone Java application with no GUI that is launched by GRAM on a worker node. It is a single Java file called RenderWorker.java.
It creates a JPEG file corresponding to a given subimage of the SVG file and exits. This is a design choice we made in order to make the design and execution of the RenderWorker straightforward.
When launched, it checks that its two required parameters are available. The first parameter is the full URL of the RenderSourceService and the second parameter is the subimage number that it should render.
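A check of the two required parameters might look like the following sketch. The class name WorkerArgsSketch and its fields are hypothetical, and the usage message is an assumption; only the order of the parameters (service URL first, subimage number second) comes from the description above.

```java
// Hypothetical sketch of the argument check; not the RenderWorker source.
final class WorkerArgsSketch {
    final String serviceUrl;
    final int subimageNumber;

    WorkerArgsSketch(String serviceUrl, int subimageNumber) {
        this.serviceUrl = serviceUrl;
        this.subimageNumber = subimageNumber;
    }

    // Validates that both parameters are present and that the second is a number.
    static WorkerArgsSketch parse(String[] args) {
        if (args == null || args.length < 2) {
            throw new IllegalArgumentException(
                    "usage: RenderWorker <RenderSourceService URL> <subimage number>");
        }
        int number;
        try {
            number = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("subimage number must be an integer: " + args[1], e);
        }
        return new WorkerArgsSketch(args[0], number);
    }

    public static void main(String[] args) {
        try {
            WorkerArgsSketch parsed = parse(args);
            System.out.println("service=" + parsed.serviceUrl
                    + " subimage=" + parsed.subimageNumber);
        } catch (IllegalArgumentException e) {
            // A real worker would likely exit with a non-zero status here.
            System.err.println(e.getMessage());
        }
    }
}
```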
It uses Globus methods to connect to the RenderSourceService and call the getWork method. getWork returns a Java class that describes everything the RenderWorker needs to know to create a JPEG image, including the rectangle of interest in the SVG file, the size of the resulting JPEG file, and the names of the SVG and the JPEG file.
Note that the source code uses a number of Java classes that are generated from the WSDL definition of the RenderSourceService's grid service API, which is standard procedure when building and working with Globus 4 grid services. The most important object is GetWorkResponse, a compound generated Java object that contains all of the information needed for the RenderWorker to do its job.
The code uses the Apache Batik library's JPEGTranscoder class to set up the proper parameters for the generation of the JPEG image, then generates the JPEG file.
One thing you might notice in the code is the following:
static {
    Util.registerTransport();
}
This is the prescribed way to avoid a "No socket factory for https" error in any Globus code working in secure mode.
12.3 RenderSourceService
The RenderSourceService is a straightforward Globus Toolkit 4 grid service that maintains an internal state and has several public methods. Its purpose is to hand out subimage chunks of work when a RenderWorker somewhere on the grid makes a request.
The service currently has several Globus 4 resources defined, but we do not currently exploit any Globus Toolkit 4 resource facilities in this application.
The service is set up to start an instance when the container it is deployed into is started. It calls its own reset method to initialize the internal state, where none of the subimages have been marked as dispatched to remote nodes.
The setSVGParams and setRenderParams methods are called by the RenderClient to pass the user-supplied SVG and JPEG parameters and prepare for dispatch of a new series of RenderWorkers.
When a RenderWorker calls getWork, the service takes the subimage number passed in, calculates the image coordinates, builds an object describing the work to be done, and hands it back to the RenderWorker.
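As a rough illustration of how image coordinates can be derived from a subimage number, the sketch below splits a document into a grid of equally sized blocks. It is not the RenderSourceService code: the class and method names are hypothetical, and it assumes zero-based block numbers, row-major ordering, and dimensions that divide evenly by the block counts.

```java
// Hypothetical sketch of a block-coordinate calculation; assumptions:
// zero-based block numbers, row-major order, evenly divisible dimensions.
final class BlockGeometrySketch {

    // Returns {x, y, width, height} of the requested block.
    static int[] blockRect(int blockNumber, int blocksWide, int blocksHigh,
                           int totalWidth, int totalHeight) {
        if (blocksWide <= 0 || blocksHigh <= 0) {
            throw new IllegalArgumentException("block counts must be positive");
        }
        if (blockNumber < 0 || blockNumber >= blocksWide * blocksHigh) {
            throw new IllegalArgumentException("block number out of range: " + blockNumber);
        }
        int blockWidth = totalWidth / blocksWide;
        int blockHeight = totalHeight / blocksHigh;
        int column = blockNumber % blocksWide;
        int row = blockNumber / blocksWide;
        return new int[] { column * blockWidth, row * blockHeight, blockWidth, blockHeight };
    }

    public static void main(String[] args) {
        // A 4 x 3 grid over an 800 x 600 image.
        for (int block = 0; block < 12; block++) {
            int[] r = blockRect(block, 4, 3, 800, 600);
            System.out.println("block " + block + ": x=" + r[0] + " y=" + r[1]
                    + " w=" + r[2] + " h=" + r[3]);
        }
    }
}
```

The same calculation can be applied twice, once with the SVG document dimensions and once with the JPEG image dimensions, to obtain both the source rectangle and the target rectangle for a block.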
Note that all methods along with their input and output parameters are defined in WSDL so other applications on the grid can call the methods and interpret the return values properly.
12.3.1 Alternative architecture
As they stand today, the RenderClient and RenderSourceService share some of the tasks of dispatching work to the nodes; specifically, they must stay in sync with which subimages have been dispatched. In hindsight, it would have been better to give the RenderSourceService full scheduler responsibility.
The RenderClient would continue to collect the parameters from the user and pass them to the RenderSourceService as is done today. The design would change so that when the user clicks Go or Go Selected, the RenderClient would simply ping the RenderSourceService. The RenderSourceService would do all job creation and file staging, and fire off the RenderWorkers. The RenderClient would register for any job state changes and gain access to the resulting JPEG file via Globus Toolkit 4's Resource facility.
12.4 Directory tree of important files in demo
Note: In the various scripts and other files that are described below, we have hard-coded many of the directory paths based on our specific environment. This constrains us to having identical environments and directory structures on each of our grid nodes. This may not be a practical constraint in many environments. More complex scripts and the use of environment variables or parameters may be required.
top directory of our project
buildclient: batch file to build RenderClient
source $GLOBUS_LOCATION/etc/globus-devel-env.sh
javac -classpath ./build/stubs/classes:$CLASSPATH com/ibm/redbook/gridintro/render/clients/RenderClient.java
javac -classpath ./build/stubs/classes:$CLASSPATH com/ibm/redbook/gridintro/render/clients/GRAMLocator.java
buildworker: batch file to build RenderWorker
source $GLOBUS_LOCATION/etc/globus-devel-env.sh
javac -classpath batik-transcoder.jar:./build/stubs/classes:$CLASSPATH com/ibm/redbook/gridintro/render/worker/RenderWorker.java
jar cf RenderWorker.jar com/ibm/redbook/gridintro/render/worker/RenderWorker.class
buildservice: batch file to build RenderSourceService
export GLOBUS_LOCATION=/usr/local/globus-4.0.0
./globus-build-service.sh -d com/ibm/redbook/gridintro/render -s schema/gridintro/RenderSourceService_instance/RenderSourceService.wsdl
# copy up the generated jar files to the main directory
cp build/lib/com_ibm_redbook_gridintro_render.jar .
cp build/lib/com_ibm_redbook_gridintro_render_stubs.jar .
deployservice: batch file to deploy RenderSourceService into a GT4 container
/usr/local/globus-4.0.0/bin/globus-deploy-gar com_ibm_redbook_gridintro_render.gar
undeployservice: batch file to undeploy RenderSourceService from a GT4 container
/usr/local/globus-4.0.0/bin/globus-undeploy-gar com_ibm_redbook_gridintro_render
startcontainer: batch file to start a GT4 container
/usr/local/globus-4.0.0/bin/globus-start-container
runclient: batch file to run RenderClient
source $GLOBUS_LOCATION/etc/globus-devel-env.sh
rm -rf tmp.jpg
java -DGLOBUS_LOCATION=$GLOBUS_LOCATION -classpath ./build/stubs/classes:$CLASSPATH com/ibm/redbook/gridintro/render/clients/RenderClient http://127.0.0.1:8080/wsrf/services/render/RenderService
runworker: batch file to run RenderWorker in standalone mode for testing
./runrenderworker https://192.168.1.111:8443/wsrf/services/render/RenderSourceService 1
build.xml: build file provided by the Globus Service Build Tools project (http://gsbt.sourceforge.net), unchanged by our work
globus-build-service.sh: build file for Globus services, provided by the Globus Service Build Tools, unchanged by our work
client-config.wsdd: configuration file used as input to the build process, provided by the Globus Service Build Tools, unchanged by our work
namespace2package.mappings: file that maps abstract namespaces of our project to concrete class names implementing the service
http\://www.globus.org/namespaces/render.gridintro.redbook.ibm.com/RenderSourceService_instance=com.ibm.redbook.gridintro.render.stubs.RenderSourceService_instance
http\://www.globus.org/namespaces/render.gridintro.redbook.ibm.com/RenderSourceService_instance/bindings=com.ibm.redbook.gridintro.render.stubs.RenderSourceService_instance.bindings
http\://www.globus.org/namespaces/render.gridintro.redbook.ibm.com/RenderSourceService_instance/service=com.ibm.redbook.gridintro.render.stubs.RenderSourceService_instance.service
com_ibm_redbook_gridintro_render.gar: Grid Archive file containing the RenderSourceService, ready for deployment. This is the output file of the buildservice process.
runrenderworker: batch file that is the main executable, staged to the worker nodes.
#!/bin/bash
# this script is pushed to the remote node and executed to run the RenderWorker application
# two arguments: URL of RenderSourceService and block number to render
echo Running RenderWorker kickoff script
# delete stdout and stderr to assist with debugging (GRAM appends every job)
rm -f stdout
rm -f stderr
# set up the path to the Java runtime
export JAVA_HOME=/usr/java/j2sdk1.4.2_08
echo JAVA_HOME is $JAVA_HOME
# set up the path to the globus install
export GLOBUS_LOCATION=/usr/local/globus-4.0.0
echo GLOBUS_LOCATION is $GLOBUS_LOCATION
# run the globus script to set up the classpath with all of the globus jars
echo sourcing globus environment in $GLOBUS_LOCATION
source $GLOBUS_LOCATION/etc/globus-devel-env.sh
echo sourced globus environment
# add render demo specific jars to classpath
echo adding required worker jars
export CLASSPATH=RenderWorker.jar:com_ibm_redbook_gridintro_render_stubs.jar:batik.jar:batik-dom.jar:batik-svg-dom.jar:batik-css.jar:batik-rasterizer.jar:batik-transcoder.jar:batik-bridge.jar:batik-gvt.jar:batik-util.jar:batik-ext.jar:batik-xml.jar:batik-script.jar:batik-awt-util.jar:batik-parser.jar:$CLASSPATH
echo final classpath is $CLASSPATH
# set up the path to the Globus and system libraries
export LIBPATH=/usr/local/globus-4.0.0/lib:/usr/lib:/lib
echo LIBPATH is $LIBPATH
# set up another path to Globus libraries
export LD_LIBRARY_PATH=/usr/local/globus-4.0.0/lib
echo LD_LIBRARY_PATH is $LD_LIBRARY_PATH
# having problems with credentials on each worker node, so manually ran grid-proxy-init -hours 100000 to avoid
# set up proper user for security credentials for this application
echo setting X509_USER_PROXY
ID=`/usr/bin/id -u`
X509_USER_PROXY=/tmp/x509up_u$ID
echo set X509_USER_PROXY to $X509_USER_PROXY
# request security credentials for this application
echo doing grid-proxy-init
$GLOBUS_LOCATION/bin/grid-proxy-init
echo did grid-proxy-init
# for some reason the Batik library needs to have X Windows DISPLAY variable set
# this has to be done to the machine before this script is run
echo setting DISPLAY and running xhost
export DISPLAY=192.168.1.103:0.0
export DISPLAY=localhost:0.0
/usr/X11R6/bin/xhost
# now execute the application with the passed in URL to the RenderSourceService
echo launching RenderWorker class with parameter $1
$JAVA_HOME/bin/java com/ibm/redbook/gridintro/render/worker/RenderWorker $1 $2
echo completed RenderWorker, exiting
RenderWorker.jar: jar file containing the compiled RenderWorker Java class.
Apache Batik jar files, staged to the worker nodes:
batik-awt-util.jar
batik-bridge.jar
batik-css.jar
batik-dom.jar
batik-ext.jar
batik-gvt.jar
batik.jar
batik-parser.jar
batik-rasterizer.jar
batik-script.jar
batik-svg-dom.jar
batik-transcoder.jar
batik-util.jar
batik-xml.jar
mapSpain.svg: test SVG file provided by the Apache Batik project, staged to the worker nodes
com_ibm_redbook_gridintro_render_stubs.jar: jar file generated by the Globus service build process, containing Java objects for RenderSourceService methods, staged to the worker nodes.
Note that the tree for the Java classes may follow the package naming convention you choose, but the other configuration files must be placed in very specific places in order to be found and processed during the build and deployment stages.
./com/ibm/redbook/gridintro/render:
deploy-jndi-config.xml: instructions to the container about how to deploy the RenderSourceService
<?xml version="1.0" encoding="UTF-8" ?>
<jndiConfig xmlns="http://wsrf.globus.org/jndi/config">
  <service name="render/RenderSourceService">
    <resource name="home" type="org.globus.wsrf.impl.ServiceResourceHome">
      <resourceParams>
        <parameter>
          <name>factory</name>
          <value>org.globus.wsrf.jndi.BeanFactory</value>
        </parameter>
      </resourceParams>
    </resource>
  </service>
</jndiConfig>
deploy-server.wsdd: instructions to the container about how to deploy the RenderSourceService
<?xml version="1.0" encoding="UTF-8"?>
<deployment name="defaultServerConfig"
    xmlns="http://xml.apache.org/axis/wsdd/"
    xmlns:java="http://xml.apache.org/axis/wsdd/providers/java"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <service name="render/RenderSourceService" provider="Handler" use="literal" style="document">
    <parameter name="className" value="com.ibm.redbook.gridintro.render.impl.RenderSourceService"/>
    <wsdlFile>share/schema/gridintro/RenderSourceService_instance/RenderSourceService_service.wsdl</wsdlFile>
    <parameter name="allowedMethods" value="*"/>
    <parameter name="handlerClass" value="org.globus.axis.providers.RPCProvider"/>
    <parameter name="scope" value="Application"/>
    <parameter name="providers" value="GetRPProvider"/>
    <parameter name="loadOnStartup" value="true"/>
  </service>
</deployment>
./com/ibm/redbook/gridintro/render/clients:
RenderClient.java: source file for RenderClient application
GRAMLocator.java: source file for RenderClient application
./com/ibm/redbook/gridintro/render/impl:
RenderSourceService.java: source file for RenderSourceService GT4 service
RenderSourceServiceQNames.java: source file for RenderSourceService GT4 service
./com/ibm/redbook/gridintro/render/worker:
RenderWorker.java: source file for RenderWorker application
./schema/gridintro/RenderSourceService_instance:
RenderSourceService.wsdl: Web Services Description Language (WSDL) file defining methods and parameters for RenderSourceService
<?xml version="1.0" encoding="UTF-8"?>
<definitions name="RenderSourceService"
    targetNamespace="http://www.globus.org/namespaces/render.gridintro.redbook.ibm.com/RenderSourceService_instance"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="http://www.globus.org/namespaces/render.gridintro.redbook.ibm.com/RenderSourceService_instance"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:wsrp="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.xsd"
    xmlns:wsrpw="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.wsdl"
    xmlns:wsdlpp="http://www.globus.org/namespaces/2004/10/WSDLPreprocessor"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<wsdl:import
    namespace="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ResourceProperties-1.2-draft-01.wsdl"
    location="../../wsrf/properties/WS-ResourceProperties.wsdl"/>

<!-- TYPES -->
<types>
<xsd:schema
    targetNamespace="http://www.globus.org/namespaces/render.gridintro.redbook.ibm.com/RenderSourceService_instance"
    xmlns:tns="http://www.globus.org/namespaces/render.gridintro.redbook.ibm.com/RenderSourceService_instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- REQUESTS AND RESPONSES -->
  <xsd:element name="reset">
    <xsd:complexType/>
  </xsd:element>
  <xsd:element name="resetResponse">
    <xsd:complexType/>
  </xsd:element>

  <xsd:element name="setSVGParams">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="uriSVG" type="xsd:string"/>
        <xsd:element name="svgDocWidth" type="xsd:int"/>
        <xsd:element name="svgDocHeight" type="xsd:int"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="setSVGParamsResponse">
    <xsd:complexType/>
  </xsd:element>

  <xsd:element name="setRenderParams">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="uriJPEG" type="xsd:string"/>
        <xsd:element name="imageWidth" type="xsd:int"/>
        <xsd:element name="imageHeight" type="xsd:int"/>
        <xsd:element name="blocksWide" type="xsd:int"/>
        <xsd:element name="blocksHigh" type="xsd:int"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="setRenderParamsResponse">
    <xsd:complexType/>
  </xsd:element>

  <xsd:element name="getWork" type="xsd:int"/>
  <xsd:element name="getWorkResponse">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="uriSVG" type="xsd:string"/>
        <xsd:element name="uriJPEG" type="xsd:string"/>
        <xsd:element name="blockNumber" type="xsd:int"/>
        <xsd:element name="numBlocksWide" type="xsd:int"/>
        <xsd:element name="numBlocksHigh" type="xsd:int"/>
        <xsd:element name="svgBlockX" type="xsd:int"/>
        <xsd:element name="svgBlockY" type="xsd:int"/>
        <xsd:element name="svgBlockWidth" type="xsd:int"/>
        <xsd:element name="svgBlockHeight" type="xsd:int"/>
        <xsd:element name="imageBlockX" type="xsd:int"/>
        <xsd:element name="imageBlockY" type="xsd:int"/>
        <xsd:element name="imageBlockWidth" type="xsd:int"/>
        <xsd:element name="imageBlockHeight" type="xsd:int"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="getLastJPEGRP">
    <xsd:complexType/>
  </xsd:element>
  <xsd:element name="getLastJPEGRPResponse" type="xsd:string"/>

  <!-- RESOURCE PROPERTIES -->
  <xsd:element name="LastJPEG" type="xsd:string"/>
  <xsd:element name="RenderResourceProperties">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="tns:LastJPEG" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
</types>

<!-- MESSAGES -->
<message name="ResetInputMessage">
  <part name="parameters" element="tns:reset"/>
</message>
<message name="ResetOutputMessage">
  <part name="parameters" element="tns:resetResponse"/>
</message>
<message name="SetSVGParamsInputMessage">
  <part name="parameters" element="tns:setSVGParams"/>
</message>
<message name="SetSVGParamsOutputMessage">
  <part name="parameters" element="tns:setSVGParamsResponse"/>
</message>
<message name="SetRenderParamsInputMessage">
  <part name="parameters" element="tns:setRenderParams"/>
</message>
<message name="SetRenderParamsOutputMessage">
  <part name="parameters" element="tns:setRenderParamsResponse"/>
</message>
<message name="GetWorkInputMessage">
  <part name="parameters" element="tns:getWork"/>
</message>
<message name="GetWorkOutputMessage">
  <part name="parameters" element="tns:getWorkResponse"/>
</message>
<message name="GetLastJPEGRPInputMessage">
  <part name="parameters" element="tns:getLastJPEGRP"/>
</message>
<message name="GetLastJPEGRPOutputMessage">
  <part name="parameters" element="tns:getLastJPEGRPResponse"/>
</message>

<!-- PORTTYPE -->
<portType name="RenderSourceServicePortType"
    wsdlpp:extends="wsrpw:GetResourceProperty"
    wsrp:ResourceProperties="tns:RenderResourceProperties">
  <operation name="reset">
    <input message="tns:ResetInputMessage"/>
    <output message="tns:ResetOutputMessage"/>
  </operation>
  <operation name="setSVGParams">
    <input message="tns:SetSVGParamsInputMessage"/>
    <output message="tns:SetSVGParamsOutputMessage"/>
  </operation>
  <operation name="setRenderParams">
    <input message="tns:SetRenderParamsInputMessage"/>
    <output message="tns:SetRenderParamsOutputMessage"/>
  </operation>
  <operation name="getWork">
    <input message="tns:GetWorkInputMessage"/>
    <output message="tns:GetWorkOutputMessage"/>
  </operation>
  <operation name="getLastJPEGRP">
    <input message="tns:GetLastJPEGRPInputMessage"/>
    <output message="tns:GetLastJPEGRPOutputMessage"/>
  </operation>
</portType>
</definitions>
There are also a significant number of files created under the build directory as a result of the build process. Luckily, the build process is highly automated and you should not need to worry about anything in here. You can safely delete the entire build tree and rerun all build scripts.
Some of the products of the build process are .java source files, which are then compiled and used later in the build process. These tend to be the Java definition of resources and helper classes for complex objects passed in and back from Globus service methods. These source files can be useful to track down strange compiler errors or runtime bugs dealing with calling these methods.
Double-check the spelling of all method and parameter variable names. Remember that by convention a parameter called value will have corresponding methods called getValue and setValue.
It is always better to take an existing file and make slight modifications rather than trying to write one from scratch, for example, namespace2package.mappings, deploy-jndi-config.xml, deploy-server.wsdd, *QNames.java, and *.wsdl.
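The getValue and setValue naming convention mentioned above can be made concrete with a small helper that derives the expected accessor names from a parameter name. This is only a sketch for checking spellings by eye: the AccessorNamesSketch class is hypothetical and is not part of the Globus build tools, and it assumes the simple rule of capitalizing the first letter.

```java
// Hypothetical helper; assumes accessor names are formed by capitalizing
// the first letter of the parameter name and prefixing "get" or "set".
final class AccessorNamesSketch {

    static String getterFor(String parameterName) {
        return "get" + capitalize(parameterName);
    }

    static String setterFor(String parameterName) {
        return "set" + capitalize(parameterName);
    }

    private static String capitalize(String s) {
        if (s == null || s.isEmpty()) {
            throw new IllegalArgumentException("parameter name must not be empty");
        }
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        // Parameter names taken from the WSDL listing in this chapter.
        String[] names = { "uriSVG", "blockNumber", "imageBlockWidth" };
        for (String name : names) {
            System.out.println(name + ": " + getterFor(name) + ", " + setterFor(name));
        }
    }
}
```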
Part 5. Appendixes
Appendix A. IBM software portfolio for grid computing
This appendix provides a short summary of some IBM software that has particular application for grid environments.
IBM Application Workload Modeler
IBM Application Workload Modeler can help you allocate existing system resources more efficiently by modeling, generating real traffic on your network, and evaluating the network performance of existing workloads.
IBM Cloudscape/Apache Derby
IBM Cloudscape™ V10.0 is a pure, open source-based Java relational database management system that can be embedded in Java programs and used for online transaction processing (OLTP). A platform-independent, small-footprint (2 MB) database, Cloudscape V10.0 integrates tightly with any Java-based solution. It has been donated to the Apache Software Foundation and is now named Derby.
DB2 Connect Family
DB2 Connect™ connects LAN-based systems and their desktop applications to your company's mainframe and minicomputer host databases. It is designed to address the needs of organizations that require robust connectivity from a variety of desktop systems, including workgroup/departmental and LAN-based systems, to mainframes and iSeries™ database servers.
DB2 Everyplace Family
DB2 Everyplace lets you create secure embedded mobile data management solutions easily using the DB2 Everyplace Database. Use industry standard SQL to store and query data in the high-performance, small-footprint relational database.
DB2 Universal Database Family
This is the premier family of IBM database and data management products.
Mathematical Acceleration Subsystem (MASS)
Mathematical Acceleration Subsystem consists of libraries of tuned mathematical intrinsic functions, available in versions for the AIX and Linux platforms. MASS libraries offer improved performance over the standard mathematical library routines, are thread-safe, and support compilations in C, C++, and Fortran applications.
Rational Application Developer for WebSphere Software
Quickly design, develop, analyze, test, profile, and deploy Web, Web services, Java, J2EE, and portal applications with this comprehensive IDE. Optimized for IBM WebSphere software, and supporting multivendor runtime environments, IBM Rational Application Developer for WebSphere Software is powered by the Eclipse open source platform, so developers can adapt and extend their development environment with Eclipse-based plug-ins to match their needs and increase their productivity. When used with the IBM Software Development Platform, developers can access a broad range of requirements and change management functions directly from Rational Application Developer for WebSphere Software.
IBM Tivoli Access Manager Family
IBM Tivoli Access Manager is an award-winning, policy-based access control solution for e-business and enterprise applications that is in the leader quadrant of Gartner's Magic Quadrant. Tivoli Access Manager for e-business can help you manage growth and complexity, control escalating management costs, and address the difficulties of implementing security policies across a wide range of Web and application resources.
IBM Tivoli Configuration Manager
IBM Tivoli Configuration Manager provides the ability to capture your best practices for software distribution, automate those best practices, and enforce corporate standards. It helps you gain total control over your heterogeneous enterprise software and hardware.
IBM Tivoli Enterprise Console
IBM Tivoli Enterprise™ Console provides sophisticated, automated problem diagnosis and resolution to improve system performance and reduce support costs. The new enhancements focus on time to value and ease of use with out-of-the-box best practices to simplify and accelerate deployment. The auto-discovery feature allows you to understand the environment and process events appropriately. The Web console enhances visualization while providing remote access to events and console operations.
IBM Tivoli Intelligent Orchestrator
IBM Tivoli Intelligent Orchestrator helps you to improve return on IT assets and increase server utilization. It helps boost server-to-administrator ratios by automatically triggering the provisioning, configuration, and deployment of a solution into production. This automated process supports servers, operating systems, storage, middleware, applications, and network devices. IBM Tivoli Intelligent Orchestrator extends the benefits of the IBM Tivoli Provisioning Manager. It intelligently and dynamically issues instructions to Tivoli Provisioning Manager, which then uses automation packages to maintain server availability and meet required service levels in accordance with business priorities. It provides the why, where, and when of a complete orchestration solution.
IBM Tivoli License Manager
IBM license management software offerings help companies achieve a total software asset management solution, enabling planning, management, and optimization of enterprise-wide software assets.
The IBM Tivoli Management Framework
The IBM Tivoli Management Framework is the foundation for a suite of management applications that make systems and network management easy. It shields administrators from platform-specific details of day-to-day operations. Common operations such as deploying applications and routine network maintenance can be performed with a single action; administrators are no longer required to repeat the same operation for each platform in the enterprise. Deploy applications to literally thousands of machines with one operation, all the while ensuring the applications remain available.
IBM Tivoli Monitoring for Virtual Servers
IBM Tivoli Monitoring for Virtual Servers centrally monitors server virtualization and consolidation resource performance and availability at the enterprise level for efficient and cost-effective IT operations. IBM Tivoli Monitoring for Virtual Servers allows for quick problem identification, notification, and correction, and provides tasks to automate and perform routine operations.
IBM Tivoli OMEGAMON XE Family
IBM Tivoli OMEGAMON XE for Distributed Systems offers a unique approach to enterprise management proactivity and advanced automation, which is especially important as IT structures become increasingly complex and heterogeneous. An integrated approach to management, Tivoli OMEGAMON XE for Distributed Systems enables you to see and manage your entire distributed enterprise from a single point of control.
IBM Tivoli Provisioning Manager
IBM Tivoli Provisioning Manager automates manual tasks of provisioning and configuring servers and virtual servers, operating systems, middleware, applications, storage, and network devices acting as routers, switches, firewalls, and load balancers.
IBM Tivoli System Automation for Multiplatforms
IBM Tivoli System Automation for Multiplatforms manages the availability of business applications and middleware running on single Linux and AIX systems or clusters on IBM zSeries, pSeries, iSeries, and xSeries, or other Intel-based servers, according to customer-defined goals.
IBM Tivoli Universal Agent
IBM Tivoli Universal Agent collects information via numerous data providers, including SNMP, ODBC, and FILE, to monitor almost any device or application connected to a TCP/IP network. IBM Tivoli OMEGAMON solutions can then reveal consolidated views of performance and availability to help you diagnose and pinpoint problems more quickly.
WebSphere Application Server
The core of the WebSphere portfolio, this product is the industry's leading J2EE and Web services application server, delivering a high-performance and extremely scalable transaction engine for dynamic e-business applications.
WebSphere Application Server Network Deployment
WebSphere Application Server Network Deployment provides an operating environment with advanced performance and availability capabilities in support of dynamic application environments. In addition to all of the features and functions within the base WebSphere Application Server, this configuration delivers advanced deployment services that include clustering, edge-of-network services, Web services enhancements, and high availability for distributed configurations.
WebSphere Extended Deployment
WebSphere Extended Deployment, together with WebSphere Application Server Network Deployment, delivers a high-performance, easily manageable, and dynamically scalable environment for distributed WebSphere applications that leverages the principles and concepts of proven IBM systems. It provides:
WebSphere resource virtualization and pooling using node groups and dynamic clusters
Dynamic adjustment of WebSphere resources through application placement
Integration with Tivoli Intelligent Orchestrator (optional, available separately) for enterprise-wide autonomic provisioning
Introduction of operational policies to distributed WebSphere environments and intelligent routing and dynamic workload management according to established goals
IBM WebSphere MQ
IBM WebSphere MQ V6.0 delivers improved ease of use and manageability to provide a flexible and proven foundation for your enterprise service bus (ESB).
WebSphere Studio Application Monitor
WebSphere Studio Application Monitor helps improve application availability and performance by providing deep-dive, real-time problem detection, analysis, and repair. Diagnostics at the method level can pinpoint code problems, which can help an architect or developer quickly fix a problem.
IBM Director
IBM Director is the industry-leading client/server workgroup manager. IBM Director tools provide customers with flexible capabilities to realize maximum system availability and lower IT costs. With IBM Director, IT administrators can view and track the hardware configuration of remote systems in detail and monitor the usage and performance of critical components, such as processors, disks, and memory.
IBM Remote Deployment Manager
Remote Deployment Manager (RDM) facilitates remote deployment of both IBM and non-IBM systems. RDM allows for remote unattended installation of new and existing systems. RDM helps automate deployment tasks such as initial operating system installation, BIOS updates, and disposal of retired systems. All of these tasks can be done without visiting the remote system, reducing travel and labor costs.
IBM ServerGuide
IBM ServerGuide™ is a tool that simplifies the process of installing and configuring IBM eServer™ and IBM eServer xSeries servers. ServerGuide goes beyond mere hardware configuration by assisting with the automated installation of Windows server operating systems, device drivers, and other system components, with minimal user intervention.
IBM Virtual Machine Manager
IBM Virtual Machine Manager (VMM) is an extension to IBM Director that allows you to manage both physical and virtual machines from a single console. With VMM, you can manage both VMware ESX Server and Microsoft Virtual Server environments using IBM Director. VMM also integrates VMware VirtualCenter and IBM Director for advanced virtual machine management.
Cluster Systems Management
Cluster Systems Management (CSM) for AIX and Linux is designed for simple, low-cost management of distributed and clustered IBM eServer pSeries and xSeries servers in technical and commercial computing environments.
Parallel ESSL
Parallel ESSL is a scalable mathematical subroutine library that supports parallel processing applications on the IBM RS/6000 SP Systems and clusters of IBM pSeries and RS/6000 workstations.
LoadLeveler
LoadLeveler manages both serial and parallel jobs over a cluster of servers. This distributed environment consists of a pool of machines or servers, often referred to as a LoadLeveler cluster.
General Parallel File System
The IBM General Parallel File System (GPFS) is a high-performance shared-disk file system that can provide fast, reliable data access from all nodes in a homogeneous or heterogeneous cluster of IBM UNIX servers running either the AIX 5L™ or the Linux operating system.
Appendix B. Additional material
This redbook refers to additional material that can be downloaded from the Internet as described below.
Locating the Web material
The Web material associated with this redbook is available in softcopy on the Internet from the IBM Redbooks Web server. Point your Web browser to:
ftp://www.redbooks.ibm.com/redbooks/SG246778
Alternatively, you can go to the IBM Redbooks Web site at:
ibm.com/redbooks
Select Additional materials and open the directory that corresponds with the redbook form number, SG246778.
Copyright IBM Corp. 2005. All rights reserved. 231
B
Using the Web material
The additional Web material that accompanies this redbook includes the following files:
File name          Description
DemoGridApp.zip    A zip file including the source files and other supporting files required for the sample application described in Chapter 12, "Demonstration application" on page 197
GT4SampInst.zip    A zip file containing a sample script that we used to quickly install Java, Ant, and Globus Toolkit 4.0
System requirements for downloading the Web material
The following system configuration is recommended:
Hard disk space:   1 MB minimum
Operating system:  Any OS that supports a Java environment. Our examples are based on Linux.
How to use the Web material
GT4SampInst.zip
The content of this zip file is a sample bash shell script that we used to install and reinstall our grid nodes whenever we needed to rebuild our environment. It is customized for our environment and assumes specific host names and IP addresses for the NFS shares that contain the installation images for Java, Ant, and Globus Toolkit 4.0. We do not provide specific instructions for modifying it for your environment, but you might find it useful if you want to automate the installation once you have done it manually a few times. Chapter 11, "Globus Toolkit 4 installation and configuration" on page 155, provides the step-by-step instructions for manually installing an environment similar to ours.
Extract the shell script gt4install.sh from the GT4SampInst.zip file to an environment that supports the bash shell. Edit this file to meet your specific requirements.
DemoGridApp.zip
Create a subdirectory (folder) on your workstation, and unzip the contents of the Web material zip file into this folder. To use the files and application, follow the directions provided below in conjunction with the detailed information about the application provided in Chapter 12, "Demonstration application" on page 197.
This application was developed and tested in the environment described in Chapter 11, "Globus Toolkit 4 installation and configuration" on page 155.
Important: The demonstration grid application described in this book and available for download as described above uses the open source Batik toolkit available from Apache.org. See:
http://xml.apache.org/batik
This toolkit provides libraries of functions to handle and manipulate SVG files. Before building and running our sample application, you should obtain the Batik toolkit from the Apache Web site referenced above. Specifically, the following jar files need to be available and specified in your CLASSPATH environment variable to compile and execute our sample application.
batik-awt-util.jar
batik-bridge.jar
batik-css.jar
batik-dom.jar
batik-ext.jar
batik-gvt.jar
batik-parser.jar
batik-rasterizer.jar
batik-script.jar
batik-svg-dom.jar
batik-transcoder.jar
batik-util.jar
batik-xml.jar
batik.jar
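As a minimal sketch of the CLASSPATH setup described above, the following shell function appends the listed Batik jars to CLASSPATH. It assumes the jar files carry the hyphenated names used in the Batik distribution (for example, batik-awt-util.jar). The function name add_batik_jars and its directory arguments are hypothetical and are not part of the Web material; pass whichever directories hold the jars on your system.

```shell
# Jar files the sample application needs, per the list above (names assumed
# to match the hyphenated file names in the Batik distribution).
BATIK_JARS="batik-awt-util.jar batik-bridge.jar batik-css.jar batik-dom.jar
batik-ext.jar batik-gvt.jar batik-parser.jar batik-rasterizer.jar
batik-script.jar batik-svg-dom.jar batik-transcoder.jar batik-util.jar
batik-xml.jar batik.jar"

# add_batik_jars DIR...: search the given directories for each jar and append
# every jar that is found to CLASSPATH. Returns non-zero if any jar is missing.
add_batik_jars() {
    missing=0
    for jar in $BATIK_JARS; do
        found=""
        for dir in "$@"; do
            if [ -f "$dir/$jar" ]; then
                found="$dir/$jar"
                break
            fi
        done
        if [ -n "$found" ]; then
            # Add a ":" separator only when CLASSPATH already has content.
            CLASSPATH="${CLASSPATH:+$CLASSPATH:}$found"
        else
            echo "Batik jar not found: $jar" >&2
            missing=1
        fi
    done
    export CLASSPATH
    return "$missing"
}
```

For example, add_batik_jars /opt/batik /opt/batik/lib (both paths are placeholders) would search those two directories. A non-zero return status indicates that at least one required jar was not found, so the build is likely to fail.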
In addition, the Batik package includes sample SVG files that can be used to test and run this application. The two samples we use for testing are:
mapSpain.svg
tiger.svg
Note that all scripts follow the installation paths shown earlier in the book. If you chose different install paths you will have to adjust the scripts.
To perform the source code build process
To do this:
1. Run the buildservice script. This must be done first, because it generates the service's binding files required by the rest of the sample application, and builds and packages the RenderSourceService code. This step must build cleanly before you continue.
2. Run the buildworker script. This builds and packages the RenderWorker code.
3. Run the buildclient script. This builds and packages the RenderClient application.
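The ordering constraint in these steps can be sketched as a small shell helper that runs the scripts in sequence and stops at the first failure. This is only an illustration: the helper name run_in_order is hypothetical, and the script names are used as printed in this appendix, so the files in the download may be named slightly differently (for example, with an extension).

```shell
# run_in_order DIR SCRIPT...: run each named script from DIR in the order
# given and stop at the first one that exits with a non-zero status.
run_in_order() {
    script_dir=$1
    shift
    for script in "$@"; do
        echo "Running $script"
        if ! "$script_dir/$script"; then
            echo "$script failed; resolve the errors before continuing" >&2
            return 1
        fi
    done
    return 0
}
```

From the directory that holds the scripts, a call such as run_in_order . buildservice buildworker buildclient would then build the service first and skip the worker and client builds if the service build does not complete cleanly.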
To perform the code deployment process
To do this:
1. Ensure the Globus container running the RenderSourceService is stopped. You can do this by pressing Ctrl+C in, or closing, the terminal window that the container is running in.
2. Run the undeployservice script. This uninstalls a previously installed version of the RenderSourceService from your local system's Globus container, which allows for a clean install of a new version of the service in the next step.
3. Run the deployservice script. This installs the previously built RenderSourceService into your local system's Globus container. This service should be installed into a Globus container on only one machine on the network. The client application defaults to this service running on the same machine as the client, so if you choose a different machine you will have to adjust the client's entry field to point to the proper machine.
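The deployment sequence above could be wrapped as follows. This is a sketch under stated assumptions, not part of the Web material: the function name redeploy_service and the CONTAINER_CHECK hook are hypothetical, and the script names are used as printed in this appendix. The sketch does not know how to detect a running container; it relies on a check command that you supply.

```shell
# redeploy_service DIR: run the undeploy script, then the deploy script, from
# DIR. CONTAINER_CHECK is a hypothetical hook: set it to a command that
# succeeds while the Globus container is still running. If it is unset, the
# function assumes you have already stopped the container.
redeploy_service() {
    script_dir=$1
    if ${CONTAINER_CHECK:-false}; then
        echo "Globus container appears to be running; stop it first" >&2
        return 1
    fi
    # Remove any earlier version before installing the new build.
    "$script_dir/undeployservice" || return 1
    "$script_dir/deployservice"
}
```

The function stops if the undeploy script reports a failure. Whether that script succeeds when no earlier version is installed depends on the script itself, so you may need to relax that check for a first-time deployment.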
To bring up the system for grid processing
To do this:
1. Run the startcontainer script in its own terminal window. You must do this on every node that you want to participate in the grid or host the RenderSourceService. This launches the Globus container, making it available for dispatch of RenderWorker processes, and on one machine makes the RenderSourceService ready for work. Globus takes some time to launch and produces many status messages. The containers are the key long-running processes for the operation of the grid, so shut down a container only when you do not expect to submit any work to that particular node, or when you need to deploy a new version of the service.
2. Optionally, run the runworker script. This allows you to perform standalone testing of a RenderWorker process against a running RenderSourceService, without the overhead of the full GUI client.
3. Run the runclient script. This launches the GUI RenderClient application and lets you test the rendering system on your running grid.
Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.
IBM Redbooks
For information on ordering these publications, see "How to get IBM Redbooks" on page 238. Note that some of the documents referenced here may be available in softcopy only.
Introduction to Grid Computing with Globus, SG24-6895
Grid Computing: Solution Briefs, REDP-3891
Grid Computing Products and Services, SG24-6650
Grid Computing in Research and Education, SG24-6649
Grid Computing with the IBM Grid Toolbox, SG24-6332
Grid Services Programming and Application Enablement, SG24-6100
Globus Toolkit 3.0 Quick Start, REDP-3697
Enabling Applications for Grid Computing with Globus, SG24-6936
Fundamentals of Grid Computing, REDP-3613
Other publications
These publications are also relevant as further information sources:
J. Joseph, M. Ernest, and C. Fellenstein, Evolution of grid computing architecture and grid adoption models, IBM Systems Journal, Vol. 43, No. 4, 2004.
M. Baker, A. Apon, C. Ferner, and J. Brown, Emerging Grid Standards, pages 43-50, IEEE Computer, April 2005.
I. Foster, J. Frey, S. Graham, S. Tuecke, K. Czajkowski, D. Ferguson, F. Leymann, M. Nally, I. Sedukhin, D. Snelling, T. Storey, W. Vambenepe, and S. Weerawarana, Modelling Stateful Resources with Web Services, http://www.ibm.com/developerworks/library/ws-resource/ws-modelingresources.pdf, March 2004.
K. Czajkowski, D. Ferguson, I. Foster, J. Frey, S. Graham, T. Maguire, D. Snelling, and S. Tuecke, From Open Grid Services Infrastructure to WS-Resource Framework: Refactoring and Evolution, http://www.ibm.com/developerworks/library/ws-resource/ogsi_to_wsrf_1.0.pdf, March 2004.
K. Czajkowski, D. Ferguson, I. Foster, J. Frey, S. Graham, I. Sedukhin, D. Snelling, S. Tuecke, and W. Vambenepe, The WS-Resource Framework, http://www.ibm.com/developerworks/library/ws-resource/ws-wsrf.pdf, March 2004.
S. Parastatidis, J. Webber, P. Watson, and T. Rischbeck, A Grid Application Framework based on Web Services Specifications and Practices, http://www.neresc.ac.uk/ws-gaf/A%20Grid%20Application%20Framework%20based%20on%20Web%20Services%20Specifications%20and%20Practices%20v1.0.pdf, 2003.
I. Foster, A Globus Primer, http://www.globus.org/toolkit/docs/4.0/key/GT4_Primer_0.6.pdf, May 2005.
I. Foster and C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers, San Francisco, CA, 1998.
I. Foster, C. Kesselman, and S. Tuecke, The Anatomy of the Grid: Enabling Scalable Virtual Organizations, The Globus Alliance, http://www.globus.org/research/papers/anatomy.pdf.
I. Foster, H. Kishimoto, A. Savva, D. Berry, A. Djaoui, A. Grimshaw, B. Horn, F. Maciel, F. Siebenlist, R. Subramaniam, J. Treadwell, and J. Von Reich, The Open Grid Services Architecture, Version 1.0, http://forge.gridforum.org/projects/ogsa-wg, January 2005.
S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, C. Kesselman, T. Maguire, T. Sandholm, D. Snelling, and P. Vanderbilt, Open Grid Services Infrastructure (OGSI) Version 1.0, Global Grid Forum, http://www.ggf.org, June 2003.
Open Grid Service Infrastructure Primer, Global Grid Forum, http://www.ggf.org, August 2004.
N. Nagaratnam, P. Janson, J. Dayka, A. Nadalin, F. Siebenlist, V. Welch, I. Foster, and S. Tuecke, Security Architecture for Open Grid Services, http://www.cs.virginia.edu/~humphrey/ogsa-sec-wg/OGSA-SecArch-v1-07192002.pdf.
Online resources
These Web sites and URLs are also relevant as further information sources:
Apache Ant Web page
http://ant.apache.org
Apache Batik
http://xml.apache.org/batik
Apache WSRF tutorial
http://ws.apache.org/ws-fx/wsrf/tutorial
Distributed Management Task Force (DMTF)
http://www.dmtf.org
Global Grid Forum (GGF)
http://www.ggf.org
Globus
http://www.globus.org
Globus Toolkit 4 Programmer's Tutorial by Borja Sotomayor
http://gdp.globus.org/gt4-tutorial
Globus WSRF
http://www.globus.org/wsrf
GridFTP
http://www.globus.org/grid_software/data/gridftp.php
Internet Engineering Task Force
http://www.ietf.org
Open Grid Services Architecture (OGSA)
http://www.globus.org/ogsa
OGSA-DAI
http://www.ogsadai.org.uk
Open Grid Services Interface (OGSI)
http://www.globus.org/toolkit/draft-ggf-ogsi-gridservice-33_2003-06-27.pdf
Organization for the Advancement of Structured Information Standards (OASIS)
http://www.oasis-open.org
OASIS Web Services Resource Framework
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsrf
pyGridWare
http://dsd.lbl.gov/gtg/projects/pyGridWare
Scalable Vector Graphics specification
http://www.w3.org/TR/SVG
Understanding WSRF, Parts 1 to 4, by Babu Sundaram
http://www.ibm.com/developerworks
Using Eclipse to develop Grid services
http://www.ibm.com/developerworks/edu/gr-dw-gr-eclipseide-i.html
Web Services Activity
http://www.w3.org/2002/ws
Web Services Interoperability Organization (WS-I)
http://www.ws-i.org
World Wide Web Consortium (W3C)
http://www.w3.org
WS-Resource Framework Interop Workshop 1 Scenarios v0.13
http://www.ibm.com/developerworks/offers/WS-Specworkshops/ws-rf200404.html
WSRF.NET Developer Tutorial by Mark Morgan and Glenn Wasson
http://www.cs.virginia.edu/~gsw2c/WSRFdotNet/WSRF.NET_Developer_Tutorial.pdf
How to get IBM Redbooks
You can search for, view, or download Redbooks, Redpapers, Hints and Tips, draft publications and Additional materials, as well as order hardcopy Redbooks or CD-ROMs, at this Web site:
ibm.com/redbooks
Help from IBM
IBM Support and downloads
ibm.com/support
IBM Global Services
ibm.com/services
Back cover
Introduction to Grid Computing
SG24-6778-00 ISBN 0738494003
INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION
BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE
IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.
For more information: ibm.com/redbooks
In the past several years, grid computing has emerged as a way to harness and take advantage of computing resources across geographies and organizations. In this IBM Redbook, we describe a generalized view of grid computing including concepts, standards, and ways in which grid computing can provide business value to your organization. In a nutshell, grid computing is all about virtualization that enables businesses to take advantage of a variety of IT resources in order to be more responsive to demands of the business and increase availability of applications while reducing both infrastructure and management costs.
There are many products and toolkits available from IBM and other companies that enable different aspects of grid computing. One of the best-known toolkits is the Globus Toolkit. Globus Toolkit 4 provides components and services conforming to existing and evolving standards that can be used as the basis for a grid computing solution. In the second half of this book we provide instructions for installing and configuring a simple Globus environment that can be used to demonstrate various aspects of grid computing and to build a proof-of-concept environment. We also describe, and provide as additional material, a sample grid application that can be used to demonstrate, test, and teach more about the grid computing concepts introduced in this book.