GRID COMPUTING
Sandeep Kumar Poonia
Head Of Dept. CS/IT
B.E., M.Tech., UGC-NET
LM-IAENG, LM-IACSIT, LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE
SandeepKumarPoonia
Implementing production Grids
The Globus package was chosen for several reasons:
• A clear, strong, and standards-based security model,
• Modular functions (not an all-or-nothing approach)
providing all the Grid Common Services, except general
events,
• A clear model for maintaining local control of resources
that are incorporated into a Globus Grid,
• A general design approach that allows a decentralized
control and deployment of the software,
• A demonstrated ability to accomplish large-scale
Metacomputing,
• Presence in supercomputing environments,
• A clear commitment to open source, and
• Today, one would also have to add ‘market share’.
‘Grids’ are an approach for building dynamically constructed
problem-solving environments using
geographically and organizationally dispersed,
high-performance computing and
data handling resources.
Grids also provide important infrastructure supporting multi-
institutional collaboration.
THE GRID CONTEXT
Functionally, Grids are tools, middleware, and services
for
• building the application frameworks that allow
discipline scientists to express and manage the
simulation, analysis, and data management aspects of
overall problem solving,
• providing a uniform and secure access to a wide variety
of distributed computing and data resources,
• supporting construction, management, and use of
widely distributed application systems,
• facilitating human collaboration through common
security services, and resource and data sharing,
• providing support for remote access to, and operation
of, scientific and engineering instrumentation systems,
and
• managing and operating this computing and data
infrastructure as a persistent service.
This is accomplished through two aspects:
(1) a set of uniform software services that manage
and provide access to heterogeneous, distributed
resources and
(2) a widely deployed infrastructure.
THE ANTICIPATED GRID USAGE
MODEL WILL DETERMINE WHAT GETS
DEPLOYED, AND WHEN
•Grid computing models
•Grid data models
There are a number of identifiable computing models in
Grids that range from single resource to tightly coupled
resources, and each requires some variations in Grid
services.
Grid computing models
1. Export existing services
2. Loosely coupled processes
3. Workflow managed processes
4. Distributed-pipelined/coupled processes
5. Tightly coupled processes
1. Grids provide a uniform set of services to export the capabilities
of existing computing facilities such as supercomputer centers to
existing user communities, and this is accomplished by the
Globus software.
2. The primary advantage of this form of Grids is to provide a
uniform view of several related computing systems, or to prepare
for other types of uses.
3. This sort of Grid also facilitates/encourages the incorporation of
the supercomputers into user constructed systems (various sorts
of portals or frameworks that run on user systems and provide
for creating and managing related suites of Grid jobs).
Export existing services
1. By loosely coupled processes we mean collections of logically
related jobs that nevertheless do not have much in common once
they are executing.
2. That is, these jobs are given some input data that might, for
example, be a small piece of a single large dataset, and they
generate some output data that may have to be integrated with
the output of other such jobs; however, their execution is largely
independent of the other jobs in the collection.
Two common types of such jobs are
1. data analysis, in which a large dataset is divided into units that
can be analyzed independently, and
2. parameter studies, where a design space of many parameters is
explored, usually at low model resolution, across many different
parameter values
Loosely coupled processes
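The data-analysis pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `analyze_unit` is a hypothetical stand-in for a real analysis job, and a local thread pool stands in for submitting jobs to Grid resources.

```python
from concurrent.futures import ThreadPoolExecutor


def split_dataset(data, unit_size):
    """Divide a dataset into units that can be analyzed independently."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def analyze_unit(unit):
    """Hypothetical analysis job: here, simply the sum of the unit's values."""
    return sum(unit)


def run_loosely_coupled(data, unit_size, max_workers=4):
    """Run one independent job per unit, then integrate the outputs."""
    units = split_dataset(data, unit_size)
    # A local thread pool stands in for job submission to Grid resources.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(analyze_unit, units))
    # Integration step: combine the outputs of the independent jobs.
    return sum(partial_results)


if __name__ == "__main__":
    print(run_loosely_coupled(list(range(100)), unit_size=10))
```

The jobs share nothing while they run; the only coupling is the final integration step, which is what makes this style easy to spread over many resources.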
Most workflow managers manage events of all sorts. By ‘event’, we mean
essentially any asynchronous message that is used for decision-making
purposes. Typical Grid events include
1. normal application occurrences that are used, for example, to trigger
computational steering or semi-interactive graphical analysis,
2. abnormal application occurrences, such as numerical convergence
failure, that are used to trigger corrective action,
3. messages that certain data files have been written and closed so that
they may be used in some other processing step.
Workflow managed processes
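As a rough sketch of event-driven decision making, the following Python fragment routes events to handlers. The event names (`step_completed`, `convergence_failure`, `file_closed`) and the actions they trigger are hypothetical, chosen only to mirror the three kinds of Grid events listed above; no particular workflow manager is implied.

```python
from collections import defaultdict


class EventDispatcher:
    """Routes asynchronous messages (events) to the handlers registered for them."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type, handler):
        """Register a handler to run whenever `event_type` is emitted."""
        self._handlers[event_type].append(handler)

    def emit(self, event_type, payload):
        """Deliver an event and return the decision made by each handler."""
        return [handler(payload) for handler in self._handlers.get(event_type, [])]


def build_dispatcher():
    """Wire up hypothetical handlers for normal, abnormal, and data-ready events."""
    dispatcher = EventDispatcher()
    dispatcher.on("step_completed", lambda p: f"refresh plot at step {p['step']}")
    dispatcher.on("convergence_failure", lambda p: f"rerun {p['job']} with a smaller time step")
    dispatcher.on("file_closed", lambda p: f"start next stage on {p['path']}")
    return dispatcher


if __name__ == "__main__":
    d = build_dispatcher()
    print(d.emit("file_closed", {"path": "run42/output.dat"}))
```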
In application systems that involve multidisciplinary or other
multicomponent simulations, it is very likely that the processes will
need to be executed in a ‘pipeline’ fashion.
That is, there will be a set of interdependent processes that
communicate data back and forth throughout the entire execution of
each process.
In this case, co-scheduling is likely to be essential, as is good
network bandwidth between the computing systems involved.
Co-scheduling for the Grid involves scheduling multiple individual,
potentially architecturally and administratively heterogeneous
computing resources so that multiple processes are guaranteed to
execute at the same time in order that they may communicate and
coordinate with each other.
Distributed-pipelined/coupled processes
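One way to picture co-scheduling is as an intersection of the free time windows of every resource involved. The Python sketch below assumes each resource publishes its free time as non-overlapping `(start, end)` intervals in common time units. The resource names and the interval representation are illustrative assumptions, not part of any Grid scheduler.

```python
def common_free_windows(a, b):
    """Intersect two sorted lists of non-overlapping (start, end) free intervals."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def earliest_co_schedule(free_by_resource, duration):
    """Earliest start at which every resource is free for `duration`, or None."""
    windows = None
    for free in free_by_resource.values():
        free = sorted(free)
        windows = free if windows is None else common_free_windows(windows, free)
    for start, end in windows or []:
        if end - start >= duration:
            return start
    return None


if __name__ == "__main__":
    free = {
        "cluster_a": [(0, 4), (6, 12)],   # hypothetical resources
        "cluster_b": [(3, 8), (9, 15)],
    }
    print(earliest_co_schedule(free, duration=3))
```

A real co-scheduler also has to reserve the chosen window on each system, which is where administrative heterogeneity makes the problem hard.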
MPI and Parallel Virtual Machine (PVM) support a distributed memory
programming model.
MPICH-G2 (the Globus-enabled MPI) provides for MPI style
interprocess communication between Grid computing resources. It
handles data conversion, communication establishment, and so on.
Co-scheduling is essential for this to be a generally useful capability
since different ‘parts’ of the same program are running on different
systems.
PVM is another distributed memory programming system that can be
used in conjunction with Condor and Globus to provide Grid
functionality for running tightly coupled processes.
Tightly coupled processes
Many of the current production Grids are focused around communities
whose interest in wide-area data management is at least as great as
their interest in Grid-based computing.
These include, for example, Particle Physics Data Grid (PPDG), Grid
Physics Network (GriPhyN), and the European Union DataGrid.
Like computing, there are several styles of data management in Grids,
and these styles result in different requirements for the software of a
Grid.
Grid data models
•Data mining can require access to metadata and uniform access to
multiple data archives.
•SRB/MCAT provides capabilities that include uniform remote access
to data and local caching of the data for fast and/or multiple
accesses.
•Through its metadata catalogue, SRB provides the ability to
federate multiple tertiary storage systems.
•SRB provides a uniform interface by placing a server in front of (or
as part of) the tertiary storage system.
•This server must directly access the tertiary storage system, so
there are several variations depending on the particular storage
system.
Occasional access to multiple tertiary storage systems
In many scientific disciplines, a large community of users requires
remote access to large datasets. An effective technique for
improving access speeds and reducing network loads can be to
replicate frequently accessed datasets at locations chosen to be
‘near’ the eventual users.
However, organizing such replication so that it is both reliable and
efficient can be a challenging problem, for a variety of reasons.
The datasets to be moved can be large, so issues of network
performance and fault tolerance become important.
Distributed analysis of massive datasets
followed by cataloguing and archiving
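A minimal sketch of how a client might use replicas placed 'near' its users: try the replica with the lowest measured latency first and fall back to the next one on failure. The site names, the `latency_ms` table, and the `fetch` callable are all hypothetical; a real replica manager would supply them.

```python
def fetch_from_nearest(replicas, latency_ms, fetch):
    """Try replicas from nearest to farthest; return (site, data) from the first success."""
    errors = {}
    for site in sorted(replicas, key=lambda s: latency_ms[s]):
        try:
            return site, fetch(site)
        except OSError as exc:
            # Fault tolerance: remember the failure and move on to the next replica.
            errors[site] = exc
    raise RuntimeError(f"all replicas failed: {sorted(errors)}")


if __name__ == "__main__":
    latency = {"site_near": 5, "site_mid": 20, "site_far": 90}

    def fake_fetch(site):
        """Stand-in for a real transfer; the nearest site is pretended to be down."""
        if site == "site_near":
            raise OSError("replica unavailable")
        return f"dataset from {site}"

    print(fetch_from_nearest(list(latency), latency, fake_fetch))
```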
The data-intensive science applications noted above that are
international in their scope have motivated the GridFTP
emphasis on providing WAN high performance and the ability to
manage huge files in the wide area. To accomplish this, GridFTP
provides
• integrated GSI security and policy-based access control,
• third-party transfers (between GridFTP servers),
• wide-area network communication parameter optimization,
• partial file access,
• reliability/restart for large file transfers,
• integrated performance monitoring instrumentation,
• network parallel transfer streams,
• server-side data striping (cf. DPSS and HPSS striped tapes),
• server-side computation,
• proxies (to address firewall and load-balancing).
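To make partial file access, parallel streams, and restart more concrete, the Python sketch below divides a file into byte ranges, one per stream, and works out which ranges still need sending after an interruption. It illustrates the idea only: it is not the GridFTP protocol, and the function names are hypothetical.

```python
def split_byte_ranges(file_size, streams):
    """Divide [0, file_size) into up to `streams` contiguous (offset, length) ranges."""
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams >= 1")
    base, extra = divmod(file_size, streams)
    ranges, offset = [], 0
    for k in range(streams):
        # The first `extra` ranges carry one additional byte each.
        length = base + (1 if k < extra else 0)
        if length == 0:
            break
        ranges.append((offset, length))
        offset += length
    return ranges


def remaining_ranges(ranges, completed):
    """Restart support: keep only the ranges not yet confirmed as transferred."""
    done = set(completed)
    return [r for r in ranges if r not in done]


if __name__ == "__main__":
    plan = split_byte_ranges(10, 3)
    print(plan)
    print(remaining_ranges(plan, completed=[plan[0]]))
```

Each range can be requested independently, so a failed transfer only repeats the ranges that did not finish.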
A common situation is that a whole set of simulations or data analysis
programs will require the use of the same large reference dataset.
The management of such datasets, the originals of which almost
always live in a tertiary storage system, could be handled by one of
the replica managers.
However, another service that is needed in this situation is a network
cache: a unit of storage that can be accessed and allocated as a Grid
resource, and that is located ‘close to’ (in the network sense) the Grid
computational resources that will run the codes that use the data. The
Distributed Parallel Storage System (DPSS) can provide this
functionality; however, it is not currently well integrated with Globus.
Large reference data sets
The Metadata Catalogue of SRB/MCAT provides a powerful
mechanism for managing all types of descriptive information about
data: data content information, fine-grained access control, physical
storage device (which provides location independence for federating
archives), and so on.
The flip side of this is that the service is fairly heavyweight to use
(when its full capabilities are desired) and it requires considerable
operational support.
Grid metadata management
Currently, Grids support collaboration, in the form of Virtual
Organizations (VO) (by which we mean human collaborators,
together with the Grid environment that they share), in two very
important ways.
GRID SUPPORT FOR COLLABORATION
The GSI provides a common authentication approach that is a basic and
essential aspect of collaboration. It provides the authentication and
communication mechanisms, and trust management that allow groups of
remote collaborators to interact with each other in a trusted fashion, and it
is the basis of policy-based sharing of collaboration resources. GSI has the
added advantage that it has been integrated with a number of tools that
support collaboration, for example, secure remote login and remote shell –
GSISSH, and secure ftp – GSIFTP, and GridFTP.
The second important contribution of Grids is that of
supporting collaborations that are VOs and, as such, have to
provide ways to preserve and share organizational and
community information (e.g. the location and
description of key data repositories, code repositories, etc.).
For this to be effective over the long term, there must be a
persistent publication service where this information may be
deposited and accessed by both humans and systems. The
GIS can provide this service.
A third Grid collaboration service is the Access Grid (AG) – a
group-to-group audio and videoconferencing facility that is
based on Internet IP multicast, and it can be managed by an
out-of-band floor control service.
The AG is currently being integrated with the Globus directory
and security services.
BUILDING AN INITIAL MULTISITE,
COMPUTATIONAL AND DATA GRID
1. The Grid building team
Successful Grids involve almost as much sociology as
technology, and therefore establishing good working
relationships among all the people involved is essential.
2. Grid resources
As early as possible in the process, identify the computing and
storage resources to be incorporated into your Grid.
In doing this be sensitive to the fact that opening up systems to Grid
users may turn lightly or moderately loaded systems into heavily
loaded systems.
Batch schedulers may have to be installed on systems that previously
did not use them in order to manage the increased load.
3. Build the initial test bed
Grid information service
The Grid Information Service provides for locating resources based on
the characteristics needed by a job (OS, CPU count, memory, etc.).
The Globus MDS provides this capability with two components.
The Grid Resource Information Service (GRIS) runs on the Grid
resources (computing and data systems) and handles the soft-state
registration of the resource characteristics.
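Soft-state registration means that an entry disappears unless the resource keeps refreshing it. The Python sketch below shows that behaviour together with a search by job requirements. The class, the attribute names (`cpus`, `memory_gb`), and the explicit `now` argument are illustrative assumptions and do not reflect the MDS schema or API.

```python
class SoftStateRegistry:
    """Resource registry whose entries expire unless they are refreshed."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # name -> (attributes, expiry time)

    def register(self, name, attributes, now):
        """Register or refresh a resource; `now` is the current time in seconds."""
        self._entries[name] = (dict(attributes), now + self.ttl)

    def search(self, now, **minimums):
        """Return live resources whose numeric attributes meet every minimum."""
        matches = []
        for name, (attrs, expires) in self._entries.items():
            if expires <= now:
                continue  # registration lapsed; treat the resource as gone
            if all(attrs.get(key, 0) >= value for key, value in minimums.items()):
                matches.append(name)
        return sorted(matches)


if __name__ == "__main__":
    registry = SoftStateRegistry(ttl_seconds=60)
    registry.register("big_cluster", {"cpus": 64, "memory_gb": 256}, now=0)
    registry.register("workstation", {"cpus": 8, "memory_gb": 16}, now=0)
    print(registry.search(now=30, cpus=16))
```

Because stale entries vanish on their own, a crashed resource needs no explicit deregistration step.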
The Grid Information Index Server (GIIS) is a user accessible
directory server that supports searching for resources by
characteristics. Other information may also be stored in the GIIS, and
the GGF, Grid Information Services group is defining schema for
Build Globus on test systems
Use PKI authentication and initially use certificates from the Globus
Certificate Authority (‘CA’) or any other CA that will issue you
certificates for this test environment. (The OpenSSL CA may be
used for this testing.)
Then validate access to, and operation of, the GIS/GIISs at all sites
and test local and remote job submission using these certificates.
CROSS-SITE TRUST MANAGEMENT
One of the most important contributions of Grids
to supporting large-scale collaboration is the
uniform Grid entity naming and authentication
mechanisms provided by the GSI.
Trust
Trust is ‘confidence in or reliance on some quality or attribute of a
person or thing, or the truth of a statement’.
Cyberspace trust starts with clear, transparent, negotiated, and
documented policies associated with identity.
When a Grid identity token (X.509 certificate in the current context)
is presented for remote authentication and is verified using the
appropriate cryptographic techniques, then the relying party should
have some level of confidence that the person or entity that initiated
the transaction is the person or entity that it is expected to be.
It is difficult to establish trust for large, heterogeneous VOs
involving people from multiple, international institutions, because
the shared trust models do not exist. The typical issues related to
establishing trust may be summarized as follows:
•Across administratively similar systems
•for example, within an organization
•informal/existing trust model can be extended to Grid authentication
and authorization
•Administratively diverse systems
•for example, across many similar organizations.
•formal/existing trust model can be extended to Grid authentication
and authorization
•Administratively heterogeneous
•for example, across multiple organizational types (e.g. science labs
and industry),
•for example, international collaborations
•formal/new trust model for Grid authentication and authorization will
need to be developed.
Establishing an operational CA
Set up, or identify, a Certification Authority to issue Grid X.509
identity certificates to users and hosts.
Both the IPG and DOE Science Grids use the Netscape CMS
software for their operational CA because it is a mature
product that allows a very scalable usage model that matches
well with the needs of science VOs.
Naming
One of the important issues in developing a CP is the naming of the
principals (‘subject,’ i.e. the Grid entity identified by the certificate).
While there is an almost universal tendency to try to pack a lot of
information into the subject name (which is a multicomponent, X.500 style
name), increasingly there is an understanding that the less information of
any kind put into a certificate, the better.
This simplifies certificate management and re-issuance when users forget
pass phrases (which will happen with some frequency).
The certification authority model
There are several models for CAs; however,
increasingly associated groups of collaborations/
VOs are opting to find a single CA provider. The
primary reason for this is that it is a formal and
expensive process to operate a CA in such a way
that it will be trusted by others.
Certificate issuing process
1. First steps
Issue host certificates for all the computing and data resources and
establish procedures for installing them. Issue user certificates.
Count on revoking and re-issuing all the certificates at least once
before going operational.
This is inevitable if you have not previously operated a CA.
Using certificates issued by your CA, validate correct operation of the
GSI, GSS libraries, GSISSH, and GSIFTP and/or GridFTP at all sites.
Start training a Grid application support team on this prototype.
TRANSITION TO A PROTOTYPE
PRODUCTION GRID
The ‘boundaries’ of a Grid are primarily determined by three factors:
• Interoperability of the Grid software: Many Grid sites run some variation
of the Globus software, and there is fairly good interoperability between
versions of Globus, so most Globus sites can potentially interoperate.
• What CAs you trust: This is explicitly configured in each Globus
environment on a per CA basis.
• How you scope the searching of the GIS/GIISs or control the information
that is published in them: This depends on the model that you choose for
structuring your directory services.
2. Defining/understanding the extent of ‘your’ Grid
Directory servers above the local GIISs (resource
information servers) are an important scaling mechanism
for several reasons.
There are currently two main approaches that are being
used for building directory services above the local GIISs.
One is a hierarchically structured set of directory servers
and a managed namespace, à la X.500, and
the other is ‘index’ servers that provide ad hoc, or ‘VO’
specific, views of a specific set of other servers, such as a
collection of GIISs, data collections, and so on.
3. The model for the Grid Information System
An X.500 style hierarchical name
component space directory structure
Using an X.500 style hierarchical name component
space directory structure has the advantage of
organizationally meaningful names that represent a set
of ‘natural’ boundaries for scoping searches, and it also
means that you can potentially use commercial
metadirectory servers for better scaling.
Using the Globus MDS for the information directory hierarchy has
several advantages.
The MDS research and development work has added to the usual
Lightweight Directory Access Protocol (LDAP)–based directory service
capabilities several features that are important for Grids.
Characteristics of MDS include the following:
• Resources are typically named using the components of their Domain
Name System (DNS) name.
• One must use separate ‘index’ servers to define different relationships
among GIISs, virtual organizations, data collections, and so on.
• Hierarchical GIISs (index nodes) are emerging as the preferred
approach in the Grids community that uses the Globus software.
Index server directory structure
As of yet, there is no standard authorization mechanism for Grids.
Almost all current Grid software uses some form of access control lists
(‘ACL’), which is straightforward, but typically does not scale very well.
The Globus mapfile is an ACL that maps from Grid identities to local
user identification numbers (UIDs) on the systems where jobs are to
be run.
The Globus Gatekeeper replaces the usual login authorization
mechanism for Grid-based access and uses the mapfile to authorize
access to resources after authentication.
Therefore, managing the contents of the mapfile is the basic Globus
user authorization mechanism for the local resource.
4. Local authorization
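The mapfile lookup described above can be sketched as follows. The sketch assumes the common one-entry-per-line layout of a quoted certificate subject name followed by a local account name. The sample subject names and accounts are invented, and a real mapfile may use features this parser does not handle.

```python
import shlex


def parse_mapfile(text):
    """Parse mapfile-style text into {Grid identity: local account}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the subject name
        if len(parts) != 2:
            raise ValueError(f"unexpected mapfile line: {line!r}")
        subject, account = parts
        mapping[subject] = account
    return mapping


def authorize(mapping, subject):
    """Return the local account for an authenticated identity, or None if unlisted."""
    return mapping.get(subject)


if __name__ == "__main__":
    sample = '''
    # hypothetical entries
    "/O=Grid/OU=Example Lab/CN=Jane Doe" jdoe
    "/O=Grid/OU=Example Lab/CN=Ravi Kumar" rkumar
    '''
    acl = parse_mapfile(sample)
    print(authorize(acl, "/O=Grid/OU=Example Lab/CN=Jane Doe"))
```

Every user needs an entry on every resource, which is why this style of ACL becomes hard to manage as a Grid grows.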
Incorporating any computing resource into a distributed application
system via Grid services involves using a whole collection of IP
communication ports that are otherwise not used.
If your systems are behind a firewall, then these ports are almost
certainly blocked, and you will have to negotiate with the site security
folks to open the required ports.
Globus can be configured to use a restricted range of ports, but it still
needs several tens, or so, in the mid-700s. (The number depends on
the level of usage of the resources behind the firewall.)
5. Site security issues
A Globus ‘port catalogue’ is available to tell what each Globus port is used
for, and this lets you provide information that your site security folks will
probably want to know.
It will also let you estimate how many ports have to be opened (how many
per process, per resource, etc.).
Additionally, GIS/GIIS needs some ports open, and the CA typically uses a
secure Web interface (port 443).
If you anticipate high data-rate distributed applications, whether for large-
scale data movement or process-to-process communication,
then enlist the help of a WAN networking specialist and check and refine the
network bandwidth end-to-end using large packet size test data streams.
Problems are likely between application host and site LAN/WAN gateways,
WAN/WAN gateways, and along any path that traverses the commodity
Internet.
Considerable experience exists in the DOE Science Grid in detecting and
correcting these types of problems, both in the areas of diagnostics and
tuning.
6. High performance communications issues
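A useful first check when tuning a high data-rate path is the bandwidth-delay product: the amount of data that must be in flight to keep the path full. The slides do not state this relation; it is the standard TCP tuning rule, shown here as a small Python calculation with illustrative numbers.

```python
def bandwidth_delay_product_bytes(bandwidth_mbps, rtt_ms):
    """Bytes that must be in flight to fill a path of this bandwidth and round-trip time."""
    # Mb/s times ms gives kilobits, so multiply by 1000 for bits, then divide by 8 for bytes.
    return bandwidth_mbps * rtt_ms * 1000 / 8


def max_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound on single-stream TCP throughput for a given window size."""
    return window_bytes * 8 / (rtt_ms * 1000)


if __name__ == "__main__":
    # Illustrative path: 1000 Mb/s with an 80 ms round-trip time.
    print(bandwidth_delay_product_bytes(1000, 80))
    # Throughput ceiling with a 64 KiB window on the same path.
    print(max_throughput_mbps(64 * 1024, 80))
```

On such a path a 64 KiB window caps a single stream at a few Mb/s, which is one reason large windows or parallel streams matter for wide-area transfers.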
There are several functions that are important to Grids that Grid
middleware cannot emulate: these must be provided by the resources
themselves.
Some of the most important of these are the functions associated with
job initiation and management on the remote computing resources.
Development of the PBS batch scheduling system was an active part of
the IPG project, and several important features were added in order to
support Grids.
7. Batch schedulers
Try to find problems before your users do.
Design test and validation suites that exercise your Grid in the
same way that applications are likely to use your Grid.
As early as possible in the construction of your Grid, identify some
test case distributed applications that require reasonable
bandwidth and run them across as many widely separated systems
in your Grid as possible, and then run these test cases every time
something changes in your configuration.
8. Preparing for users
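A validation suite of this kind can start as a simple runner that executes every check and reports the failures, so that it can be rerun after each configuration change. The check names below are hypothetical placeholders; real checks would submit test jobs or move test data across the Grid.

```python
def run_validation_suite(checks):
    """Run every check; return {name: error message} for the ones that failed."""
    failures = {}
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # one failing check must not stop the suite
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def check_job_submission():
    """Placeholder: a real check would submit a small remote job and wait for it."""


def check_data_transfer():
    """Placeholder that fails, to show how a failure is reported."""
    raise TimeoutError("no response from storage site")


if __name__ == "__main__":
    suite = {
        "job_submission": check_job_submission,
        "data_transfer": check_data_transfer,
    }
    print(run_validation_suite(suite))
```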
At this point, Globus, the GIS/MDS, and the security infrastructure
should all be operational on the test bed system(s).
The Globus deployment team should be familiar with the install and
operation issues and the system admins of the target resources should
be engaged.
Deploy and build Globus on at least two production computing
platforms at two different sites.
Establish the relationship between Globus job submission and the local
batch schedulers (one queue, several queues, a Globus queue, etc.).
Validate operation of this configuration.
9. Moving from test bed to prototype production Grid
Grids present special challenges for system administration owing to the
administratively heterogeneous nature of the underlying resources.
In the DOE Science Grid, we have built Grid monitoring tools from Grid
services.
We have developed pyGlobus modules for the NetSaint system
monitoring framework that test GSIFTP, MDS and the Globus
gatekeeper.
We have plans for, but have not yet implemented, a GUI tool that will
use these modules to allow an admin to quickly test functionality of a
particular host.
10. Grid systems administration tools
Establish the model for moving data between all the systems involved
in your Grid.
GridFTP servers should be deployed on the Grid computing platforms
and on the Grid data storage platforms.
This presents special difficulties when data resides on user systems
that are not usually Grid resources, and it raises the general issue of your
Grid ‘service model’. You have to recognize which services outside your
core Grid resources (e.g. GridFTP on user data systems) must be
supported for the Grid to be useful for applications, and decide how you
will support them.
11. Data management and your Grid service model
vaidikpanda4
 
Artificial Intelligence in Gastroentrology: Advancements and Future Presprec...
AyanHossain
 
Electrophysiology_of_Heart. Electrophysiology studies in Cardiovascular syste...
Rajshri Ghogare
 
Python-Application-in-Drug-Design by R D Jawarkar.pptx
Rahul Jawarkar
 
TOP 10 AI TOOLS YOU MUST LEARN TO SURVIVE IN 2025 AND ABOVE
digilearnings.com
 
Module 2: Public Health History [Tutorial Slides]
JonathanHallett4
 
K-Circle-Weekly-Quiz12121212-May2025.pptx
Pankaj Rodey
 
Tips for Writing the Research Title with Examples
Thelma Villaflores
 
Virus sequence retrieval from NCBI database
yamunaK13
 
How to Track Skills & Contracts Using Odoo 18 Employee
Celine George
 
Virat Kohli- the Pride of Indian cricket
kushpar147
 
PROTIEN ENERGY MALNUTRITION: NURSING MANAGEMENT.pptx
PRADEEP ABOTHU
 

5. the grid implementing production grid

  • 1. GRID COMPUTING Sandeep Kumar Poonia Head Of Dept. CS/IT B.E., M.Tech., UGC-NET LM-IAENG, LM-IACSIT,LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE
  • 2. Implementing production Grids: The Globus package was chosen for several reasons: • A clear, strong, and standards-based security model, • Modular functions (not an all-or-nothing approach) providing all the Grid Common Services, except general events, • A clear model for maintaining local control of resources that are incorporated into a Globus Grid, • A general design approach that allows decentralized control and deployment of the software, • A demonstrated ability to accomplish large-scale metacomputing, • Presence in supercomputing environments, • A clear commitment to open source, and • Today, one would also have to add ‘market share’.
  • 3. THE GRID CONTEXT: ‘Grids’ are an approach for building dynamically constructed problem-solving environments using geographically and organizationally dispersed, high-performance computing and data handling resources. Grids also provide important infrastructure supporting multi-institutional collaboration.
  • 4. Functionally, Grids are tools, middleware, and services for • building the application frameworks that allow discipline scientists to express and manage the simulation, analysis, and data management aspects of overall problem solving, • providing uniform and secure access to a wide variety of distributed computing and data resources,
  • 5. • supporting construction, management, and use of widely distributed application systems, • facilitating human collaboration through common security services, and resource and data sharing, • providing support for remote access to, and operation of, scientific and engineering instrumentation systems, and • managing and operating this computing and data infrastructure as a persistent service.
  • 6. This is accomplished through two aspects: (1) a set of uniform software services that manage and provide access to heterogeneous, distributed resources and (2) a widely deployed infrastructure.
  • 8. THE ANTICIPATED GRID USAGE MODEL WILL DETERMINE WHAT GETS DEPLOYED, AND WHEN: • Grid computing models • Grid data models
  • 9. Grid computing models: There are a number of identifiable computing models in Grids that range from single resource to tightly coupled resources, and each requires some variations in Grid services. 1. Export existing services 2. Loosely coupled processes 3. Workflow managed processes 4. Distributed-pipelined/coupled processes 5. Tightly coupled processes
  • 10. Export existing services: 1. Grids provide a uniform set of services to export the capabilities of existing computing facilities, such as supercomputer centers, to existing user communities, and this is accomplished by the Globus software. 2. The primary advantage of this form of Grid is to provide a uniform view of several related computing systems, or to prepare for other types of uses. 3. This sort of Grid also facilitates and encourages the incorporation of the supercomputers into user-constructed systems (various sorts of portals or frameworks that run on user systems and provide for creating and managing related suites of Grid jobs).
  • 11. Loosely coupled processes: 1. By loosely coupled processes we mean collections of logically related jobs that nevertheless do not have much in common once they are executing. 2. That is, these jobs are given some input data that might, for example, be a small piece of a single large dataset, and they generate some output data that may have to be integrated with the output of other such jobs; however, their execution is largely independent of the other jobs in the collection. Two common types of such jobs are (a) data analysis, in which a large dataset is divided into units that can be analyzed independently, and (b) parameter studies, where a design space of many parameters is explored, usually at low model resolution, across many different parameter values.
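The two kinds of loosely coupled jobs described for slide 11 can be illustrated with a minimal sketch. Everything named here is hypothetical and not part of Globus or any Grid toolkit: `parameter_study_jobs`, `split_dataset`, and the placeholder `run_job` only show how a design space or a dataset can be expanded into independent job descriptions whose outputs are integrated afterwards.

```python
from itertools import product


def parameter_study_jobs(design_space):
    """Expand a design space into one independent job description per point."""
    names = sorted(design_space)
    return [dict(zip(names, values))
            for values in product(*(design_space[name] for name in names))]


def split_dataset(records, unit_size):
    """Divide a dataset into units that can be analyzed independently."""
    return [records[i:i + unit_size] for i in range(0, len(records), unit_size)]


def run_job(params):
    """Placeholder for a low-resolution model run (hypothetical)."""
    return params["mach"] * params["angle"]


# Hypothetical design space: two Mach numbers times three angles gives six jobs.
jobs = parameter_study_jobs({"mach": [0.5, 0.8], "angle": [0, 2, 4]})

# Each job runs without reference to the others; only the outputs are combined.
results = [run_job(job) for job in jobs]
best = max(results)

# A dataset of seven records split into independently analyzable units of three.
units = split_dataset(list(range(7)), 3)
```

Because no job reads another job's state, the list comprehension could be replaced by submissions to separate resources without changing the result.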
  • 12. Workflow managed processes: Most workflow managers manage events of all sorts. By ‘event’, we mean essentially any asynchronous message that is used for decision-making purposes. Typical Grid events include 1. normal application occurrences that are used, for example, to trigger computational steering or semi-interactive graphical analysis, 2. abnormal application occurrences, such as numerical convergence failure, that are used to trigger corrective action, 3. messages that certain data files have been written and closed so that they may be used in some other processing step.
  • 13. Distributed-pipelined/coupled processes: In application systems that involve multidisciplinary or other multicomponent simulations, it is very likely that the processes will need to be executed in a ‘pipeline’ fashion. That is, there will be a set of interdependent processes that communicate data back and forth throughout the entire execution of each process. In this case, co-scheduling is likely to be essential, as is good network bandwidth between the computing systems involved. Co-scheduling for the Grid involves scheduling multiple individual, potentially architecturally and administratively heterogeneous computing resources so that multiple processes are guaranteed to execute at the same time in order that they may communicate and coordinate with each other.
  • 14. Tightly coupled processes: The Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) support a distributed memory programming model. MPICH-G2 (the Globus-enabled MPI) provides for MPI-style interprocess communication between Grid computing resources. It handles data conversion, communication establishment, and so on. Co-scheduling is essential for this to be a generally useful capability, since different ‘parts’ of the same program are running on different systems. PVM is another distributed memory programming system that can be used in conjunction with Condor and Globus to provide Grid functionality for running tightly coupled processes.
  • 15. Grid data models: Many of the current production Grids are focused around communities whose interest in wide-area data management is at least as great as their interest in Grid-based computing. These include, for example, the Particle Physics Data Grid (PPDG), the Grid Physics Network (GriPhyN), and the European Union DataGrid. As with computing, there are several styles of data management in Grids, and these styles result in different requirements for the software of a Grid.
  • 16. Occasional access to multiple tertiary storage systems: • Data mining can require access to metadata and uniform access to multiple data archives. • SRB/MCAT provides capabilities that include uniform remote access to data and local caching of the data for fast and/or multiple accesses. • Through its metadata catalogue, SRB provides the ability to federate multiple tertiary storage systems. • SRB provides a uniform interface by placing a server in front of (or as part of) the tertiary storage system. • This server must directly access the tertiary storage system, so there are several variations depending on the particular storage system.
  • 17. Distributed analysis of massive datasets followed by cataloguing and archiving: In many scientific disciplines, a large community of users requires remote access to large datasets. An effective technique for improving access speeds and reducing network loads can be to replicate frequently accessed datasets at locations chosen to be ‘near’ the eventual users. However, organizing such replication so that it is both reliable and efficient can be a challenging problem, for a variety of reasons. The datasets to be moved can be large, so issues of network performance and fault tolerance become important.
  • 18. The data-intensive science applications noted above that are international in their scope have motivated the GridFTP emphasis on providing high performance over the WAN and the ability to manage huge files in the wide area. To accomplish this, GridFTP provides • integrated GSI security and policy-based access control, • third-party transfers (between GridFTP servers), • wide-area network communication parameter optimization, • partial file access, • reliability/restart for large file transfers, • integrated performance monitoring instrumentation, • network parallel transfer streams, • server-side data striping (cf. HPSS striped tapes), • server-side computation, • proxies (to address firewalls and load balancing).
  • 19. Large reference data sets: A common situation is that a whole set of simulations or data analysis programs will require the use of the same large reference dataset. The management of such datasets, the originals of which almost always live in a tertiary storage system, could be handled by one of the replica managers. However, another service that is needed in this situation is a network cache: a unit of storage that can be accessed and allocated as a Grid resource, and that is located ‘close to’ (in the network sense) the Grid computational resources that will run the codes that use the data. The Distributed Parallel Storage System (DPSS) can provide this functionality; however, it is not currently well integrated with Globus.
  • 20. Grid metadata management: The Metadata Catalogue of SRB/MCAT provides a powerful mechanism for managing all types of descriptive information about data: data content information, fine-grained access control, physical storage device (which provides location independence for federating archives), and so on. The flip side of this is that the service is fairly heavyweight to use (when its full capabilities are desired) and it requires considerable operational support.
  • 21. GRID SUPPORT FOR COLLABORATION: Currently, Grids support collaboration, in the form of Virtual Organizations (VOs) (by which we mean human collaborators, together with the Grid environment that they share), in two very important ways. First, the Grid Security Infrastructure (GSI) provides a common authentication approach that is a basic and essential aspect of collaboration. It provides the authentication and communication mechanisms, and the trust management, that allow groups of remote collaborators to interact with each other in a trusted fashion, and it is the basis of policy-based sharing of collaboration resources. GSI has the added advantage that it has been integrated with a number of tools that support collaboration, for example, secure remote login and remote shell (GSISSH) and secure FTP (GSIFTP and GridFTP).
  • 22. The second important contribution of Grids is that of supporting collaborations that are VOs and, as such, have to provide ways to preserve and share organizational and community information (e.g. the location and description of key data repositories, code repositories, etc.). For this to be effective over the long term, there must be a persistent publication service where this information may be deposited and accessed by both humans and systems. The Grid Information Service (GIS) can provide this service.
  • 23. A third Grid collaboration service is the Access Grid (AG), a group-to-group audio and videoconferencing facility that is based on IP multicast and can be managed by an out-of-band floor control service. The AG is currently being integrated with the Globus directory and security services.
  • 24. BUILDING AN INITIAL MULTISITE, COMPUTATIONAL AND DATA GRID: 1. The Grid building team: Successful Grids involve almost as much sociology as technology, and therefore establishing good working relationships among all the people involved is essential.
  • 25. BUILDING AN INITIAL MULTISITE, COMPUTATIONAL AND DATA GRID: 2. Grid resources: As early as possible in the process, identify the computing and storage resources to be incorporated into your Grid. In doing this, be sensitive to the fact that opening up systems to Grid users may turn lightly or moderately loaded systems into heavily loaded systems. Batch schedulers may have to be installed on systems that previously did not use them in order to manage the increased load.
  • 26. BUILDING AN INITIAL MULTISITE, COMPUTATIONAL AND DATA GRID: 3. Build the initial test bed. Grid information service: The Grid Information Service (GIS) provides for locating resources based on the characteristics needed by a job (OS, CPU count, memory, etc.). The Globus MDS provides this capability with two components. The Grid Resource Information Service (GRIS) runs on the Grid resources (computing and data systems) and handles the soft-state registration of the resource characteristics. The Grid Information Index Server (GIIS) is a user-accessible directory server that supports searching for resources by characteristics. Other information may also be stored in the GIIS, and the GGF Grid Information Services group is defining schema for
  • 27. BUILDING AN INITIAL MULTISITE, COMPUTATIONAL AND DATA GRID: Build Globus on test systems: Use PKI authentication and initially use certificates from the Globus Certificate Authority (‘CA’) or any other CA that will issue you certificates for this test environment. (The OpenSSL CA may be used for this testing.) Then validate access to, and operation of, the GIS/GIISs at all sites and test local and remote job submission using these certificates.
  • 28. CROSS-SITE TRUST MANAGEMENT: One of the most important contributions of Grids to supporting large-scale collaboration is the uniform Grid entity naming and authentication mechanisms provided by the GSI.
  • 29. CROSS-SITE TRUST MANAGEMENT: Trust: Trust is ‘confidence in or reliance on some quality or attribute of a person or thing, or the truth of a statement’. Cyberspace trust starts with clear, transparent, negotiated, and documented policies associated with identity. When a Grid identity token (an X.509 certificate in the current context) is presented for remote authentication and is verified using the appropriate cryptographic techniques, then the relying party should have some level of confidence that the person or entity that initiated the transaction is the person or entity that it is expected to be.
  • 30. CROSS-SITE TRUST MANAGEMENT: It is difficult to establish trust for large, heterogeneous VOs involving people from multiple, international institutions, because shared trust models do not exist. The typical issues related to establishing trust may be summarized as follows: • Across administratively similar systems (for example, within an organization): an informal/existing trust model can be extended to Grid authentication and authorization. • Across administratively diverse systems (for example, across many similar organizations): a formal/existing trust model can be extended to Grid authentication and authorization. • Across administratively heterogeneous systems (for example, across multiple organizational types such as science labs and industry, or in international collaborations): a formal/new trust model for Grid authentication and authorization will need to be developed.
  • 31. CROSS-SITE TRUST MANAGEMENT: Establishing an operational CA: Set up, or identify, a Certification Authority to issue Grid X.509 identity certificates to users and hosts. Both the IPG and DOE Science Grids use the Netscape CMS software for their operational CA because it is a mature product that allows a very scalable usage model that matches well with the needs of science VOs.
  • 32. CROSS-SITE TRUST MANAGEMENT: Naming: One of the important issues in developing a Certificate Policy (CP) is the naming of the principals (the ‘subject’, i.e. the Grid entity identified by the certificate). While there is an almost universal tendency to try to pack a lot of information into the subject name (which is a multicomponent, X.500-style name), increasingly there is an understanding that the less information of any kind put into a certificate, the better. This simplifies certificate management and re-issuance when users forget pass phrases (which will happen with some frequency).
  • 33. The certification authority model: There are several models for CAs; however, increasingly, associated groups of collaborations/VOs are opting to find a single CA provider. The primary reason for this is that it is a formal and expensive process to operate a CA in such a way that it will be trusted by others.
  • 35. TRANSITION TO A PROTOTYPE PRODUCTION GRID: 1. First steps: Issue host certificates for all the computing and data resources and establish procedures for installing them. Issue user certificates. Count on revoking and re-issuing all the certificates at least once before going operational. This is inevitable if you have not previously operated a CA. Using certificates issued by your CA, validate correct operation of the GSI, GSS libraries, GSISSH, and GSIFTP and/or GridFTP at all sites. Start training a Grid application support team on this prototype.
  • 36. 2. Defining/understanding the extent of ‘your’ Grid: The ‘boundaries’ of a Grid are primarily determined by three factors: • Interoperability of the Grid software: Many Grid sites run some variation of the Globus software, and there is fairly good interoperability between versions of Globus, so most Globus sites can potentially interoperate. • What CAs you trust: This is explicitly configured in each Globus environment on a per-CA basis. • How you scope the searching of the GIS/GIISs or control the information that is published in them: This depends on the model that you choose for structuring your directory services.
  • 37. 3. The model for the Grid Information System: Directory servers above the local GIISs (resource information servers) are an important scaling mechanism for several reasons. There are currently two main approaches that are being used for building directory services above the local GIISs. One is a hierarchically structured set of directory servers and a managed namespace, à la X.500, and the other is ‘index’ servers that provide ad hoc, or ‘VO’ specific, views of a specific set of other servers, such as a collection of GIISs, data collections, and so on.
  • 38. An X.500-style hierarchical name component space directory structure: Using an X.500-style hierarchical name component space directory structure has the advantage of organizationally meaningful names that represent a set of ‘natural’ boundaries for scoping searches, and it also means that you can potentially use commercial metadirectory servers for better scaling.
  • 39. Index server directory structure: Using the Globus MDS for the information directory hierarchy has several advantages. The MDS research and development work has added to the usual Lightweight Directory Access Protocol (LDAP)-based directory service capabilities several features that are important for Grids. Characteristics of MDS include the following: • Resources are typically named using the components of their Domain Name System (DNS) name. • One must use separate ‘index’ servers to define different relationships among GIISs, virtual organizations, data collections, and so on. • Hierarchical GIISs (index nodes) are emerging as the preferred approach in the Grids community that uses the Globus software.
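The idea behind a resource index, as described for GRIS and GIIS (resources register their characteristics with soft state, and users search by the characteristics a job needs), can be sketched in a few lines. This is a toy, in-memory illustration only: `ToyIndex`, its methods, and the attribute names are hypothetical and do not correspond to the MDS or LDAP interfaces. Soft state is modeled as a time-to-live, so a registration disappears unless the resource refreshes it.

```python
import time


def _meets(actual, wanted):
    """String attributes must match exactly; numeric ones are minimums."""
    if actual is None:
        return False
    if isinstance(wanted, str):
        return actual == wanted
    return actual >= wanted


class ToyIndex:
    """Hypothetical in-memory resource index (not the Globus MDS API)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # host name -> (attributes, expiry time)

    def register(self, host, attributes, now=None):
        """Soft-state registration: valid only until now + ttl, then it lapses."""
        now = time.time() if now is None else now
        self._entries[host] = (dict(attributes), now + self.ttl)

    def search(self, now=None, **requirements):
        """Return hosts with unexpired registrations that meet all requirements."""
        now = time.time() if now is None else now
        hits = []
        for host, (attrs, expires) in self._entries.items():
            if expires <= now:
                continue  # registration was not refreshed in time
            if all(_meets(attrs.get(key), value)
                   for key, value in requirements.items()):
                hits.append(host)
        return sorted(hits)


# Hypothetical resources; explicit timestamps keep the example deterministic.
index = ToyIndex(ttl_seconds=60)
index.register("hpc-a.example.org", {"os": "linux", "cpus": 64, "memory_gb": 256}, now=0)
index.register("node-b.example.org", {"os": "linux", "cpus": 8, "memory_gb": 16}, now=0)

big_enough = index.search(now=10, os="linux", cpus=16)
after_expiry = index.search(now=61, cpus=1)
```

A design consequence of soft state is that a resource that crashes or leaves the Grid needs no explicit de-registration; its entry simply lapses.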
  • 40. 4. Local authorization: As yet, there is no standard authorization mechanism for Grids. Almost all current Grid software uses some form of access control list (‘ACL’), which is straightforward, but typically does not scale very well. The Globus mapfile is an ACL that maps from Grid identities to local user identification numbers (UIDs) on the systems where jobs are to be run. The Globus Gatekeeper replaces the usual login authorization mechanism for Grid-based access and uses the mapfile to authorize access to resources after authentication. Therefore, managing the contents of the mapfile is the basic Globus user authorization mechanism for the local resource.
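The mapfile-as-ACL idea can be illustrated with a short sketch. It assumes a simplified line format in which a quoted certificate subject name is followed by one local account name; the helper names `parse_mapfile` and `authorize`, the sample subjects, and the account names are all hypothetical, and this is not the Gatekeeper's actual implementation. Authentication (verifying the certificate) is assumed to have already happened before `authorize` is called.

```python
import shlex


def parse_mapfile(text):
    """Parse lines of the assumed form: "<certificate subject>" <local account>."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honors the quotes around the subject name
        if len(parts) != 2:
            raise ValueError(f"malformed mapfile line: {raw!r}")
        subject, account = parts
        mapping[subject] = account
    return mapping


def authorize(mapping, authenticated_subject):
    """Return the local account for an authenticated identity, or None if absent."""
    return mapping.get(authenticated_subject)


# Hypothetical mapfile contents.
SAMPLE = '''
# Grid identity -> local account
"/O=Grid/OU=Example Lab/CN=Jane Doe" jdoe
"/O=Grid/OU=Example Lab/CN=Ravi Kumar" rkumar
'''

acl = parse_mapfile(SAMPLE)
granted = authorize(acl, "/O=Grid/OU=Example Lab/CN=Jane Doe")
denied = authorize(acl, "/O=Grid/OU=Other Org/CN=Unknown User")
```

Because every authorized identity needs its own entry on every resource, the file grows with users times resources, which is one way to see why ACLs of this kind do not scale well.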
  • 41. 5. Site security issues: Incorporating any computing resource into a distributed application system via Grid services involves using a whole collection of IP communication ports that are otherwise not used. If your systems are behind a firewall, then these ports are almost certainly blocked, and you will have to negotiate with the site security folks to open the required ports. Globus can be configured to use a restricted range of ports, but it still needs several tens, or so, in the mid-700s (the number depends on the level of usage of the resources behind the firewall).
  • 42. A Globus ‘port catalogue’ is available to tell what each Globus port is used for, and this lets you provide information that your site security folks will probably want to know. It will also let you estimate how many ports have to be opened (how many per process, per resource, etc.). Additionally, GIS/GIIS needs some ports open, and the CA typically uses a secure Web interface (port 443).
  • 43. 6. High performance communications issues: If you anticipate high data-rate distributed applications, whether for large-scale data movement or process-to-process communication, then enlist the help of a WAN networking specialist and check and refine the network bandwidth end-to-end using large packet size test data streams. Problems are likely between application host and site LAN/WAN gateways, WAN/WAN gateways, and along any path that traverses the commodity Internet. Considerable experience exists in the DOE Science Grid in detecting and correcting these types of problems, both in the areas of diagnostics and tuning.
  • 44. 7. Batch schedulers: There are several functions that are important to Grids that Grid middleware cannot emulate: these must be provided by the resources themselves. Some of the most important of these are the functions associated with job initiation and management on the remote computing resources. Development of the PBS batch scheduling system was an active part of the IPG project, and several important features were added in order to support Grids.
  • 45. 8. Preparing for users: Try to find problems before your users do. Design test and validation suites that exercise your Grid in the same way that applications are likely to use your Grid. As early as possible in the construction of your Grid, identify some test case distributed applications that require reasonable bandwidth and run them across as many widely separated systems in your Grid as possible, and then run these test cases every time something changes in your configuration.
  • 46. 9. Moving from test bed to prototype production Grid: At this point, Globus, the GIS/MDS, and the security infrastructure should all be operational on the test bed system(s). The Globus deployment team should be familiar with the install and operation issues, and the system admins of the target resources should be engaged. Deploy and build Globus on at least two production computing platforms at two different sites. Establish the relationship between Globus job submission and the local batch schedulers (one queue, several queues, a Globus queue, etc.). Validate operation of this configuration.
  • 47. 10. Grid systems administration tools: Grids present special challenges for system administration owing to the administratively heterogeneous nature of the underlying resources. In the DOE Science Grid, we have built Grid monitoring tools from Grid services. We have developed pyGlobus modules for the NetSaint system monitoring framework that test GSIFTP, MDS and the Globus gatekeeper. We have plans for, but have not yet implemented, a GUI tool that will use these modules to allow an admin to quickly test functionality of a particular host.
  • 48. 11. Data management and your Grid service model: Establish the model for moving data between all the systems involved in your Grid. GridFTP servers should be deployed on the Grid computing platforms and on the Grid data storage platforms. This presents special difficulties when data resides on user systems that are not usually Grid resources, and it raises the general issue of your Grid ‘service model’. Which services outside your core Grid resources must be supported to achieve a Grid that is useful for applications (e.g. GridFTP on user data systems), and how you will support those services, are issues that have to be recognized and addressed.