"One can't believe impossible
things"
UK OGSA Evaluation Project
(UCL, Imperial, Newcastle, Edinburgh)
(Full list of project members)
Paul Brebner
University College London
P.Brebner@cs.ucl.ac.uk
"Grid middleware is easy to install, configure,
secure, debug and manage - across multiple sites"
Grid Complexity – The Grid will be BIG
Grid Complexity - growing
Grid Complexity – built on the internet
Grid Complexity – but more complex
Grid Simplicity – Start with something simple
• OGSA
– OGSI
• GT3.2 – exemplar of a Grid SOA
• Initially evaluate installation, configuration,
and security
• Then performance and scalability,
deployment, architectural choices, etc.
Grid Realism – But realistic test-bed
• Heterogeneous platforms
– Linux, Solaris, Windows
• Cross-organisational
– Four nodes
– Independently administered
– Firewalls and access restrictions
• Security
– UK e-Science CA
Grid Confusion – What is Globus?
• How is Globus intended to be used?
– 1: Science as first-order services: Middleware
for building and hosting Grid Applications, by
exposing science code as Grid services.
– 2: Middleware as services: A set of high
level Grid services, composed to provide new
Grid functionality. Science isn’t a first-order
service, but is managed by Grid services.
Grid Confusion – Science services or Grid services
[Figure, built up over three slides: a Client and science code (E=mc², D=A+2B+C²), with the paths labelled 1 and 2 as in the two usage options above]
Grid Confusion – How to evaluate
• Do we evaluate GT3 as middleware for
hosting Grid services, or as a toolkit for
constructing Grid middleware?
• If the first, only need GT3 Core – just the
container. If the second, need “All Services”
(and more – there’s no scheduler).
Grid Simplicity – Incremental
• Start with Core Package
• Add Security
• Then try “All Services”
• Simple enough – in theory
Grid Steps – single node
[Figure, built up over four slides: the steps Install, Configure, Deploy, Run on a single node labelled OS/HW, GT3, Install]
Grid Steps – Multiple sites
[Figure, built up over five slides: four GT3 installations, then the stages Interoperate, Secure, Manage]
Grid Reality – What we found
• Port number management
• Host access
• Remote visibility of installation, container,
services
• Installation by System Administrators
• Tomcat or Test container
• Compilation issues on Solaris
• Exponential increase in testing complexity as
number of nodes increases.
Grid Reality – What we found
• Port number management
– Port number conflicts (with other services)
– What port is the container running on?
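A minimal sketch of how a port conflict could be detected before starting a container, assuming only the Python standard library. The function names and the idea of scanning a candidate list are illustrative assumptions; nothing here is part of GT3.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if this process can bind (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "address already in use": another service holds the port.
            return False
        return True


def first_free_port(candidates, host: str = "127.0.0.1"):
    """Return the first candidate port that can be bound, or None."""
    for port in candidates:
        if port_is_free(port, host):
            return port
    return None
```

A check like this only answers the local question (is something else already on the port?). It says nothing about which port a running container actually chose, which still has to come from the container's own configuration or logs.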
Grid Reality – What we found
• Host access
– Is the container visible on that port externally?
– From which machines?
– For which users?
– Non-trivial to test/debug if/when something
goes wrong
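The visibility questions above can be partly answered with a plain TCP probe run from each client site in turn. This is a sketch using only the Python standard library; the three-way classification is an assumption about what is useful to report, not a Globus tool.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port from this machine."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"      # something accepted the connection
    except socket.timeout:
        return "timeout"       # no answer at all within the timeout
    except ConnectionRefusedError:
        return "refused"       # host reachable, nothing listening on the port
    except OSError:
        return "error"         # e.g. name resolution failure or no route
```

A "timeout" result often points to a firewall silently dropping packets, while "refused" suggests the host is reachable but the container is not running on that port. The probe cannot answer "for which users?", since that depends on the security configuration rather than on the network.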
Grid Reality – What we found
• Remote visibility of installation, container,
services
– What infrastructure is installed?
– What packages and versions?
– How is it configured?
– What state is it in?
Grid Reality – What we found
• Installation by System Administrators
– Division of roles
– Didn’t meet expectations
– Extra effort to support multiple roles
• System Administrators – install, configure and
secure
• Globus Administrators – test, maintain
• Globus Developers – develop, deploy, test/use Grid
services
Grid Reality – What we found
• Tomcat or Test container
– Differences in deployment, configuration, and
management
– With Tomcat, increased potential for centralised
management, and sand-boxing of run-time
environment
Grid Reality – What we found
• Compilation issues on Solaris
– Took longer than expected
– Only Linux testing and support can be taken for
granted
Grid Reality – What we found
• Exponential increase in testing complexity
as number of nodes increases
– Testing (and maintaining) interoperability
between m client machines, and n servers gets
complicated.
– How well will this scale for 100s, 1000s of
nodes?
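A rough sketch of how the interoperability test matrix grows, assuming every client is tested against every server under every combination of options. The option names (container type, security on or off) are taken from the slides; treating them as independent dimensions is an assumption.

```python
from itertools import product


def build_matrix(clients, servers, *option_sets):
    """List every (client, server, option...) combination to be tested."""
    return list(product(clients, servers, *option_sets))


def matrix_size(m: int, n: int, k: int) -> int:
    """Cases for m clients, n servers and k independent two-valued options."""
    return m * n * 2 ** k


sites = ["UCL", "Imperial", "Newcastle", "Edinburgh"]
cases = build_matrix(sites, sites,
                     ["tomcat", "test-container"],
                     ["no-security", "security"])
print(len(cases))  # 4 * 4 * 2 * 2 = 64
```

Client and server pairs alone grow as m × n; each further independent option doubles the count, so four sites already give 64 cases with just two options.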
Grid Reality – Security
• In theory just had to
– obtain (and update) host, client, and CA certificates
– convert
– install
– configure
– generate (and update) proxies.
• However, parts of “All Services” package also
needed.
Grid Security - What we found
• Interactions between security for multiple
installations
• Essential to test non-secure interoperability first
• Windows client-side security
• Testing and viewing security configuration
• Debugging secure calls
• Client side security is programmatic
• Security management scalability
– Construction and maintenance of user accounts and
grid-map file entries.
Grid Security - What we found
• Interactions between security for multiple
installations
– For testing may want
• multiple versions, or duplicates (with different
configurations) of same versions.
• One container with no security, and another
container with security
– May want test/production environments
Grid Security - What we found
• Essential to test non-secure interoperability
first
– Trying to test interoperability and security
simultaneously wasn’t fun
Grid Security - What we found
• Windows client-side security
– Still haven’t got it working
– Not obvious exactly what parts of Globus are
needed for client side code with security (no
“client plus security” package).
Grid Security - What we found
• Testing and viewing security configuration
– Need to be able to view/edit and check security
configuration for containers and services
– Confusion about hierarchical security settings
• Virtual Organisations, clusters, servers, containers,
factories, services, methods, and instances.
– Remotely
– Validate security deployment before run-time
Grid Security - What we found
• Debugging secure calls (or any stateful service)
– Proxy interceptor approach (e.g. TCPMON) won’t
work with stateful services
• As grid handle returned to client contains the port number of
the instance, not the proxy
– But proxies are an important design pattern for SOAs…
– GT4/WS-RF may be different
• Handle resolvers, WS-Addressing and WS-
RenewableReferences
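A small sketch of why an intercepting proxy is bypassed: the handle returned for a stateful instance carries the instance's own host and port, so later calls go straight to the container. The handle URL, host names and ports below are hypothetical, and the rewriting helper is an illustration rather than anything GT3 provides.

```python
from urllib.parse import urlsplit, urlunsplit


def bypasses_proxy(handle: str, proxy_port: int) -> bool:
    """True if a call made with this handle would not go through the proxy port."""
    return urlsplit(handle).port != proxy_port


def via_proxy(handle: str, proxy_host: str, proxy_port: int) -> str:
    """Rewrite a handle URL so that the call is sent to the proxy instead."""
    parts = urlsplit(handle)
    return urlunsplit(parts._replace(netloc=f"{proxy_host}:{proxy_port}"))


# Hypothetical handle returned by a factory running on port 8080.
handle = "http://server.example.org:8080/services/demo/instance-1"
print(bypasses_proxy(handle, 8081))        # True
print(via_proxy(handle, "localhost", 8081))
# http://localhost:8081/services/demo/instance-1
```

Whether a container would accept calls on a rewritten handle, particularly with security enabled, is not something the slides establish.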
Grid Security - What we found
• Client side security is programmatic
– Client side code modifications required to call
services/methods with required protocols
– Should be declarative
– Sensitive to server side security credentials
Grid Security - What we found
• Security management scalability
– Construction and maintenance of user accounts and grid-map file
entries.
– For each server, each user needs an account, and an entry in the
container gridmap file (mapping client certificate to account)
– May also need service specific gridmap files
– Not scalable for large numbers of users, servers, services.
• Alternatives?
– Tool support
– Role based authentication
– Shared accounts or certificates
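The bookkeeping described above can be sketched as follows. The gridmap line format (a quoted certificate subject followed by a local account name) is the usual grid-mapfile convention; the distinguished names, account names and the counting helper are hypothetical.

```python
def gridmap_line(subject_dn: str, account: str) -> str:
    """One gridmap entry: quoted certificate subject, then the local account."""
    return f'"{subject_dn}" {account}'


def build_gridmaps(users: dict, servers: list) -> dict:
    """Return the gridmap text for each server (same users on every server)."""
    text = "\n".join(gridmap_line(dn, acct) for dn, acct in sorted(users.items()))
    return {server: text + "\n" for server in servers}


def entries_to_maintain(n_users: int, n_servers: int, service_maps_per_server: int = 0) -> int:
    """Upper bound, assuming every user appears in every gridmap file."""
    return n_users * n_servers * (1 + service_maps_per_server)


# Hypothetical users and accounts.
users = {
    "/C=UK/O=eScience/OU=Example/CN=alice example": "alice",
    "/C=UK/O=eScience/OU=Example/CN=bob example": "bob",
}
maps = build_gridmaps(users, ["ucl", "imperial", "newcastle", "edinburgh"])
print(entries_to_maintain(len(users), len(maps)))  # 8
```

With 100 users on 4 servers the container gridmap files alone hold 400 entries, and each service-specific gridmap file multiplies that again, before counting the matching local accounts.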
Grid Recommendations
• If Globus is middleware, then need:
– Platform independent, automatic, installation.
– Tool support for configuration and deployment
creation, validation, viewing and editing.
– Management console for grid, nodes, globus
packages, containers and services.
– Support for remote, location independent,
cross-organisational, multiple role scenarios.
Grid Recommendations (continued)
• If Globus is middleware, then need:
– Remote deployment and management of
services.
– Remote distributed debugging of grid
installations, services, and applications.
– Tool support, and more scalable processes for
security.
Grid Alternatives
• Next we plan to evaluate the two architectural
choices in more detail
– Science exposed as services, vs science code managed
by higher level grid services.
• Explore alternative mechanisms for:
– Load balancing and resource management
– Directory services (service and resource discovery)
– Data movement approaches (e.g. SOAP Attachments vs
GridFTP)
Grid Performance
• First approach (initial results)
– Scientific benchmark (SciMark2.0) modified to
measure throughput, and invoked as a Stateful Grid
Service
– Metric is Calls Per Minute (CPM) – one unit of work.
– No data movement, just computation and memory load.
– JVM: 512MB heap and -server (of course ☺)
• Good performance and scalability
– Security has minimal overhead
– Problem with client side timeouts as response times
increase
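A sketch of how a Calls Per Minute figure could be collected from a multi-threaded client, assuming each call is one unit of work. The `call` argument is a placeholder for the service invocation; this is not the project's actual harness and does not use SciMark.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure(call, threads: int, calls_per_thread: int):
    """Run `call` from several threads.

    Returns (calls per minute, mean response time in seconds).
    """
    def worker(_):
        times = []
        for _ in range(calls_per_thread):
            start = time.perf_counter()
            call()
            times.append(time.perf_counter() - start)
        return times

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_thread = list(pool.map(worker, range(threads)))
    elapsed = time.perf_counter() - wall_start

    durations = [t for times in per_thread for t in times]
    cpm = len(durations) * 60.0 / elapsed
    return cpm, sum(durations) / len(durations)
```

For example, `measure(lambda: time.sleep(0.01), threads=4, calls_per_thread=10)` exercises the harness with a dummy call. As the thread count rises, response times grow while throughput flattens, which is where the client-side timeouts mentioned above would start to appear.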
Grid Performance
[Figure: ART (s). Time (s) against Threads, Tomcat. Series: UCL (4 cpu Sun), Newcastle (2 cpu Intel), Imperial (2 cpu Intel), Edinburgh (4 hyperthread cpu Intel), All]
• Fastest: 3.6s (Edinburgh)
• Slowest: 25s (UCL)
Grid Performance
[Figure: Throughput (CPM). CPM against Threads. Series: UCL (4 cpu Sun), Newcastle (2 cpu Intel), Imperial (2 cpu Intel), Edinburgh (4 hyperthread cpu Intel), All (12 cpus), Theoretical Maximum]
• 95% of predicted maximum throughput
Grid Performance
• Tomcat vs Test container
– No difference on 3 out of 4 nodes
– But 67% faster on one node (Newcastle, slowest Intel
box)
• Attachments will work with GT3 and Tomcat
– But not with security
– Limit of 1GB (DIME)
– Bug in Axis – doesn’t clean up temporary files.
Grid Performance
• Stateful instances can be problematic
– Intermittent unreliability
• On some runs, 1 exception in 300 calls (reliability of .9967)
– But non-repeatable, SOAP/network related?
• What is the safe response to exceptions? Can’t just retry.
– Possible to kill container (relies on clients being well
behaved):
• By invoking same instance/method more than once.
• By consuming container resources
– But instances can be passivated/activated in theory
– Could be used to enable fine-grain (per instance) control over
resource usage.
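One way to see why "just retry" is unsafe for a stateful instance: if the exception is raised after the state change has been applied (for example because the response was lost), a blind retry applies the change twice. The class below is a toy stand-in, not a GT3 API, and the failure mode is an assumption about what a SOAP or network related exception might look like.

```python
class CounterInstance:
    """Toy stateful service instance holding a single value."""

    def __init__(self):
        self.value = 0

    def add(self, amount: int, lose_response: bool = False) -> int:
        self.value += amount  # the state change happens first
        if lose_response:
            raise ConnectionError("response lost after the update was applied")
        return self.value


def call_with_blind_retry(instance: CounterInstance, amount: int,
                          lose_first_response: bool) -> int:
    """Call add(); on any connection error, simply call it again."""
    try:
        return instance.add(amount, lose_response=lose_first_response)
    except ConnectionError:
        return instance.add(amount)  # applies the update a second time


instance = CounterInstance()
print(call_with_blind_retry(instance, 1, lose_first_response=True))  # 2
```

The client asked for one increment and the instance now holds 2. The client cannot tell from the exception alone whether the first call took effect.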
Grid Deployment
• How to install and configure Grid infrastructure
and services - scalably and securely?
• Install GT3 infrastructure and security manually
– MMJFS allows executable code to be staged
automatically (But not services - could provide a
deployment service).
• Install bootstrapping code, and then install and
deploy all other code and security automatically.
– Using SmartFrog (HP) in the lab, and then test-bed.
– Configuring GT3 security remotely is an open-issue, as
is “trust” with System Administrators.
Grid Dreams - Debugging
• Debugging distributed systems is tricky
– Need better support for cross-cutting non-functional concerns such
as deployment and debugging.
– (One) problem with debugging services is not knowing the context
of errors (to aid diagnosis or cure) – a service is just an interface.
• Deployment aware debugging:
– Starting from functional work-flows, generate deployment-flows,
which are executed prior to, or concurrent with, functional work-
flows.
– If failure in functional work-flow, then corresponding deployment-
flow is examined to determine likely causes, and parts are re-
executed.
Grid Dreams - Debugging
• Backtrack through deployment steps (Like peeling
an onion)
– Some steps will need to be reversed
– Track dependencies, and redundant operations.
• This approach may fix an (interesting) sub-class of
problems:
• Those which can be fixed by simply redoing (or replicating) (part
of) the installation, E.g.
– Intermittent failure of container or services
– Resource starvation or overload
• Security problems that can be fixed with reconfiguration or
refresh of certificates/proxies.
– But not:
• network, or all configuration and security/access problems.
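The backtracking idea can be sketched as a deployment-flow of reversible steps: when the functional work-flow fails, undo and redo progressively more of the deployment, starting with the most recent step. The step names and the `Step` structure are illustrative assumptions, not part of the project's tooling.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    apply: Callable[[], None]
    undo: Callable[[], None]


def run_with_redeploy(steps: List[Step], workflow: Callable[[], bool]) -> List[str]:
    """Run the deployment-flow, then the work-flow; on failure, peel back
    and redo deployment steps, last step first. Returns a log of actions."""
    log = []
    for step in steps:
        step.apply()
        log.append(f"apply {step.name}")
    if workflow():
        log.append("workflow ok")
        return log
    for depth in range(1, len(steps) + 1):
        tail = steps[-depth:]
        for step in reversed(tail):      # reverse the most recent steps
            step.undo()
            log.append(f"undo {step.name}")
        for step in tail:                # then re-execute them in order
            step.apply()
            log.append(f"apply {step.name}")
        if workflow():
            log.append(f"workflow ok after redoing {depth} step(s)")
            return log
    log.append("workflow still failing")
    return log
```

In this form the loop redoes earlier steps repeatedly as it backs up; tracking dependencies and redundant operations, as noted above, would avoid that. Problems that redeployment cannot touch, such as network faults, end in "workflow still failing".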
UK OGSA Evaluation Project
• Thank you ☺
– Questions/Comments?
• Email: P.Brebner@cs.ucl.ac.uk
– After November: Paul.Brebner@csiro.au
• Not (quite) the End…
Postscript – The Secret Life of Grid?
Our experience evaluating Grid technology reminds me of an
Australian book (“The Secret Life of Wombats”) about a schoolboy
who used to sneak out of his dormitory after everyone was asleep to go
“wombatting”. He spent his nights secretly crawling down wombat
burrows with a flashlight – a potentially lethal activity (not just from
cave-ins, as wombats are ferocious when cornered!) – and wrote
copious notes resulting in a substantial increase in knowledge of these
“mysterious and often misunderstood creatures”.
UK OGSA Evaluation Project Report 1.0
Evaluation of Globus Toolkit 3.2 (GT3.2)
Installation
http://sse.cs.ucl.ac.uk/UK-OGSA/Report1.doc

More Related Content

PDF
Grid Middleware – Principles, Practice and Potential
Paul Brebner
 
PPTX
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
PPTX
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Paul Brebner
 
PPTX
OpenStack Paris 2014 - Federation, are we there yet ?
Tim Bell
 
PDF
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Helena Edelson
 
PDF
Unlock cassandra data for application developers using graphQL
Cédrick Lunven
 
PDF
Do's and don'ts when deploying akka in production
jglobal
 
PDF
Upcoming services in OpenStack
Cisco DevNet
 
Grid Middleware – Principles, Practice and Potential
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Paul Brebner
 
OpenStack Paris 2014 - Federation, are we there yet ?
Tim Bell
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Helena Edelson
 
Unlock cassandra data for application developers using graphQL
Cédrick Lunven
 
Do's and don'ts when deploying akka in production
jglobal
 
Upcoming services in OpenStack
Cisco DevNet
 

What's hot (20)

PDF
Tsinghua University: Two Exemplary Applications in China
DataStax Academy
 
PPTX
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
Cisco DevNet
 
PDF
Building A Diverse Geo-Architecture For Cloud Native Applications In One Day
VMware Tanzu
 
PDF
Data Stores @ Netflix
Vinay Kumar Chella
 
PPTX
Flexible compute
Peter Clapham
 
PDF
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Jonas Bonér
 
PPTX
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Joe Stein
 
PDF
Webinar: Diagnosing Apache Cassandra Problems in Production
DataStax Academy
 
PDF
The Last Pickle: Distributed Tracing from Application to Database
DataStax Academy
 
PPTX
RedisConf18 - Redis Enterprise on Cloud Native Platforms
Redis Labs
 
PDF
Sanger OpenStack presentation March 2017
Dave Holland
 
PDF
Keep your Hadoop cluster at its best!
Sheetal Dolas
 
PPTX
Event Detection Pipelines with Apache Kafka
DataWorks Summit
 
PPTX
Real Time Data Processing Using Spark Streaming
Hari Shreedharan
 
PPTX
Experience with Kafka & Storm
Otto Mok
 
PDF
Introduction to Apache ZooKeeper
knowbigdata
 
PDF
Cassandra serving netflix @ scale
Vinay Kumar Chella
 
PDF
Monitoring MySQL at scale
Ovais Tariq
 
PDF
How Spotify scales Apache Storm Pipelines
Kinshuk Mishra
 
PPTX
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hair
John Constable
 
Tsinghua University: Two Exemplary Applications in China
DataStax Academy
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
Cisco DevNet
 
Building A Diverse Geo-Architecture For Cloud Native Applications In One Day
VMware Tanzu
 
Data Stores @ Netflix
Vinay Kumar Chella
 
Flexible compute
Peter Clapham
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Jonas Bonér
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Joe Stein
 
Webinar: Diagnosing Apache Cassandra Problems in Production
DataStax Academy
 
The Last Pickle: Distributed Tracing from Application to Database
DataStax Academy
 
RedisConf18 - Redis Enterprise on Cloud Native Platforms
Redis Labs
 
Sanger OpenStack presentation March 2017
Dave Holland
 
Keep your Hadoop cluster at its best!
Sheetal Dolas
 
Event Detection Pipelines with Apache Kafka
DataWorks Summit
 
Real Time Data Processing Using Spark Streaming
Hari Shreedharan
 
Experience with Kafka & Storm
Otto Mok
 
Introduction to Apache ZooKeeper
knowbigdata
 
Cassandra serving netflix @ scale
Vinay Kumar Chella
 
Monitoring MySQL at scale
Ovais Tariq
 
How Spotify scales Apache Storm Pipelines
Kinshuk Mishra
 
RENCI User Group Meeting 2017 - I Upgraded iRODS and I still have all my hair
John Constable
 
Ad

Similar to Grid middleware is easy to install, configure, secure, debug and manage across multiple sites ("One can't believe impossible things") (20)

PPTX
Grid computing
Neha Bhambu
 
PDF
Hungarian ClusterGrid and its applications
Ferenc Szalai
 
PDF
Globus Toolkit 3 Core – A Grid Service Container Framework: Thomas Sandholm J...
Information Security Awareness Group
 
PPT
Real Time, Web 2.0, and Grid Systems
Geoffrey Fox
 
PPTX
S16_Notes_CC.pptx
ganeshkarthy
 
PPT
Grid Technologies in Disaster Management
Videoguy
 
PPT
Ogsa
saranya devi
 
PDF
8. globus tool kit 3
Dr Sandeep Kumar Poonia
 
PDF
dc09ttp-2011-thesis
Theofilos Papapanagiotou
 
PPTX
Cs6703 grid and cloud computing unit 4
RMK ENGINEERING COLLEGE, CHENNAI
 
PDF
Grid.pdf
ANIKETKUMARSHARMA3
 
PPT
All about GridComputing-an introduction (2).ppt
lagoki2767
 
PPT
Clusters (Distributed computing)
Sri Prasanna
 
PPTX
Grid Computing
abhiritva
 
PPTX
Grid computing
Ramraj Choudhary
 
PPT
GridComputing-an introduction.ppt
NileshkuGiri
 
PPTX
General Introduction to technologies that will be seen in the school
ISSGC Summer School
 
PPT
Computing Outside The Box
Ian Foster
 
PDF
Globus Toolkit 4 Programming Java Services 1st Edition Borja Sotomayor
nyyrxoes9716
 
PDF
7. the grid ogsa
Dr Sandeep Kumar Poonia
 
Grid computing
Neha Bhambu
 
Hungarian ClusterGrid and its applications
Ferenc Szalai
 
Globus Toolkit 3 Core – A Grid Service Container Framework: Thomas Sandholm J...
Information Security Awareness Group
 
Real Time, Web 2.0, and Grid Systems
Geoffrey Fox
 
S16_Notes_CC.pptx
ganeshkarthy
 
Grid Technologies in Disaster Management
Videoguy
 
8. globus tool kit 3
Dr Sandeep Kumar Poonia
 
dc09ttp-2011-thesis
Theofilos Papapanagiotou
 
Cs6703 grid and cloud computing unit 4
RMK ENGINEERING COLLEGE, CHENNAI
 
All about GridComputing-an introduction (2).ppt
lagoki2767
 
Clusters (Distributed computing)
Sri Prasanna
 
Grid Computing
abhiritva
 
Grid computing
Ramraj Choudhary
 
GridComputing-an introduction.ppt
NileshkuGiri
 
General Introduction to technologies that will be seen in the school
ISSGC Summer School
 
Computing Outside The Box
Ian Foster
 
Globus Toolkit 4 Programming Java Services 1st Edition Borja Sotomayor
nyyrxoes9716
 
7. the grid ogsa
Dr Sandeep Kumar Poonia
 
Ad

More from Paul Brebner (20)

PPTX
Streaming More For Less With Apache Kafka Tiered Storage
Paul Brebner
 
PDF
30 Of My Favourite Open Source Technologies In 30 Minutes
Paul Brebner
 
PDF
Superpower Your Apache Kafka Applications Development with Complementary Open...
Paul Brebner
 
PDF
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Paul Brebner
 
PDF
Architecting Applications With Multiple Open Source Big Data Technologies
Paul Brebner
 
PDF
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
Paul Brebner
 
PDF
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Paul Brebner
 
PDF
Spinning your Drones with Cadence Workflows and Apache Kafka
Paul Brebner
 
PDF
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Paul Brebner
 
PDF
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
PDF
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
PDF
A Visual Introduction to Apache Kafka
Paul Brebner
 
PDF
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Paul Brebner
 
PDF
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Paul Brebner
 
PPTX
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
PPTX
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
PPTX
0b101000 years of computing: a personal timeline - decade "0", the 1980's
Paul Brebner
 
PDF
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
Paul Brebner
 
PPTX
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
Paul Brebner
 
PDF
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
Paul Brebner
 
Streaming More For Less With Apache Kafka Tiered Storage
Paul Brebner
 
30 Of My Favourite Open Source Technologies In 30 Minutes
Paul Brebner
 
Superpower Your Apache Kafka Applications Development with Complementary Open...
Paul Brebner
 
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Paul Brebner
 
Architecting Applications With Multiple Open Source Big Data Technologies
Paul Brebner
 
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
Paul Brebner
 
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Paul Brebner
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Paul Brebner
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Paul Brebner
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
A Visual Introduction to Apache Kafka
Paul Brebner
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Paul Brebner
 
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
Paul Brebner
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
Paul Brebner
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
Paul Brebner
 
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
Paul Brebner
 

Recently uploaded (20)

PDF
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
PDF
The Future of Artificial Intelligence (AI)
Mukul
 
PDF
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
PPTX
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
PDF
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
PDF
Doc9.....................................
SofiaCollazos
 
PPTX
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
PDF
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
PDF
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
PPTX
The-Ethical-Hackers-Imperative-Safeguarding-the-Digital-Frontier.pptx
sujalchauhan1305
 
PDF
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
PDF
How-Cloud-Computing-Impacts-Businesses-in-2025-and-Beyond.pdf
Artjoker Software Development Company
 
PDF
CIFDAQ's Market Wrap : Bears Back in Control?
CIFDAQ
 
PDF
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
PDF
Brief History of Internet - Early Days of Internet
sutharharshit158
 
PDF
Make GenAI investments go further with the Dell AI Factory
Principled Technologies
 
PDF
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
PDF
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
PDF
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
The Future of Artificial Intelligence (AI)
Mukul
 
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
Doc9.....................................
SofiaCollazos
 
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
The-Ethical-Hackers-Imperative-Safeguarding-the-Digital-Frontier.pptx
sujalchauhan1305
 
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
How-Cloud-Computing-Impacts-Businesses-in-2025-and-Beyond.pdf
Artjoker Software Development Company
 
CIFDAQ's Market Wrap : Bears Back in Control?
CIFDAQ
 
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
Brief History of Internet - Early Days of Internet
sutharharshit158
 
Make GenAI investments go further with the Dell AI Factory
Principled Technologies
 
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 

Grid middleware is easy to install, configure, secure, debug and manage across multiple sites ("One can't believe impossible things")

  • 1. "One can't believe impossible things" UK OGSA Evaluation Project (UCL, Imperial, Newcastle, Edinburgh) (Full list of project members) Paul Brebner University College London [email protected] "Grid middleware is easy to install, configure, secure, debug and manage - across multiple sites"
  • 2. Grid Complexity – The Grid will be BIG
  • 4. Grid Complexity – built on the internet
  • 5. Grid Complexity – but more complex
  • 6. Grid Simplicity – Start with something simple • OGSA – OGSI • GT3.2 – exemplar of a Grid SOA • Initially evaluate installation, configuration, and security • Then performance and scalability, deployment, architectural choices, etc.
  • 7. Grid Realism – But realistic test-bed • Heterogeneous platforms – Linux, Solaris, Windows • Cross-organisational – Four nodes – Independently administered – Firewalls and access restrictions • Security – UK e-Science CA
  • 8. Grid Confusion – What is Globus? • How is Globus intended to be used? – 1: Science as first-order services: Middleware for building and hosting Grid Applications, by exposing science code as Grid services. – 2: Middleware as services: As a set of high level Grid services, composed to provide new Grid functionality. Science isn’t first-order service, but managed by Grid services.
  • 9. Grid Confusion – Science services or Grid services Client E=mc2 1
  • 10. Grid Confusion – Science services or Grid services Client E=mc2 1 D=A+2B+C2
  • 11. Grid Confusion – Science services or Grid services Client 2 D=A+2B+C2 E = mc2 E=mc2 1 D=A+2B+C2
  • 12. Grid Confusion – How to evaluate • Do we evaluate GT3 as middleware for hosting Grid services, or as a toolkit for constructing Grid middleware? • If the first, only need GT3 Core – just the container. If the second, need “All Services” (and more – there’s no scheduler).
  • 13. Grid Simplicity – Incremental • Start with Core Package • Add Security • Then try “All Services” • Simple enough – in theory
  • 14. Grid Steps – single node Install OS/HW GT3 Install
  • 15. Grid Steps – single node Install Configure OS/HW GT3 Install
  • 16. Grid Steps – single node Install Configure Deploy OS/HW GT3 Install
  • 17. Grid Steps – single node Install Configure Deploy Run OS/HW GT3 Install
  • 18. Grid Steps – Multiple sites GT3
  • 19. Grid Steps – Multiple sites GT3 GT3 GT3 GT3
  • 20. Grid Steps – Multiple sites GT3 GT3 GT3 GT3 Interoperate
  • 21. Grid Steps – Multiple sites GT3 GT3 GT3 GT3 Interoperate GT3 GT3 Secure
  • 22. Grid Steps – Multiple sites GT3 GT3 GT3 GT3 Interoperate GT3 GT3 Secure Manage
  • 23. Grid Reality – What we found • Port number management • Host access • Remote visibility of installation, container, services • Installation by System Administrators • Tomcat or Test container • Compilation issues on Solaris • Exponential increase in testing complexity as number of nodes increases.
  • 24. Grid Reality – What we found • Port number management – Post number conflicts (with other services) – What port is the container running on?
  • 25. Grid Reality – What we found • Host access – Is the container visible on that port externally? – From which machines? – For which users? – Non-trivial to test/debug if/when something goes wrong
  • 26. Grid Reality – What we found • Remote visibility of installation, container, services – What infrastructure is installed? – What packages and versions? – How is it configured? – What state is it in?
  • 27. Grid Reality – What we found • Installation by System Administrators – Division of roles – Didn’t meet expectations – Extra effort to support multiple roles • System Administrators – install, configure and secure • Globus Administrators – test, maintain • Globus Developers – develop, deploy, test/use Grid services
  • 28. Grid Reality – What we found • Tomcat or Test container – Differences in deployment, configuration, and management – With Tomcat, increased potential for centralised management, and sand-boxing of run-time environment
  • 29. Grid Reality – What we found • Compilation issues on Solaris – Took longer than expected – Only Linux testing and support can be taken for granted
  • 30. Grid Reality – What we found • Exponential increase in testing complexity as number of nodes increases – Testing (and maintaining) interoperability between m client machines, and n servers gets complicated. – How well will this scale for 100s, 1000s of nodes?
  • 31. Grid Reality – Security • In theory just had to – obtain (and update) host, client, and CA certificates – convert – install – configure – generate (and update) proxies. • However, parts of “All Services” package also needed.
  • 32. Grid Security - What we found • Interactions between security for multiple installations • Essential to test non-secure interoperability first • Windows client-side security • Testing and viewing security configuration • Debugging secure calls • Client side security is programmatic • Security management scalability – Construction and maintenance of user accounts and grid-map file entries.
  • 33. Grid Security - What we found • Interactions between security for multiple installations – For testing may want • multiple versions, or duplicates (with different configurations) of same versions. • One container with no security, and another container with security – May want test/production environments
  • 34. Grid Security - What we found • Essential to test non-secure interoperability first – Trying to test interoperability and security simultaneously wasn’t fun
  • 35. Grid Security - What we found • Windows client-side security – Still haven’t got it working – Not obvious exactly which parts of Globus are needed for client-side code with security (no “client plus security” package).
  • 36. Grid Security - What we found • Testing and viewing security configuration – Need to be able to view/edit and check security configuration for containers and services – Confusion about hierarchical security settings • Virtual Organisations, clusters, servers, containers, factories, services, methods, and instances. – Remotely – Validate security deployment before run-time
  • 37. Grid Security - What we found • Debugging secure calls (or any stateful service) – Proxy interceptor approach (e.g. TCPMON) won’t work with stateful services • As grid handle returned to client contains the port number of the instance, not the proxy – But proxies are an important design pattern for SOAs… – GT4/WS-RF may be different • Handle resolvers, WS-Addressing and WS-RenewableReferences
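The interceptor problem can be illustrated by comparing ports. In this sketch every URL, port number, and service name is hypothetical: the client reaches the factory through an interceptor on port 8081, but the handle it gets back names the container's own port, so later calls on the instance never pass through the interceptor.

```python
from urllib.parse import urlsplit


def bypasses_proxy(handle: str, proxy_port: int) -> bool:
    """True if a service handle points at a port other than the interceptor's."""
    return urlsplit(handle).port != proxy_port


# Hypothetical values: container on 8080, interceptor listening on 8081.
proxy_port = 8081
factory_url = "http://localhost:8081/ogsa/services/ExampleFactory"
instance_handle = "http://localhost:8080/ogsa/services/ExampleFactory/instance-1"

print(bypasses_proxy(factory_url, proxy_port))      # False: call goes via the interceptor
print(bypasses_proxy(instance_handle, proxy_port))  # True: call goes straight to the container
```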
  • 38. Grid Security - What we found • Client side security is programmatic – Client side code modifications required to call services/methods with required protocols – Should be declarative – Sensitive to server side security credentials
  • 39. Grid Security - What we found • Security management scalability – Construction and maintenance of user accounts and grid-map file entries. – For each server, each user needs an account, and an entry in the container gridmap file (mapping client certificate to account) – May also need service specific gridmap files – Not scalable for large numbers of users, servers, services. • Alternatives? – Tool support – Role based authentication – Shared accounts or certificates
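A rough sketch of what has to be maintained: each grid-map file line pairs a quoted certificate subject with a local account, and one such line is needed per user per server. The sample subjects and account names below are invented for illustration, and the parser handles only the simple two-field form.

```python
import shlex


def parse_gridmap(text: str) -> dict:
    """Map each certificate subject (DN) to its list of local account names."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = shlex.split(line)  # honours the quotes around the DN
        if len(tokens) < 2:
            continue  # malformed line: no account given
        entries[tokens[0]] = tokens[1].split(",")
    return entries


def entries_to_maintain(users: int, servers: int) -> int:
    """Container grid-map entries needed when every user can use every server."""
    return users * servers


# Invented sample entries, not real certificate subjects.
sample = '''
# subject                                   local account
"/C=UK/O=eScience/OU=Example/CN=jane doe" jdoe
"/C=UK/O=eScience/OU=Example/CN=john roe" jroe
'''
print(parse_gridmap(sample))
print(entries_to_maintain(users=500, servers=20))  # 10000
```

Service-specific gridmap files would add a further factor for each secured service, which is why tool support or role-based approaches are raised as alternatives.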
  • 40. Grid Recommendations • If Globus is middleware, then need: – Platform independent, automatic, installation. – Tool support for configuration and deployment creation, validation, viewing and editing. – Management console for grid, nodes, globus packages, containers and services. – Support for remote, location independent, cross-organisational, multiple role scenarios.
  • 41. Grid Recommendations (continued) • If Globus is middleware, then need: – Remote deployment and management of services. – Remote distributed debugging of grid installations, services, and applications. – Tool support, and more scalable processes for security.
  • 42. Grid Alternatives • Next we plan to evaluate the two architectural choices in more detail – Science exposed as services, vs science code managed by higher level grid services. • Explore alternative mechanisms for: – Load balancing and resource management – Directory services (service and resource discovery) – Data movement approaches (e.g. SOAP Attachments vs GridFTP)
  • 43. Grid Performance • First approach (initial results) – Scientific benchmark (SciMark2.0) modified to measure throughput, and invoked as a Stateful Grid Service – Metric is Calls Per Minute (CPM) – one unit of work. – No data movement, just computation and memory load. – JVM: 512MB Heap and -server (of course :-) ) • Good performance and scalability – Security has minimal overhead – Problem with client side timeouts as response times increase
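The CPM metric can be sketched as a simple timing loop. This is a single-threaded illustration only: `unit_of_work` is a placeholder computation, not SciMark2.0, and the real measurements invoked a remote Grid service from multiple client threads.

```python
import time


def calls_per_minute(call, duration_s: float = 1.0) -> float:
    """Invoke call() repeatedly for about duration_s seconds; return calls per minute."""
    start = time.perf_counter()
    completed = 0
    while time.perf_counter() - start < duration_s:
        call()
        completed += 1
    elapsed = time.perf_counter() - start
    # Scale the observed rate to one minute.
    return completed * 60.0 / elapsed


def unit_of_work():
    """Placeholder computation standing in for one benchmark invocation."""
    return sum(i * i for i in range(10_000))


print(f"{calls_per_minute(unit_of_work, duration_s=0.5):.0f} CPM")
```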
  • 44. Grid Performance • [Chart: ART (s) against number of threads; y-axis Time (s), x-axis Threads] – Series: UCL (4 cpu Sun), Newcastle (2 cpu Intel), Imperial (2 cpu Intel), Edinburgh (4 hyperthread cpu Intel) – All Tomcat – Fastest: 3.6s (Edinburgh) – Slowest: 25s (UCL)
  • 45. Grid Performance • [Chart: Throughput (CPM) against number of threads; y-axis CPM, x-axis Threads] – Series: UCL (4 cpu Sun), Newcastle (2 cpu Intel), Imperial (2 cpu Intel), Edinburgh (4 hyperthread cpu Intel), All (12 cpus), Theoretical Maximum – 95% of predicted maximum throughput
  • 46. Grid Performance • Tomcat vs Test container – No difference on 3 out of 4 nodes – But 67% faster on one node (Newcastle, slowest Intel box) • Attachments will work with GT3 and Tomcat – But not with security – Limit of 1GB (DIME) – Bug in Axis – doesn’t clean up temporary files.
  • 47. Grid Performance • Stateful instances can be problematic – Intermittent unreliability • On some runs, 1 exception in 300 calls (reliability of .9967) – But non-repeatable, SOAP/network related? • What is the safe response to exceptions? Can’t just retry. – Possible to kill container (relies on clients being well behaved): • By invoking same instance/method more than once. • By consuming container resources – But instances can be passivated/activated in theory – Could be used to enable fine-grain (per instance) control over resource usage.
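The reliability figure above is just the fraction of calls that completed. The sketch below reproduces it and, under the strong assumption that failures are independent (which the non-repeatable behaviour described here does not guarantee), estimates how likely a whole run is to finish without any exception.

```python
def reliability(failures: int, calls: int) -> float:
    """Fraction of calls that completed without an exception."""
    return 1.0 - failures / calls


def p_run_without_failure(per_call_reliability: float, calls: int) -> float:
    """Chance that every call in a run succeeds, ASSUMING independent failures."""
    return per_call_reliability ** calls


r = reliability(failures=1, calls=300)
print(round(r, 4))  # 0.9967
print(p_run_without_failure(r, 300))
```

Even a per-call reliability this high leaves a sizeable chance of at least one exception in a long run, which is why the question of a safe response to exceptions matters.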
  • 48. Grid Deployment • How to install and configure Grid infrastructure and services - scalably and securely? • Install GT3 infrastructure and security manually – MMJFS allows executable code to be staged automatically (but not services - could provide a deployment service). • Install bootstrapping code, and then install and deploy all other code and security automatically. – Using SmartFrog (HP) in the lab, and then test-bed. – Configuring GT3 security remotely is an open issue, as is “trust” with System Administrators.
  • 49. Grid Dreams - Debugging • Debugging distributed systems is tricky – Need better support for cross-cutting non-functional concerns such as deployment and debugging. – (One) problem with debugging services is not knowing the context of errors (to aid diagnosis or cure) – a service is just an interface. • Deployment aware debugging: – Starting from functional work-flows, generate deployment-flows, which are executed prior to, or concurrent with, functional work-flows. – If failure in functional work-flow, then corresponding deployment-flow is examined to determine likely causes, and parts are re-executed.
  • 50. Grid Dreams - Debugging • Backtrack through deployment steps (Like peeling an onion) – Some steps will need to be reversed – Track dependencies, and redundant operations. • This approach may fix an (interesting) sub-class of problems: • Those which can be fixed by simply redoing (or replicating) (part of) the installation, E.g. – Intermittent failure of container or services – Resource starvation or overload • Security problems that can be fixed with reconfiguration or refresh of certificates/proxies. – But not: • network, or all configuration and security/access problems.
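The backtracking idea can be sketched as follows. This is one possible reading of the approach, not the project's implementation: the step names, the `state` dictionary, and the `works` check are all hypothetical, and reversing steps or tracking dependencies is left out. On a functional failure, progressively longer tails of the deployment flow are re-executed until the functional check passes.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def repair_by_backtracking(steps: List[Step], works: Callable[[], bool]) -> List[str]:
    """Re-execute ever-longer tails of the deployment flow until works() passes.

    Returns the names of the steps re-executed in the successful pass, or raises
    RuntimeError if redoing the whole flow does not help.
    """
    if works():
        return []
    for start in range(len(steps) - 1, -1, -1):
        redo = steps[start:]
        for _, action in redo:
            action()
        if works():
            return [name for name, _ in redo]
    raise RuntimeError("not fixable by redeployment (e.g. network or access problem)")


# Hypothetical deployment state: the service has gone missing, the proxy is fine.
state = {"container": True, "service_deployed": False, "proxy_valid": True}
steps = [
    ("install_container", lambda: state.update(container=True)),
    ("deploy_service", lambda: state.update(service_deployed=True)),
    ("refresh_proxy", lambda: state.update(proxy_valid=True)),
]
redone = repair_by_backtracking(steps, works=lambda: all(state.values()))
print(redone)  # ['deploy_service', 'refresh_proxy']
```

Problems that no amount of redeployment can fix, such as the network and access problems listed above, surface here as the `RuntimeError`.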
  • 51. UK OGSA Evaluation Project • Thank you :-) – Questions/Comments? • Email: [email protected] – After November: [email protected] • Not (quite) the End…
  • 57. Postscript – The Secret Life of Grid? UK OGSA Evaluation Project Report 1.0 Evaluation of Globus Toolkit 3.2 (GT3.2) Installation http://sse.cs.ucl.ac.uk/UK-OGSA/Report1.doc
  • 58. Postscript – The Secret Life of Grid? Our experiences evaluating Grid technology remind me of an Australian book (“The Secret Life of Wombats”) about a schoolboy who used to sneak out of his dormitory after everyone was asleep to go “wombatting”. He spent his nights secretly crawling down wombat burrows with a flashlight – a potentially lethal activity (not just from cave-ins, as wombats are ferocious when cornered!) – and wrote copious notes, resulting in a substantial increase in knowledge of these “mysterious and often misunderstood creatures”. UK OGSA Evaluation Project Report 1.0 Evaluation of Globus Toolkit 3.2 (GT3.2) Installation http://sse.cs.ucl.ac.uk/UK-OGSA/Report1.doc