dans.knaw.nl
DANS is an institute of KNAW and NWO
The world of Docker and Kubernetes
How to create, set up and manage
a Kubernetes cluster at DANS: the Dataverse pilot
Slava Tykhonov, Senior Information Scientist
Wilko Steinhoff, Senior Software Developer
(DANS-KNAW, The Hague, Netherlands)
11.02.2020
Why do we need Cloud Computing?
“Cloud computing is a style of computing in which scalable and
elastic IT is delivered as a service using Internet technologies.”
“Cloud Computing is transforming the way organisations
consume computer services.”
“We can run all our application workloads and processes
remotely over the internet instead of on our own physical
hardware and software.”
“It’s less expensive and more secure.”
Dataverse is our Pilot Cloud Service
Dataverse as a FOSS product: good news
• Dataverse is Open Source software
• Great community with more than 100 contributors
• Contributions are coming from all continents
• Maintenance costs go down because all community members
use the same software and help each other
• Governance models can be reused by different countries
• Innovation in the Dataverse community moves very fast
Dataverse as a FOSS product: bad news
• Open Source doesn’t mean Free!
• Consider all required resources: both hardware and human
• Building a service is difficult, maintenance is expensive
• Integration with other services requires change management
and is sometimes not even possible
• technical development is fast, so expertise quickly becomes outdated
• requires continuous training and very good communication
between all partners
Dataverse Installation Guide
Instructions:
http://guides.dataverse.org/en/latest/installation/
Before you start: installation requires preparation!
Installation problems
The basic Dataverse infrastructure seems very simple:
- application (Java, deployed on the Glassfish web server)
- database (PostgreSQL)
- search engine (Solr)
If you follow the guide and do the installation manually…
there is a great chance that it will not work.
Why?!
You never know where the problem lies...
● OS-specific issues
● application-specific bugs
● differences between database
version(s)
● search engine update(s)
● security patches
● hardware issues
● open/closed ports on your server
It’s even more complicated if you need
to patch the software and update a
working infrastructure every time…
locally, on test/acceptance/production.
Typical infrastructure issues
And after it finally works, the security
guy tells you that all microservice
ports on all servers should be closed…
or there is an update of some software
component that can break the service,
or a brand new Chinese bot is taking
your service down,
or something else is happening...
Do you remember? You have to reproduce and fix it
locally, on test/acceptance/production.
Software Testing Process
Maintenance vs development
Typical outcome: hundreds or thousands of hours are lost, $$$,
and maintenance effort dominates over development.
Btw, the picture is clickable….
Quiet software development
That’s how unmaintainable projects typically die… R.I.P.
FAIRness of Software
Open Source vs Closed Source
Dark side of the Moon
Source: V. Tykhonov, API economy: transformation from closed to open innovation
Open Source paradigm for Sharing economy
Dataverse Unleashed
Dataverse isn’t competing against Figshare, Zenodo,
DSpace, CKAN, EASY or others…
Dataverse is a platform to build new innovative things
together, and to integrate all the other services.
Using Dataverse means you can join the sharing
economy in data and speed up your own innovation based
on community developments.
Shared economy in the data landscape
● all partners are running the same basic data infrastructure
● source code is Open Source and shared
● community is making decisions about priorities
● new custom requirements can be implemented
independently by anyone and merged with master
(upstream)
● sustainability of software: unmaintained components are
usually replaced with well-maintained ones as the product
evolves
● two or more technical solutions to the same problem are
more than welcome
● the maturity of the community means the maturity of the software
Do you want to join? Use Docker for your software!
Sometimes innovation means less communication
“Docker offered a way to create independence between the
application and the infrastructure through a standardized
container format that could be created with easy-to-use
tooling.”
David Messina, CMO at Docker
And now honestly ask yourself: how much time are you spending talking to
sysadmins to convince them to enable or install the tools you need?
To another developer working on the same code?
To reproduce the same bug on test/acceptance/production?
Docker features
• Extremely powerful configuration tool
• Lets you install software on any platform (Linux, Mac,
Windows)
• Any software can be installed from Docker as a standalone
container or as containers delivering microservices (database,
search engine, core service)
• Docker lets you host any number of instances of the same
software on different ports (see the sketch below)
• Docker can be used to organise multilingual interfaces, for
example
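A minimal sketch of the “same software on different ports” point, reusing the httpd image from Docker Hub; the container names and host ports here are arbitrary illustrations:
# run two independent copies of the same image on different host ports
$ docker run -d --name httpd-one -p 8081:80 httpd
$ docker run -d --name httpd-two -p 8082:80 httpd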
Docker advantages
• Faster development and deployments
• Isolation of running containers makes it possible to scale up apps
• Portability saves time: the same image runs on a local
computer or in the cloud
• Snapshotting allows you to archive the state of Docker images
• Resource limits can be adjusted (see the sketch below)
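For example, resource limits can be set per container at run time; the values below are illustrative only:
# cap the container at one CPU and 512 MB of memory
$ docker run -d --cpus=1 --memory=512m --name solr-limited solr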
Dataverse Docker module
This module was developed in the one-year CESSDA DataverseEU
project and is aimed at CESSDA Service Providers with
limited technical resources. DANS led this project.
The goal was to deploy the Dataverse software on the CESSDA
Technical Infrastructure (Google Cloud). The project was funded
from the CESSDA 2018 work plan.
DataverseEU partners: ADP (Slovenia), AUSSDA (Austria),
GESIS (Germany), SND (Sweden), TARKI (Hungary),
SiencePro (France), UKDA (UK), UniData (Italy), SODA
(Belgium), LSZDA (Latvia), DANS (Netherlands)
Docker deployment with k8s in Clouds
• Google Cloud (policy for CESSDA SaW)
• Microsoft Azure
• Amazon Cloud
• OpenShift Cloud
• local Docker installation (minikube)
Example: Dataverse as a set of Docker microservices
Docker Desktop (Community Edition)
Ideal for developers and small teams looking to get started
with Docker https://www.docker.com/community-edition
Features:
- docker-for-desktop
- docker-compose support
- integrated kubernetes (minikube)
- kitematic: Visual Docker Container Management
Docker Hub
Docker Hub is a registry containing images
Example: https://hub.docker.com/_/httpd/
$ docker pull httpd
Push images to Docker Hub:
https://docs.docker.com/docker-cloud/builds/push-images/
$ docker login
$ docker tag my_image $DOCKER_ID_USER/my_image
$ docker push $DOCKER_ID_USER/my_image
Docker concepts
• Containers are runnable artefacts
• Images are snapshots of containers together with their filesystems
• Containers can be archived as images and executed in
different clouds
• Images can be preserved in repositories, e.g.
https://act.dataverse.nl/dataset.xhtml?persistentId=hdl:10695/9VCRBR
• Data folders can be hosted outside of containers on
persistent volumes (see the sketch below).
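A minimal sketch of the persistent-volume idea: data written to a named volume survives the container; the volume name and password below are illustrative:
# create a named volume and mount it at the PostgreSQL data directory
$ docker volume create database-data
$ docker run -d --name postgres -e POSTGRES_PASSWORD=secret \
    -v database-data:/var/lib/postgresql/data postgres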
Hello world app (Flask application)
Dockerfile: https://github.com/DANS-KNAW/parthenos-widget/blob/master/Dockerfile
FROM python:2.7
MAINTAINER Vyacheslav Tykhonov
COPY . /widget
WORKDIR /widget
RUN pip install -r requirements.txt
ENTRYPOINT ["python"]
CMD ["app.py"]
Docker command line usage
The command line lets you manage containers and images and
execute Docker commands
$ docker help run
$ docker ps
$ docker login
$ docker pull, push, commit
$ docker build, run
$ docker exec
$ docker stop, rm, rmi
Typical Docker pipeline
Install all dependencies and build the tool from scratch:
$ docker build -t parthenos:latest .
Run the image from the command line
$ docker run -p 8081:8081 --name parthenos parthenos
Check if the container is running
$ docker ps | grep parthenos
Log in to the container
$ docker exec -it [CONTAINER_ID] /bin/bash
Copy configuration into the container
$ docker cp ./parthenos.config [CONTAINER_ID]:/widget
Copy from the container to a local folder
$ docker cp [CONTAINER_ID]:/widget/. ./
Ship the “dockerized” app to the world (Docker Hub or another registry)
$ docker push [IMAGE_NAME]
Pipeline explanation
Credits: Arun Gupta, Package your Java EE Application using Docker and Kubernetes
Docker archiving process
An easy process to archive running software, metadata and data
separately:
https://docs.docker.com/engine/reference/commandline/save/
• PostgreSQL database with metadata and user information
• dataset files in a separate folder
• software image with some individual settings
$ docker save -o archive.tar [IMAGE_ID]
Easy to restore the complete system with data and metadata via
Docker Compose.
$ docker load -i archive.tar
Docker Compose
A management tool for Docker configuration of multi-container solutions.
All connections, networks, containers and port specifications are stored in
one file (YAML specification).
Example (DataverseEU):
http://github.com/IQSS/dataverse-docker
Kompose is a tool that turns a Docker Compose file into Kubernetes config:
https://github.com/kubernetes/kompose
Usage:
$ docker-compose [something]
Docker Compose is a perfect tool to keep the PROVenance of software
(version control, etc.)
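A minimal, illustrative sketch of what a Compose file for the Dataverse stack could look like; the real file in the IQSS/dataverse-docker repository is more elaborate, and the image names, versions and ports here are assumptions:
# docker-compose.yml (illustrative sketch only)
version: "3"
services:
  postgres:
    image: postgres:9.6
    environment:
      POSTGRES_USER: dvnapp
      POSTGRES_PASSWORD: secret
    volumes:
      - database-data:/var/lib/postgresql/data   # keep metadata outside the container
  solr:
    image: solr:7.3.1
  dataverse:
    image: your_docker_name/dataverse:4.18.1     # hypothetical image name
    ports:
      - "8085:8080"
    depends_on:
      - postgres
      - solr
volumes:
  database-data: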
Dataverse Docker containers exploration
# Show Docker images
docker images
# Show all running containers
docker ps
# Remove a Docker image by image_id (don’t execute)
docker rmi image_id
# Delete old images (don’t execute)
docker rmi `docker images -aq`
# To access Dataverse container, type exit to quit
docker exec -it dataverse /bin/bash
# PostgreSQL container, exit to quit
docker exec -it postgres /bin/bash
# Solr container, exit to quit
docker exec -it solr /bin/bash
# Copy files and folders to the running container
docker cp ./testfile dataverse:/tmp/
# Copy files and folders from the running container to your disk space
docker cp dataverse:/opt/dv/dvinstall.zip /tmp/
# Stop Dataverse container
docker stop dataverse
# Run Dataverse container
docker start dataverse
Dataverse maintenance with Docker
# Open the page with the latest Dataverse release: https://github.com/IQSS/dataverse/releases
# Follow the upgrade instructions; a release contains a war and a zip, and optionally .tsv or .xml schema files
docker exec -it dataverse /bin/bash
wget https://github.com/IQSS/dataverse/releases/download/v4.18.1/dataverse-4.18.1.war -O dataverse.war
asadmin undeploy dataverse
rm -rf glassfish4/glassfish/domains/domain1/generated
asadmin deploy ./dataverse.war
asadmin restart-domain
# After Glassfish restarts, go to 0.0.0.0:8085 and check the Dataverse version
# Remember: you’ll lose all changes in your Docker container after restart!
Maintenance of Docker infrastructure
# Go to hub.docker.com and create an account.
# Login with your credentials, remember your_docker_name
docker login
# Let’s create an image out of the running Dataverse container
docker commit dataverse
# New image will be available on top
docker images
# Let’s tag the image and update the internal Docker registry; replace your_docker_name
docker tag new_dataverse_image_id [your_docker_name]/dataverse:4.18.1
# Push new image to Docker Hub
docker push [your_docker_name]/dataverse:4.18.1
# Go to Docker Hub to check if the repo was updated:
https://hub.docker.com/r/[your_docker_name]/dataverse
# Visit https://docs.docker.com/docker-hub/repos/#pushing-a-docker-container-image-to-docker-hub
# if you need more information about updating Docker images
How to set up, configure and manage Kubernetes clusters managed by
DANS, with emphasis on architecture, ICT support and DevOps.
POC: Azure management
Azure
Best practices in using and managing the DANS Azure subscription.
Azure: the cloud computing platform by Microsoft.
Azure@DANS is provided by SURFcumulus.
Cloud resources, like:
- Virtual Machine (VM)
- Storage (disk)
- SQL database
- Kubernetes (AKS)
Kubernetes
Open-source container-orchestration system for
automating application deployment, scaling, and
management.
- Docker container orchestration.
- Infrastructure as Code.
- Use of health checks to restart applications (see the sketch below).
- (Auto)scaling the cluster (horizontally and vertically).
- Controlled use of resources (CPU, memory).
- Setting up the application stack for local development.
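A minimal, illustrative sketch of how health checks and resource control look in a Kubernetes Deployment; the image, probe path and numbers are assumptions, not the actual DANS configuration:
# dataverse-deployment.yaml (illustrative sketch only)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dataverse
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dataverse
  template:
    metadata:
      labels:
        app: dataverse
    spec:
      containers:
        - name: dataverse
          image: your_docker_name/dataverse:4.18.1   # hypothetical image
          ports:
            - containerPort: 8080
          livenessProbe:                  # restart the container when it stops responding
            httpGet:
              path: /api/info/version
              port: 8080
            initialDelaySeconds: 120
            periodSeconds: 30
          resources:                      # controlled use of CPU and memory
            requests:
              cpu: "500m"
              memory: "2Gi"
            limits:
              cpu: "1"
              memory: "4Gi"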
Best K8S practices
In this project we’ll look into some K8S best
practices for DANS,
based on issues raised in earlier POCs:
-Docker@DANS (2018)
-HUC2 POC (2019)
- Cluster Architecture
Application-wide or organisation-wide?
DTAP: Development, Testing, Acceptance and Production.
- How to separate different applications on a cluster.
- Can we separate responsibilities between ICT-Support and
developers?
ICT support supplies persistent storage classes that can be claimed by
developers.
Use of Role Based Access Control (RBAC).
- Tooling used to develop and deploy to a cluster?
Skaffold (build automation/deployment) and Helm (package manager)
- Use Infrastructure as Code (IaC) to provision and manage
"Azure" cloud infrastructure.
Bash scripts or Terraform (see the sketch after this list).
- How to use "external" resources in a cluster.
SURF-object-storage (SWIFT), VANCIS
- Cluster cost management.
Downscaling a (development) cluster. Resource caps.
- Provide cluster-wide services.
Sending email, auto-SSL certification, monitoring (Prometheus),
pipelining, etc.
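As a minimal illustration of the “bash scripts” option for IaC, an AKS cluster can be provisioned with the Azure CLI; the resource group, cluster name, region and sizes below are placeholder assumptions:
# create-aks.sh (illustrative sketch only)
$ az group create --name dans-k8s-rg --location westeurope
$ az aks create \
    --resource-group dans-k8s-rg \
    --name dans-k8s-dev \
    --node-count 2 \
    --node-vm-size Standard_B2s \
    --generate-ssh-keys
# fetch credentials so kubectl can talk to the new cluster
$ az aks get-credentials --resource-group dans-k8s-rg --name dans-k8s-dev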
Dataverse Cloud architecture
[Architecture diagram: users reach the Dataverse Service through an HTTP(S) load balancer and Ingress; a Kubernetes cluster node on Kubernetes Engine runs the Dataverse, Solr and PostgreSQL deployments, each exposed as a service.]
[Diagram: a Kubernetes cluster on Compute Engine with two cluster nodes (Node1, Node2) serving users via the Dataverse Service; images are pulled from Docker Hub and the container registry.]
How to scale up Kubernetes horizontally
[Diagram: the Dataverse Service spread over cluster nodes Node1 and Node2 on Compute Engine, serving users; images are pulled from Docker Hub and the container registry.]
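A minimal sketch of what scaling out could look like on the command line; the deployment and cluster names are assumptions:
# add worker nodes to the cluster (AKS example)
$ az aks scale --resource-group dans-k8s-rg --name dans-k8s-dev --node-count 3
# run more replicas of the Dataverse pod across the nodes
$ kubectl scale deployment dataverse --replicas=3
# or let Kubernetes autoscale between 1 and 5 replicas based on CPU usage
$ kubectl autoscale deployment dataverse --min=1 --max=5 --cpu-percent=80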
The importance of Persistent Storage
Docker containers write files to disk (I/O) for state or storage,
both in the /data and /docroot folders. If a Docker container is
recreated for some reason, all that data will be lost.
Solution: mount persistent storage into the container, on an external
disk hosted in the cloud (see the sketch below).
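A minimal sketch of claiming persistent storage in Kubernetes and mounting it at /data; the storage class and size are placeholder assumptions:
# dataverse-pvc.yaml (illustrative sketch only)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataverse-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-premium     # hypothetical class supplied by ICT support
  resources:
    requests:
      storage: 50Gi
# in the Dataverse deployment the claim is then mounted (illustrative):
#   volumes:
#     - name: data
#       persistentVolumeClaim:
#         claimName: dataverse-data
#   containers:
#     - ...
#       volumeMounts:
#         - name: data
#           mountPath: /data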
Running Dataverse in production
[Diagram: production setup. Users reach the Dataverse Service through an HTTP(S) load balancer in front of the Kubernetes Engine cluster node, which runs the Dataverse, Solr and PostgreSQL deployments and services, plus a Certbot cronjob/service for SSL certificates and an email relay deployment/service; images come from the container registry.]
Continuous deployment pipeline
[Pipeline diagram: a git push triggers a webhook, Jenkins clones the repository and runs the pipeline (Jenkinsfile): run tests, create the Docker image, push it to the GCP container registry and update the Kubernetes Deployment.]
1. Developer pushes code to Bitbucket
2. Jenkins receives a notification (build trigger)
3. Jenkins clones the workspace
4. Runs tests
5. Creates the Docker image
6. Pushes the Docker image to the GCP
container registry
7. Updates the Kubernetes deployment
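A minimal, illustrative Jenkinsfile sketch of the steps above; the image name, registry and test script are assumptions, not the actual DANS pipeline:
// Jenkinsfile (illustrative sketch only)
pipeline {
  agent any
  stages {
    stage('Test')  { steps { sh './run-tests.sh' } }  // hypothetical test script
    stage('Build') { steps { sh 'docker build -t eu.gcr.io/my-project/dataverse:$BUILD_NUMBER .' } }
    stage('Push')  { steps { sh 'docker push eu.gcr.io/my-project/dataverse:$BUILD_NUMBER' } }
    stage('Deploy') {
      steps {
        // roll the new image out to the running deployment
        sh 'kubectl set image deployment/dataverse dataverse=eu.gcr.io/my-project/dataverse:$BUILD_NUMBER'
      }
    }
  }
}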
Distributed Dataverse infra on Kubernetes
● A network of Dataverses with a central portal to host metadata and
multiple Dataverse nodes
● Testing strategies with Selenium and Cypress
● Unit tests, integration tests and a Jenkins CI/CD pipeline
● Running external applications on the Kubernetes infrastructure,
e.g. the OpenAIRE Amnesia tool
● Support and maintenance of multiple languages, with Weblate as a
service
● Using iRODS to support multiple storage back ends for different datasets
Maintenance of distributed networks
● Maintaining distributed applications is very
difficult and expensive
● it requires the highest level of service maturity
● increasing code coverage does not necessarily lead to
more functionality coverage
● writing integration tests is even more important than adding
more unit tests
● it is almost impossible to run distributed services without
help from the community
Quality Assurance (QA) as a community service
Selenium IDE allows you to create and replay all UI tests in your
browser.
Shared tests can be reused by the Dataverse CI/CD pipeline.
Let’s work together on it!
Example of Selenium .side file
● .side is the file extension for
the new Selenium IDE tests
● JSON format; every section
describes some action
● template rules can be
used by Selenium WebDriver
● can be easily integrated
into the continuous deployment
pipeline with Jenkins jobs
● running SIDE Runner with
the given parameters can
even test different
components (see the sketch below)!
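A minimal sketch of running recorded .side tests from the command line with the selenium-side-runner npm package; the test file name and target URL are assumptions:
$ npm install -g selenium-side-runner chromedriver
# replay the recorded UI tests against a chosen Dataverse instance
$ selenium-side-runner --base-url https://demo.dataverse.org dataverse-ui-tests.side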
Questions?