The ANDC Cluster Story. Animesh Kumar & Ankit Bhattacharjee
Chapter 1. In the beginning there was……
Then it exploded….. The Idea: the cluster project started with a discussion between the Principal of ANDC, Dr Savithri Singh, and the Director of OpenLX, Mr Sudhir Gandotra, during a Linux workshop in 2007. Dr Sanjay Chauhan's recruitment: Dr Savithri Singh inducted Dr Sanjay Chauhan of the Physics department into the cluster project. Clueless students' involvement: Arjun, Animesh, Ankit and Sudhang.
Chapter 2
Initially the project was very challenging; the challenges were of two sorts. Technical: especially the reclamation of the to-be-junked hardware. Human: mostly relating to the players' lack of experience and know-how. This was especially hurtful, since it cost significant man-hours spent on suboptimal and downright incorrect 'solutions' that could have been avoided had the team been slightly more knowledgeable.
Chapter 3 Not everything that can be counted counts, and not everything that counts can be counted.
Junkyard Reclamation…. The project officially started when the team was "presented" with 18-20 decrepit machines, of which barely 5 worked. The junk consisted of a gallery of PIs, PIIs and PIIIs at the end of their lives, most of them not working, requiring us to undertake some: Upgradation: some of those that did work required significant upgrades to be worth deploying in the cluster. Scavenging: over time, a few could be repaired, while the rest were discarded after "scavenging" useful parts from them for future use and for the salvageable machines. Arjun's knowledge of hardware was a great foundation and learning experience.
Experiences don't come cheap….. The first investment: since a fairly "impressive" cluster needed to be at least visibly fast to the lay observer, the machines had to be upgraded in RAM. 25 x 256 MB SDRAM modules were purchased, and multiples of these were put into all the working machines.
Finally, the 6 computers in the best state were chosen, with the following specs: 4 x PII with 512 MB RAM, 2 x PIII with 512 MB RAM. These were connected via a 100 Mbps switch.
Chapter 4 Wisdom Through Failure….
Our first mistake….. ClusterKnoppix is chosen. Based on Dr. Chauhan's thorough research on the topic, we chose ClusterKnoppix, a specialized Linux distribution based on the Knoppix distribution but using the openMosix kernel. openMosix, developed by the Israeli technologist, author, investor and entrepreneur Moshe Bar, was a fork of the once-open, then-proprietary MOSIX cluster system.
Why ClusterKnoppix? We lacked the requisite knowledge to remaster a distribution or implement changes at the kernel level. ClusterKnoppix aims to provide the same core features and software as Knoppix, but adds openMosix clustering capabilities as well. It is specifically designed to be a good master node. openMosix has the ability to build a cluster out of inexpensive hardware, giving you a traditional supercomputer. As long as you use processors of the same architecture, any configuration of your nodes is possible.
No CD-ROM drive, hard disk or floppy is needed for the clients. openMosix autodiscovery: new nodes automatically join the cluster (no configuration needed). Cluster management tools: openMosix userland / openMosixview. Every node can run full-blown X (PC-room/demo setup), or console only: more memory available for user applications.
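To make the openMosix model concrete: programs need no modification; ordinary processes started on one node migrate to idle nodes automatically. A rough sketch of how this is typically demonstrated (treat the exact invocation as illustrative; mosmon is the text-mode load monitor from the openMosix userland tools):
    # start a few CPU-bound dummy jobs on the master node
    for i in 1 2 3 4; do
        awk 'BEGIN { for (i = 0; i < 100000000; i++) x += i }' &
    done
    # watch the load spread across the nodes as openMosix migrates the processes
    mosmon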
What Could Have Been……
Problems up there… Both ClusterKnoppix and openMosix development had stopped, so not much support was available.
The openMosix terminal server uses PXE, DHCP and TFTP to boot Linux clients over the network, so it wasn't compatible with the older cards in our fixed machines, which weren't PXE-enabled. It also wouldn't work on the WFC machines' LAN cards: openMosix had no support for post-2.4.x kernels, so it couldn't be deployed in any of the other labs in the college, as the machines there had network cards that were incompatible with the GNU/Linux kernel versions on which openMosix worked.
Problems down under…… On the master node we executed the following commands:
    ifconfig eth0 192.168.1.10
    route add -net 0.0.0.0 gw 192.168.1.1
    tyd -f init
    tyd
And on the drone node we executed:
    ifconfig eth0 192.168.1.20
    route add -net 0.0.0.0 gw 192.168.1.1
    tyd -f init
    tyd -m 192.168.1.10
The error we got was: SIOCSIFFLAGS: No such device
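That error usually means the kernel never bound a driver to eth0, so the interface simply does not exist when ifconfig tries to bring it up. A minimal diagnostic sketch of the kind of checks this calls for (the 8139too module is only an example; the actual driver depends on the card):
    dmesg | grep -i eth          # did the kernel detect an Ethernet device at all?
    ifconfig -a                  # list every interface the kernel knows about
    lsmod                        # which network driver modules are loaded?
    modprobe 8139too             # hypothetical: load the card's driver by hand
    ifconfig eth0 192.168.1.10 up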
Chapter 5 Any port in a storm…
Other solutions tried…. The 'educational' BCCD from the University of Iowa: the BCCD was created to facilitate instruction in parallel computing aspects and paradigms. It is a bootable CD image that boots into a pre-configured distributed computing environment, with the focus on the educational aspects of High-Performance Computing (HPC) rather than on the HPC core. Problem… it asked for a password even from the live CD, owing to the hardware incompatibility!
CHAOS: a small (6 MB) Linux distribution designed for creating ad hoc computer clusters. This tiny disc will boot any i586-class PC (that supports CD booting) into a working openMosix node, without disturbing (or even touching) the contents of any local hard disk. Quantian OS: a remastering of ClusterKnoppix for the computational sciences. The environment is self-configuring and directly bootable.
Chapter 6. First taste of success….
Paralledigm Shift!!! After a lot of frustrating trials, the ClusterKnoppix idea was dropped. ParallelKnoppix (later upgraded to PelicanHPC) is chosen: ParallelKnoppix is a live CD image that lets you set up a high-performance computing cluster in a few minutes. A parallel cluster allows you to do parallel computing using MPI. Advantages: the frontend node (either a real computer or a virtual machine) boots from the CD image. The compute nodes boot by PXE, using the frontend node as the server. The LAM-MPI and OpenMPI implementations of MPI are installed. It contains extensive example programs. It is very easy to add packages.
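As a rough illustration of what running a job on a ParallelKnoppix/PelicanHPC-style cluster looks like (a sketch only: the hosts file contents and the hello_mpi example program are placeholders, not taken from the actual setup):
    # hosts file: one line per node; list every compute node's IP here
    cat > hosts <<'EOF'
    192.168.1.10 slots=1
    192.168.1.20 slots=1
    EOF
    # compile an example program with the OpenMPI wrapper compiler
    mpicc hello_mpi.c -o hello_mpi
    # launch one rank per listed slot across the nodes
    mpirun --hostfile hosts -np 2 ./hello_mpi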
It didn't work immediately: PK needs LAN-booting support, and our network cards didn't support it. We added "no acpi" and, accidentally, it worked.. ;) Etherboot is used: gPXE/Etherboot is an open-source (GPL) network bootloader. It provides a direct replacement for proprietary PXE ROMs, with many extra features such as DNS, HTTP, iSCSI, etc. This solution thus gave us our first cluster.
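For cards without a PXE ROM, the usual Etherboot/gPXE workaround is to boot the node from removable media carrying a gPXE image built for that NIC, which then does DHCP/TFTP against the frontend. A minimal sketch, assuming such an image (gpxe.dsk for floppy, gpxe.iso for CD) has already been generated, e.g. with the rom-o-matic builder:
    dd if=gpxe.dsk of=/dev/fd0 bs=512    # write the floppy image to a floppy disk
    wodim dev=/dev/cdrw gpxe.iso         # or burn the CD image instead
    # booting a compute node from this media lets gPXE network-boot it from the
    # PelicanHPC frontend even though the on-board ROM has no PXE support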
What the future holds…… A more permanent solution instead of a temporary one, e.g. ROCKS, Hadoop, Disco….. Implementing key parallel algorithms. Developing a guide for future cluster administrators (who should be students…. :) ). Familiarizing other departments with the applications of the cluster to their research.
 
