MPI - Distributed Computing made easy
Last Updated :
19 Sep, 2023
The Underlying Problem
To make things easier, let’s directly jump to some statistics:
- Facebook, currently, has 1.5 billion active monthly users.
- Google performs at least 1 trillion searches per year.
- About 48 hours of video are uploaded to YouTube every minute.
With demand on this scale, a single machine cannot handle all the processing. This is where the need for distributed systems comes from.
What is Distributed Computing?
A distributed system consists of a collection of autonomous computers, connected through a network and distribution middleware, which enables computers to coordinate their activities and to share the resources of the system so that users perceive the system as a single, integrated computing facility.
Take the Google web server as an example. From the user's perspective, a search query is submitted to what looks like a single system. Behind the curtain, however, Google runs a large number of servers that are distributed (geographically and computationally) so that the result comes back within a few seconds.
Advantages of Distributed Computing
- Highly efficient
- Scalability
- Fault tolerance (the system keeps working when individual machines fail)
- High Availability
Let us look at an example where we save computational time by using distributed computing.
For example, suppose we have an array a with n elements, such as a = [1, 2, 3, 4, 5, 6].
We want to sum all the elements of the array and output the result. Now, let us assume that there are 1020 elements in the array and that the time to compute the sum is x.
Suppose we divide the array into 3 parts, a1, a2 and a3, according to the remainder of each element modulo 3:
- a1 = { elements of a where element modulo 3 == 0 }
- a2 = { elements of a where element modulo 3 == 1 }
- a3 = { elements of a where element modulo 3 == 2 }
We send these 3 arrays to 3 different processes, and each process computes the sum of its own array. On average, let's assume that each array has n/3 elements, so the time taken by each process drops to about x/3. Since the processes run in parallel, the three "x/3" computations happen simultaneously, and each process returns the sum of its array to the main process. In the end, we obtain the sum of a by adding up the individual sums of a1, a2, and a3.
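The split-and-combine scheme described above can be sketched in plain Python. This is only an illustration of the data flow, not an MPI program: the helper names `split_by_remainder` and `distributed_sum` are made up for this example, and threads stand in for the three processes (in CPython they will not give a real speedup for this kind of work).

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_remainder(values, parts=3):
    """Group values by value % parts, mirroring the a1/a2/a3 split."""
    groups = [[] for _ in range(parts)]
    for v in values:
        groups[v % parts].append(v)
    return groups


def distributed_sum(values, parts=3):
    """Sum each group in its own worker, then combine the partial sums."""
    groups = split_by_remainder(values, parts)
    # Each worker sums one sub-array; threads stand in for separate processes.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial_sums = list(pool.map(sum, groups))
    # The "main process" adds up the partial results.
    return partial_sums, sum(partial_sums)


a = [1, 2, 3, 4, 5, 6]
partial_sums, total = distributed_sum(a)
print(partial_sums, total)  # prints: [9, 5, 7] 21
```

Here a1 = [3, 6], a2 = [1, 4] and a3 = [2, 5], so the partial sums are 9, 5 and 7, and their total matches the sum of the original array.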
Thus, by running the three processes simultaneously, we reduce the computation time from x to roughly x/3, ignoring the time spent distributing the data and collecting the results.
What is MPI?
Message Passing Interface (MPI) is a standardized and portable message-passing system developed for distributed and parallel computing. MPI provides parallel hardware vendors with a clearly defined base set of routines that can be efficiently implemented. As a result, hardware vendors can build upon this collection of standard low-level routines to create higher-level routines for the distributed-memory communication environment supplied with their parallel machines.
MPI gives users the flexibility of calling a set of routines from C, C++, Fortran, C#, Java, or Python. The advantages of MPI over older message-passing libraries are portability (because MPI has been implemented for almost every distributed-memory architecture) and speed (because each implementation is, in principle, optimized for the hardware on which it runs).
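To give a feel for what calling MPI from Python looks like, here is a minimal sketch of the array-sum example using mpi4py. It assumes mpi4py and an MPI implementation are installed, and that the script is started with a launcher such as `mpiexec -n 3 python sum_mpi.py` (the file name is hypothetical). The `ImportError` fallback to a single process is an addition for this sketch so that the script still runs on a machine without MPI.

```python
# Launch with, for example: mpiexec -n 3 python sum_mpi.py
try:
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Assumption for this sketch: without mpi4py, act as a single process.
    MPI, comm, rank, size = None, None, 0, 1

a = [1, 2, 3, 4, 5, 6]

# Each process keeps the elements whose remainder modulo `size` equals its rank.
local = [x for x in a if x % size == rank]
local_sum = sum(local)

if comm is not None:
    # Combine the partial sums on rank 0; the other ranks receive None.
    total = comm.reduce(local_sum, op=MPI.SUM, root=0)
else:
    total = local_sum

if rank == 0:
    print("total =", total)
```

With three processes, rank 0 sums [3, 6], rank 1 sums [1, 4] and rank 2 sums [2, 5]; only rank 0 prints the combined total.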
Although MPI can be used from several languages, Python is a popular choice because of its simplicity and the ease of writing code in it. We will now look at how to install MPI on Ubuntu 14.10.
Install MPI on Ubuntu
1) Run the following command in your terminal to install NumPy, a package for scientific computing in Python.
sudo apt-get install python-numpy
2) After the above step completes successfully, run the following commands to update the package lists and install pip.
sudo apt-get update
sudo apt-get -y install python-pip
3) Now, install MPICH, an implementation of MPI, along with its documentation.
sudo apt-get install libcr-dev mpich mpich-doc
4) Install mpi4py, the MPI package for Python, using pip.
sudo pip install mpi4py
MPI is now installed. Occasionally, a problem shows up around this step because the Python development tools are missing. You can install them using the following command:
sudo apt-get install python-dev
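A quick way to see whether the pieces installed above are visible to Python is a small check script. This is a sketch that uses only the standard library; the function name `check_mpi_setup` is made up for this example, and it assumes the MPI launcher is available on the PATH under the name `mpiexec`.

```python
import importlib.util
import shutil


def check_mpi_setup():
    """Report which parts of the MPI setup can be found on this machine."""
    return {
        # find_spec returns None when a top-level package is not importable.
        "numpy": importlib.util.find_spec("numpy") is not None,
        "mpi4py": importlib.util.find_spec("mpi4py") is not None,
        # shutil.which returns None when the command is not on the PATH.
        "mpiexec": shutil.which("mpiexec") is not None,
    }


status = check_mpi_setup()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If any entry is reported as missing, rerun the corresponding installation step above.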
MPI on Windows/Mac
Windows and Mac users can visit the following link, download the .zip file, unzip it, and run it:
Tutorials
After installation, you can refer to the following documentation for using MPI with Python.
References
https://en.wikipedia.org/wiki/Message_Passing_Interface
About the Author: Anurag Mishra is currently a 3rd-year B.Tech student, an avid software follower, and a full stack web developer. His keen interest lies in web development, NLP, and networking.