From the course: Big Data Analytics with Hadoop and Apache Spark

Using exercise files

- [Instructor] Let's set up the required software for use in this course. We will use PySpark for our programming exercises, relying on the built-in Spark instance that ships with PySpark. To install PySpark and the other Python requirements, let's install Anaconda Navigator first. We can download Anaconda from the website shown here. Let's go to Anaconda Navigator. Here, let's create an environment called spark. Let's choose Python 3.11 for this course. The environment is ready now. Let's now install Jupyter Notebook in this environment. Please download the exercise files into a local folder. I have downloaded the exercise files into a folder called Exercise Files. Use the command prompt to navigate to the folder where you downloaded the exercise files. On Windows, please use PowerShell for this purpose. PySpark has some environment dependencies that also need to be set up. First, it needs Java 17. I have already installed Java 17 on my system. We can verify this by using java -version. We…
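If you want to confirm the setup before moving on, a quick check like the one below can be run in a Jupyter notebook inside the spark environment. This is only a minimal sketch and is not part of the exercise files; it assumes PySpark has already been installed (for example with pip install pyspark) and that Java 17 is available on your PATH.

```python
# Minimal sketch (not part of the exercise files): verify that the local
# Spark instance bundled with PySpark starts and can run a small job.
from pyspark.sql import SparkSession

# Start a local Spark session; no cluster is required.
spark = (
    SparkSession.builder
    .appName("SetupCheck")   # hypothetical app name, any name works
    .master("local[*]")      # use all local CPU cores
    .getOrCreate()
)

print(spark.version)         # prints the installed Spark version

# Create a tiny DataFrame to confirm the session can execute jobs.
df = spark.createDataFrame([(1, "hadoop"), (2, "spark")], ["id", "name"])
df.show()

spark.stop()
```

If the session starts and the small DataFrame displays, the bundled Spark instance is working and Java was found by PySpark.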