From the course: Big Data Analytics with Hadoop and Apache Spark
Using exercise files
- [Instructor] Let's set up the required software for this course. We will use PySpark for our programming exercises, relying on the built-in Spark instance that ships with PySpark. To install PySpark and the other Python requirements, let's first install Anaconda Navigator. We can download Anaconda from the website shown here.

Let's go to Anaconda Navigator. Here, let's create an environment called spark and choose Python 3.11 for this course. Once the environment is ready, let's install Jupyter Notebook in it.

Please download the exercise files into a local folder. I have downloaded them into a folder called Exercise Files. Use the command prompt to navigate to the folder where you have downloaded the exercise files; on Windows, please use PowerShell for this purpose.

PySpark has some environment dependencies that also need to be set up. First, it needs Java 17, which I have already installed on my system. We can verify this by running java -version. We…
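The setup steps described above can be sketched as shell commands. This is a minimal sketch, not the instructor's exact commands: the environment name spark and Python 3.11 come from the transcript, while installing via pip inside the conda environment is an assumption (the transcript uses the Anaconda Navigator GUI instead).

```shell
# Create a conda environment named "spark" with Python 3.11
# (same name and version as in the transcript)
conda create -n spark python=3.11 -y
conda activate spark

# Install PySpark and Jupyter Notebook into the environment
# (the transcript installs Jupyter through Anaconda Navigator)
pip install pyspark notebook

# Verify the Java 17 dependency that PySpark requires
java -version
```

On Windows, run the equivalent commands from PowerShell, as the transcript suggests, after navigating to the folder containing the exercise files.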