From the course: Big Data Analytics with Hadoop and Apache Spark
Average score analytics
- [Instructor] Now, let's compute the average total score for each student across all subjects. First, we cache the total score DataFrame so that Spark doesn't have to reread the entire data source and recompute the total scores for every subsequent action. Then we find the average score for each student: we group by student and compute the average of the total score. We display the results, which executes an action, and we also print the execution plan. Let's execute this code now. In the output, the average total score shows up for each student, as the use case requires. In the execution plan, we see an InMemoryTableScan on the two columns. This means the cache is working and the total scores are not being computed again. In the next video, we will find the top student for each subject.