From the course: Big Data Analytics with Hadoop and Apache Spark

Average score analytics

- [Instructor] Now, let's compute the average score for each student across all subjects. To begin with, we cache the total score DataFrame so that we don't have to read the entire data source and compute the total scores again and again. Then, we find the average score for each student: we group by student and compute the average of the total score. We execute an action to display the results, and we also print the execution plan. Let's execute this code now. First, we see that the average total score shows up correctly for each student, as required by the use case. Then, in the execution plan, we see that InMemoryTableScan is used on the two columns. This means that the cache is working and the total scores are not being computed again. In the next video, we will find the top student for each subject.