hadoop@liuyang-VMware-Virtual-Platform:~$ cd /usr/local/hadoop
hadoop@liuyang-VMware-Virtual-Platform:/usr/local/hadoop$ ./sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [liuyang-VMware-Virtual-Platform]
hadoop@liuyang-VMware-Virtual-Platform:/usr/local/hadoop$ jps
16421 NameNode
16572 DataNode
16829 SecondaryNameNode
16959 Jps
hadoop@liuyang-VMware-Virtual-Platform:/usr/local/hadoop$ hdfs dfs -ls /user/hadoop/movie_data/
Found 1 items
-rw-r--r-- 1 hadoop supergroup 435 2025-05-10 23:39 /user/hadoop/movie_data/movie_ratings.csv
hadoop@liuyang-VMware-Virtual-Platform:/usr/local/hadoop$ hdfs dfs -cat /user/hadoop/movie_data/movie_ratings.csv
user_id,movie_id,rating,timestamp
1,101,4.0,964982703
1,102,3.0,964981247
2,101,5.0,964982224
2,103,3.5,964984815
3,101,4.5,964985076
3,102,2.5,964986123
3,103,4.0,964987456
4,101,3.0,964988789
4,104,5.0,964989012
5,102,4.5,964990345
5,103,3.0,964991678
5,105,4.0,965002911
6,101,2.0,964993244
6,104,3.5,964994577
7,103,5.0,964995910
7,105,4.5,964996243
8,102,3.0,964997576
8,104,4.0,964998909
9,101,4.5,964999242
10,105,3.5,965000575
I. HDFS Operations
1. Create the experiment directory
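A minimal sketch of this step, assuming the experiment directory is the /user/hadoop/movie_data path that appears in the terminal session above. The guard is only there so the snippet does nothing harmful on a machine without Hadoop.

```shell
# Target directory, taken from the `hdfs dfs -ls` call in the session above.
HDFS_DIR=/user/hadoop/movie_data

if command -v hdfs >/dev/null 2>&1; then
  # -p creates missing parent directories and does not fail if the path exists.
  hdfs dfs -mkdir -p "$HDFS_DIR"
else
  echo "hdfs not found; would run: hdfs dfs -mkdir -p $HDFS_DIR"
fi
```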
2. Upload local data to HDFS
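The upload might look like the sketch below. The local path in LOCAL_FILE is hypothetical, since the original does not say where the CSV is stored on the local disk; only the HDFS directory and the file name come from the session above.

```shell
# Hypothetical local location of the CSV (an assumption, adjust as needed).
LOCAL_FILE="$HOME/movie_ratings.csv"
# HDFS target directory, as listed in the session above.
HDFS_DIR=/user/hadoop/movie_data

if command -v hdfs >/dev/null 2>&1 && [ -f "$LOCAL_FILE" ]; then
  # -put copies the local file into the HDFS directory.
  hdfs dfs -put "$LOCAL_FILE" "$HDFS_DIR/"
else
  echo "skipped: need hdfs on PATH and $LOCAL_FILE to exist"
fi
```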
3. Check the file
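Besides the -ls and -cat calls shown in the session above, a quick line count is a cheap sanity check. This is a sketch under the assumption that the file in HDFS matches the listing above (1 header line plus 20 data rows), in which case the count should be 21.

```shell
# File path as shown in the session above.
HDFS_FILE=/user/hadoop/movie_data/movie_ratings.csv

if command -v hdfs >/dev/null 2>&1; then
  # Confirm the file is present in its directory.
  hdfs dfs -ls "$(dirname "$HDFS_FILE")"
  # Count the lines: header plus data rows.
  hdfs dfs -cat "$HDFS_FILE" | wc -l
else
  echo "hdfs not found; would inspect $HDFS_FILE"
fi
```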
II. Hive Operations
1. Create the database movie_db
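One possible way to run this step from the shell, assuming the hive command-line client is installed. The IF NOT EXISTS clause is an addition here so the statement can be re-run safely.

```shell
# HiveQL for this step; the database name movie_db comes from the outline.
QUERY='CREATE DATABASE IF NOT EXISTS movie_db; SHOW DATABASES;'

# Execute only where the Hive CLI is available; otherwise print the statement.
if command -v hive >/dev/null 2>&1; then hive -e "$QUERY"; else echo "$QUERY"; fi
```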
2. Create the ratings table scores
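The original table definition is not shown, so the schema below is an assumption that mirrors the CSV header (user_id, movie_id, rating, timestamp). The last column is named ts here, a hypothetical name, because TIMESTAMP is a reserved keyword in recent Hive versions and would otherwise need backtick quoting. The skip.header.line.count property is also an assumption, added so the header line is not read as a data row.

```shell
# HiveQL for this step; column names and types are assumed from the CSV header.
QUERY=$(cat <<'EOF'
CREATE TABLE IF NOT EXISTS movie_db.scores (
  user_id  INT,
  movie_id INT,
  rating   DOUBLE,
  ts       BIGINT   -- hypothetical name for the CSV "timestamp" column
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1');
EOF
)

# Execute only where the Hive CLI is available; otherwise print the statement.
if command -v hive >/dev/null 2>&1; then hive -e "$QUERY"; else echo "$QUERY"; fi
```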
3. Import the data from the HDFS file into the ratings table
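A sketch of the import, assuming the table is a managed table named movie_db.scores and the source file is the one listed in the session above.

```shell
# HiveQL for this step; the HDFS path is the file shown in the session above.
QUERY="LOAD DATA INPATH '/user/hadoop/movie_data/movie_ratings.csv' INTO TABLE movie_db.scores;"

# Execute only where the Hive CLI is available; otherwise print the statement.
if command -v hive >/dev/null 2>&1; then hive -e "$QUERY"; else echo "$QUERY"; fi
```

Note that LOAD DATA INPATH moves the file into the table's storage directory rather than copying it, so the file no longer appears under /user/hadoop/movie_data afterwards.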
III. Simple Data Analysis
1. View the data
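A quick look at the loaded rows could be done as follows. The LIMIT of 10 is an arbitrary choice for this sketch.

```shell
# HiveQL for this step: show the first few rows of the ratings table.
QUERY='SELECT * FROM movie_db.scores LIMIT 10;'

# Execute only where the Hive CLI is available; otherwise print the statement.
if command -v hive >/dev/null 2>&1; then hive -e "$QUERY"; else echo "$QUERY"; fi
```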
2. Basic statistics
(1) Query the total number of records
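A possible form of this query. The alias total_records is an illustrative name.

```shell
# HiveQL for this step: count all rows in the ratings table.
QUERY='SELECT COUNT(*) AS total_records FROM movie_db.scores;'

# Execute only where the Hive CLI is available; otherwise print the statement.
if command -v hive >/dev/null 2>&1; then hive -e "$QUERY"; else echo "$QUERY"; fi
```

If the table holds exactly the 20 data rows listed above, the result should be 20. A result of 21 would suggest the CSV header line was loaded as a row.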
(2) Query each user's average rating
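One way to write this, assuming the table has user_id and rating columns that mirror the CSV header. Rounding to two decimals and sorting by user_id are choices made for readability here.

```shell
# HiveQL for this step: average rating per user.
QUERY='SELECT user_id, ROUND(AVG(rating), 2) AS avg_rating
FROM movie_db.scores
GROUP BY user_id
ORDER BY user_id;'

# Execute only where the Hive CLI is available; otherwise print the statement.
if command -v hive >/dev/null 2>&1; then hive -e "$QUERY"; else echo "$QUERY"; fi
```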
(3) Query each movie's average rating
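The per-movie average follows the same pattern, grouped by movie_id instead. Column names are again assumed from the CSV header.

```shell
# HiveQL for this step: average rating per movie, highest first.
QUERY='SELECT movie_id, ROUND(AVG(rating), 2) AS avg_rating
FROM movie_db.scores
GROUP BY movie_id
ORDER BY avg_rating DESC;'

# Execute only where the Hive CLI is available; otherwise print the statement.
if command -v hive >/dev/null 2>&1; then hive -e "$QUERY"; else echo "$QUERY"; fi
```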
(4) Find the 3 most-rated movies
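A sketch of this ranking, with column names assumed from the CSV header. The secondary sort on movie_id is an addition that makes the order deterministic when two movies have the same number of ratings.

```shell
# HiveQL for this step: the three movies with the most ratings.
QUERY='SELECT movie_id, COUNT(*) AS rating_count
FROM movie_db.scores
GROUP BY movie_id
ORDER BY rating_count DESC, movie_id ASC
LIMIT 3;'

# Execute only where the Hive CLI is available; otherwise print the statement.
if command -v hive >/dev/null 2>&1; then hive -e "$QUERY"; else echo "$QUERY"; fi
```

In the 20 rows listed above, movie 101 has 6 ratings while 102 and 103 have 4 each, so the tie-break decides which of the latter two is listed second.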
(5) Find the 5 most active users
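Here "most active" is taken to mean "submitted the most ratings", which is an interpretation rather than something the original spells out. Column names are assumed from the CSV header.

```shell
# HiveQL for this step: the five users with the most ratings.
QUERY='SELECT user_id, COUNT(*) AS rating_count
FROM movie_db.scores
GROUP BY user_id
ORDER BY rating_count DESC, user_id ASC
LIMIT 5;'

# Execute only where the Hive CLI is available; otherwise print the statement.
if command -v hive >/dev/null 2>&1; then hive -e "$QUERY"; else echo "$QUERY"; fi
```

In the 20 rows listed above, users 3 and 5 have 3 ratings each and six other users are tied at 2, so the remaining three places depend entirely on the tie-break (user_id ascending in this sketch).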
3. View the rating distribution
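The distribution can be obtained by counting rows per distinct rating value. This is a minimal sketch, assuming a rating column as in the CSV header; the alias num_ratings is illustrative.

```shell
# HiveQL for this step: how many times each rating value occurs.
QUERY='SELECT rating, COUNT(*) AS num_ratings
FROM movie_db.scores
GROUP BY rating
ORDER BY rating;'

# Execute only where the Hive CLI is available; otherwise print the statement.
if command -v hive >/dev/null 2>&1; then hive -e "$QUERY"; else echo "$QUERY"; fi
```

For the 20 rows listed above, the counts should come out as 2.0: 1, 2.5: 1, 3.0: 4, 3.5: 3, 4.0: 4, 4.5: 4, 5.0: 3.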
Result screenshots