Suppose we have a file named hello.txt:
[root@liuzhiwei-centos6 ~]# cat hello.txt
hello world welcome
hello world
world welcome
hello welcome
hello world
hello world welcome
world world heihei
welcome hello
hello world
world heihei
hello welcome
world welcome
(1) Counting lines and duplicate lines with sort and uniq
Count the total number of lines:
[root@liuzhiwei-centos6 ~]# cat hello.txt | wc -l
12
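One caveat worth knowing: wc -l counts newline characters, so a final line that lacks a trailing newline is not counted. A minimal sketch of an alternative, assuming a hypothetical helper named count_lines (not part of the original post), uses awk's record counter instead:

```shell
# Hypothetical helper: count the lines of a file, including a last line
# that has no trailing newline.
count_lines() {
    # NR holds the number of records (lines) read once awk reaches END.
    awk 'END { print NR }' "$1"
}
```

For a file that ends with a newline, such as hello.txt above, count_lines hello.txt and wc -l report the same number.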
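Count how many times each distinct line occurs (uniq -c only merges adjacent duplicates, so the input is sorted first):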
[root@liuzhiwei-centos6 ~]# cat hello.txt | sort | uniq -c
2 hello welcome
3 hello world
2 hello world welcome
1 welcome hello
1 world heihei
2 world welcome
1 world world heihei
Sort the result by count (sort -n compares the leading numbers numerically, in ascending order):
[root@liuzhiwei-centos6 ~]# cat hello.txt | sort | uniq -c | sort -n
1 welcome hello
1 world heihei
1 world world heihei
2 hello welcome
2 hello world welcome
2 world welcome
3 hello world
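The same pipeline can be wrapped in a small function and reversed so the most frequent line comes first. This is only a sketch; the name count_dup_lines is made up for illustration:

```shell
# Hypothetical helper: print each distinct line of a file with its count,
# most frequent first.
count_dup_lines() {
    # sort groups identical lines so that uniq -c can count them;
    # sort -rn then orders the counts numerically, largest first.
    sort "$1" | uniq -c | sort -rn
}
```

For example, count_dup_lines hello.txt | head -n 1 shows the most repeated line, which for the file above is "3 hello world" (uniq -c pads the count with leading spaces).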
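(2) Counting the occurrences of a given word
grep -o prints every match on its own line; piping that output to wc -l then gives the number of occurrences.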
[root@liuzhiwei-centos6 ~]# cat hello.txt | grep -o "hello"
hello
hello
hello
hello
hello
hello
hello
hello
[root@liuzhiwei-centos6 ~]# cat hello.txt | grep -o "hello" | wc -l
8
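Note that grep -o "hello" counts every occurrence of the string, including those inside longer words such as helloworld. A possible whole-word variant is sketched below, assuming a grep that supports the -o and -w options (GNU grep on CentOS does); the function name count_word is hypothetical:

```shell
# Hypothetical helper: count whole-word occurrences of a word in a file.
# Usage: count_word FILE WORD
count_word() {
    # -o prints each match on its own line, -w restricts matches to whole
    # words; wc -l then counts the matches. WORD is treated as a regular
    # expression, as in the commands above.
    grep -o -w -- "$2" "$1" | wc -l
}
```

On hello.txt, count_word hello.txt hello still gives 8, because the file contains no longer word that includes hello. Also keep in mind that grep -c is not a substitute here: it counts matching lines, not matches.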
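Alternatively:
The idea: replace every space with a newline so that each word ends up on its own line, then filter for hello. Here tr translates characters, mapping each space to a newline. Note that grep "hello" also matches longer words that contain hello; grep -w restricts the match to whole words.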
[root@liuzhiwei-centos6 ~]# cat hello.txt | tr ' ' '\n' | grep "hello"
hello
hello
hello
hello
hello
hello
hello
hello
[root@liuzhiwei-centos6 ~]# cat hello.txt | tr ' ' '\n' | grep "hello" | wc -l
8
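Two details of the tr approach can be tightened: consecutive spaces produce empty lines, and a plain grep matches substrings. The sketch below, with a hypothetical function name count_word_exact, squeezes repeated separators with tr -s and counts only lines that equal the word exactly:

```shell
# Hypothetical helper: count exact occurrences of a word in a
# space-separated file. Usage: count_word_exact FILE WORD
count_word_exact() {
    # tr -s turns each space into a newline and squeezes runs of newlines,
    # leaving one word per line. grep -x matches only whole lines and
    # -c prints the number of matching lines, so no wc -l is needed.
    tr -s ' ' '\n' < "$1" | grep -c -x -- "$2"
}
```

Because every word is on its own line after tr, counting matching lines with -c is the same as counting words here.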
(3) Counting the occurrences of every word and sorting by count
[root@liuzhiwei-centos6 ~]# cat hello.txt | tr ' ' '\n' | sort | uniq -c | sort -n
2 heihei
7 welcome
8 hello
10 world
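The same word count can also be done in a single awk pass, which splits on any run of spaces or tabs and therefore needs no tr step. This is a sketch under that assumption, not the method used above; word_freq is a hypothetical name:

```shell
# Hypothetical helper: print "count word" for every word in a file,
# most frequent first.
word_freq() {
    # awk splits each line into fields on runs of blanks; every field is
    # tallied in the associative array count. The final sort orders by
    # count (numeric, descending) and then by word for equal counts.
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print count[w], w }' "$1" |
        sort -k1,1nr -k2,2
}
```

Running word_freq hello.txt lists world (10) first, followed by hello (8), welcome (7) and heihei (2), i.e. the same counts as above in descending order.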