Shell script -- counting lines and words in a file with sort, uniq, tr, and other commands

License: CC BY-SA 4.0

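This post shows how to combine sort, uniq, and tr in shell pipelines to count the lines of a text file, how many times each identical line occurs, how often a specific word appears (two different approaches), and how often every word appears.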

Suppose we have a file hello.txt:

[root@liuzhiwei-centos6 ~]# cat hello.txt 
hello world welcome
hello world
world welcome
hello welcome
hello world
hello world welcome
world world heihei
welcome hello
hello world
world heihei
hello welcome
world welcome
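To follow along, the sample file can be recreated with a here-document. A minimal sketch, assuming the current directory is writable; note that it overwrites any existing hello.txt:

```shell
# Recreate the sample file used throughout this post (overwrites ./hello.txt).
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > hello.txt <<'EOF'
hello world welcome
hello world
world welcome
hello welcome
hello world
hello world welcome
world world heihei
welcome hello
hello world
world heihei
hello welcome
world welcome
EOF

# Quick sanity check: should report 12 lines.
wc -l < hello.txt
```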

(1) Counting lines and identical lines with sort and uniq

Count the lines:
[root@liuzhiwei-centos6 ~]# cat hello.txt | wc -l
12
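One caveat: wc -l counts newline characters, so a file whose last line has no trailing newline reports one line fewer than you might expect. A small sketch illustrating this; the file name no_final_newline.txt is made up for the example, and awk's NR is used for comparison because awk still treats an unterminated last line as a record:

```shell
# Hypothetical file: two lines of text, but only ONE newline character
# (printf does not append a newline on its own).
printf 'line one\nline two' > no_final_newline.txt

# wc -l counts "\n" characters; tr strips the padding some wc versions add.
wc_count=$(wc -l < no_final_newline.txt | tr -d ' ')

# awk counts records, and the unterminated last line is still a record.
awk_count=$(awk 'END { print NR }' no_final_newline.txt)

echo "wc -l: $wc_count, awk NR: $awk_count"   # wc -l: 1, awk NR: 2

rm -f no_final_newline.txt
```

As a side note, `wc -l < hello.txt` gives the same count as `cat hello.txt | wc -l` without starting an extra cat process.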

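Count how many times each distinct line occurs. uniq -c only merges adjacent identical lines, so the input is sorted first: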
[root@liuzhiwei-centos6 ~]# cat hello.txt | sort | uniq -c
      2 hello welcome
      3 hello world
      2 hello world welcome
      1 welcome hello
      1 world heihei
      2 world welcome
      1 world world heihei
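Why sort before uniq -c? uniq compares each line only with the one immediately before it, so identical lines that are not adjacent are counted as separate groups. A small sketch with a made-up three-line sample (the variable names are illustrative only):

```shell
# Made-up sample: the two "hello world" lines are NOT adjacent.
sample='hello world
world welcome
hello world'

# Without sort: uniq -c sees three runs, each with a count of 1.
unsorted=$(printf '%s\n' "$sample" | uniq -c)

# With sort: the identical lines become neighbours and are merged (2 and 1).
sorted=$(printf '%s\n' "$sample" | sort | uniq -c)

printf '%s\n' "$unsorted" '---' "$sorted"
```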

Sort the result by that count, smallest first:
[root@liuzhiwei-centos6 ~]# cat hello.txt | sort | uniq -c | sort -n
      1 welcome hello
      1 world heihei
      1 world world heihei
      2 hello welcome
      2 hello world welcome
      2 world welcome
      3 hello world
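To list the most frequent lines first instead, the numeric sort can be reversed with -r and the top entries kept with head. A sketch on a small made-up sample, chosen so that no two counts tie, which keeps the output order predictable:

```shell
# Made-up sample: "hello world" x3, "world welcome" x2, "world heihei" x1.
top=$(printf '%s\n' 'hello world' 'world welcome' 'hello world' \
                    'world heihei' 'world welcome' 'hello world' |
      sort | uniq -c | sort -rn | head -n 2)   # -r: largest count first; head: keep top 2

printf '%s\n' "$top"
```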

(2) Counting how many times a specific word occurs

[root@liuzhiwei-centos6 ~]# cat hello.txt | grep -o "hello"
hello
hello
hello
hello
hello
hello
hello
hello
[root@liuzhiwei-centos6 ~]# cat hello.txt | grep -o "hello" | wc -l
8
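Two grep details matter here: grep -c counts matching lines rather than matches, and grep -o "hello" also matches hello inside longer words such as othello, while grep -w restricts matches to whole words. A sketch comparing the three on a made-up two-line sample:

```shell
# Made-up sample: "hello" twice on line 1, and inside "othello" on line 2.
text='hello hello
othello'

# grep -c counts LINES that contain at least one match.
lines=$(printf '%s\n' "$text" | grep -c 'hello')
# grep -o prints every match on its own line, substrings included.
hits=$(printf '%s\n' "$text" | grep -o 'hello' | wc -l | tr -d ' ')
# grep -w only accepts matches that form a whole word.
words=$(printf '%s\n' "$text" | grep -ow 'hello' | wc -l | tr -d ' ')

echo "lines=$lines hits=$hits words=$words"   # lines=2 hits=3 words=2
```

In hello.txt every hello is a whole word and no line contains it twice, which is why all three variants happen to agree on 8 there.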

Alternatively, another approach:

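The idea: replace every space with a newline so that each word sits on its own line, then keep only the lines containing hello. tr translates (replaces) characters, here spaces to newlines: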
[root@liuzhiwei-centos6 ~]# cat hello.txt | tr ' ' '\n' | grep "hello"
hello
hello
hello
hello
hello
hello
hello
hello
[root@liuzhiwei-centos6 ~]# cat hello.txt | tr ' ' '\n' | grep "hello" | wc -l
8
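The tr approach has pitfalls of its own: two consecutive spaces become an empty line, tabs are left untouched, and grep "hello" still matches substrings. A sketch on a made-up sample, using tr -s with the [:space:] class (which also covers tabs) and grep -x to require the whole line to be the word:

```shell
# Made-up sample: a double space, a leading tab, and "othello".
text=$(printf 'hello  othello\n\thello world')

# Plain tr: the double space turns into an empty line, and the tab is left alone.
plain=$(printf '%s\n' "$text" | tr ' ' '\n' | wc -l | tr -d ' ')
# tr -s squeezes repeated newlines; [:space:] also converts tabs and newlines.
squeezed=$(printf '%s\n' "$text" | tr -s '[:space:]' '\n' | wc -l | tr -d ' ')
# Substring match: "othello" is counted as well.
naive=$(printf '%s\n' "$text" | tr -s '[:space:]' '\n' | grep -c 'hello')
# grep -x: the whole line must be exactly "hello"; with one word per line,
# grep -c now counts occurrences of the word.
exact=$(printf '%s\n' "$text" | tr -s '[:space:]' '\n' | grep -cx 'hello')

echo "plain=$plain squeezed=$squeezed naive=$naive exact=$exact"
# plain=5 squeezed=4 naive=3 exact=2
```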

(3) Counting every word and sorting by frequency

[root@liuzhiwei-centos6 ~]# cat hello.txt | tr ' ' '\n' | sort | uniq -c | sort -n
      2 heihei
      7 welcome
      8 hello
     10 world
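The same pipeline can be wrapped in a small function that tolerates irregular spacing and lists the most frequent word first. A sketch; the function name word_freq is hypothetical, and the sample is the hello.txt content with an extra leading space and a double space added:

```shell
# Hypothetical helper (not from the original post): reads text on stdin and
# prints "count word" pairs, most frequent first.
word_freq() {
    tr -s '[:space:]' '\n' |   # one word per line; runs of blanks/tabs collapse
    grep -v '^$' |             # drop the empty line a leading blank leaves behind
    sort |                     # uniq -c only merges adjacent duplicates
    uniq -c |                  # prefix each distinct word with its count
    sort -rn                   # highest count first
}

# hello.txt lines, with a leading space (line 1) and a double space (line 2)
# to show that irregular spacing no longer produces empty "words".
freq=$(printf '%s\n' \
    ' hello world welcome' 'hello  world' 'world welcome' 'hello welcome' \
    'hello world' 'hello world welcome' 'world world heihei' 'welcome hello' \
    'hello world' 'world heihei' 'hello welcome' 'world welcome' | word_freq)

printf '%s\n' "$freq"
```

The counts match the original pipeline (10 world, 8 hello, 7 welcome, 2 heihei), only in descending order; the exact padding in front of each count depends on the uniq implementation.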