Ops Study Day 10: Linux Processes, Service Management, and Monitoring System Load

曼波の小曲

CC 4.0 BY-SA license

Tags: ops study, linux

Article link: https://blog.csdn.net/zzhmyhx/article/details/149692857

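12. Linux Process Management (continued)

Sending signals to processes

Signal overview

A signal is a software interrupt delivered to a process.

Signals can be generated by errors, by external events, or explicitly with a signal-sending command or a keyboard sequence.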

The kill command

Purpose: send a signal to a single process.

# Prepare the environment
[zzh@centos7 ~ 10:01:08]$ vim output
#!/bin/bash
while true
do
  echo -n "$@ " >> output.log
  sleep 1
done
# In a new window, follow output.log as it grows
[zzh@centos7 ~ 10:16:24]$ tail -f output.log


[zzh@centos7 ~ 10:18:35]$ ./output hello1 &
[1] 3861
[zzh@centos7 ~ 10:18:52]$ ./output hello2 &
[2] 3875


# List the available signals
[zzh@centos7 ~ 10:19:05]$ kill -l
 1) SIGHUP	 2) SIGINT	 3) SIGQUIT	 4) SIGILL	 5) SIGTRAP
 6) SIGABRT	 7) SIGBUS	 8) SIGFPE	 9) SIGKILL	10) SIGUSR1
11) SIGSEGV	12) SIGUSR2	13) SIGPIPE	14) SIGALRM	15) SIGTERM
16) SIGSTKFLT	17) SIGCHLD	18) SIGCONT	19) SIGSTOP	20) SIGTSTP
21) SIGTTIN	22) SIGTTOU	23) SIGURG	24) SIGXCPU	25) SIGXFSZ
26) SIGVTALRM	27) SIGPROF	28) SIGWINCH	29) SIGIO	30) SIGPWR
31) SIGSYS	34) SIGRTMIN	35) SIGRTMIN+1	36) SIGRTMIN+2	37) SIGRTMIN+3
38) SIGRTMIN+4	39) SIGRTMIN+5	40) SIGRTMIN+6	41) SIGRTMIN+7	42) SIGRTMIN+8
43) SIGRTMIN+9	44) SIGRTMIN+10	45) SIGRTMIN+11	46) SIGRTMIN+12	47) SIGRTMIN+13
48) SIGRTMIN+14	49) SIGRTMIN+15	50) SIGRTMAX-14	51) SIGRTMAX-13	52) SIGRTMAX-12
53) SIGRTMAX-11	54) SIGRTMAX-10	55) SIGRTMAX-9	56) SIGRTMAX-8	57) SIGRTMAX-7
58) SIGRTMAX-6	59) SIGRTMAX-5	60) SIGRTMAX-4	61) SIGRTMAX-3	62) SIGRTMAX-2
63) SIGRTMAX-1	64) SIGRTMAX	

# Send signal 19 (SIGSTOP) to job 1 to pause it
[zzh@centos7 ~ 10:20:22]$ kill -19 %1

[1]+  Stopped                 ./output hello1
[zzh@centos7 ~ 10:20:38]$ jobs
[1]+  Stopped                 ./output hello1
[2]-  Running                 ./output hello2 &

# Send signal 18 (SIGCONT) to job 1 to resume it
[laoma@centos7 ~]$ kill -18 %1
[laoma@centos7 ~]$ jobs
[1]+  Running                 ./output hello1 &
[2]-  Running                 ./output hello2 &

# Send SIGTERM to job 2 to terminate it; SIGTERM is the default signal
[zzh@centos7 ~ 10:20:59]$ kill -SIGTERM %2
[2]-  Terminated              ./output hello2
[zzh@centos7 ~ 10:21:22]$ jobs
[1]+  Stopped                 ./output hello1

# With no signal specified, kill sends SIGTERM
[zzh@centos7 ~ 10:21:47]$ kill %1

[1]+  Stopped                 ./output hello1
[zzh@centos7 ~ 10:22:17]$ jobs
[1]+  Terminated              ./output hello1
[zzh@centos7 ~ 10:22:26]$ jobs

# Send the default signal to a process by PID (4187 here)
[zzh@centos7 ~ 10:22:31]$ ./output hello3 &
[1] 4187
[zzh@centos7 ~ 10:22:58]$ ps axu|grep while
zzh        4204  0.0  0.0 112824   976 pts/1    S+   10:23   0:00 grep --color=auto while
[zzh@centos7 ~ 10:23:11]$ kill 4187
[zzh@centos7 ~ 10:24:58]$ jobs
[1]+  Terminated              ./output hello3
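The transcript above shows SIGTERM ending a script that installs no handler. As a minimal sketch of the other side, the function below traps SIGTERM so a worker can clean up before it exits. The name run_worker and the log message are made up for illustration. SIGKILL (9) and SIGSTOP (19) cannot be trapped, which is why they always take effect.

```shell
#!/bin/bash
# run_worker LOGFILE  (hypothetical helper, not part of the original article)
# Loops forever, but exits cleanly when it receives SIGTERM.
run_worker() {
  local logfile=$1
  # Handler for SIGTERM: record the event, then exit with status 0.
  trap 'echo "got SIGTERM, cleaning up" >> "$logfile"; exit 0' TERM
  while true; do
    sleep 1 &
    wait $!   # wait is interrupted by a trapped signal, so the handler runs promptly
  done
}
```

Started with `run_worker /tmp/worker.log &`, the worker can be stopped with a plain `kill %1`; the log file then contains the cleanup message.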

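The pkill and pgrep commands

Purpose: pgrep looks up processes by attributes such as name, user, or terminal; pkill sends a signal to every process that matches.

# Preparation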
[zzh@centos7 ~ 10:44:52]$ sleep 1231 &
[1] 4974
[zzh@centos7 ~ 10:45:03]$ sleep 1232 &
[2] 4976
[zzh@centos7 ~ 10:45:10]$ sleep 1233 &
[3] 4978

# Find processes by name
[zzh@centos7 ~ 10:51:37]$ pgrep sleep
4901
4902
4903
4904
[zzh@centos7 ~ 10:53:14]$ pgrep -l sleep
4901 sleep
4902 sleep
4903 sleep
4904 sleep
[zzh@centos7 ~ 10:53:52]$ ps axu|grep sleep
zzh        4901  0.0  0.0 108052   360 ?        S    10:44   0:00 sleep 1232
zzh        4902  0.0  0.0 108052   360 ?        S    10:44   0:00 sleep 1232
zzh        4903  0.0  0.0 108052   356 ?        S    10:44   0:00 sleep 1232
zzh        4904  0.0  0.0 108052   360 ?        S    10:44   0:00 sleep 1233
root       5183  0.0  0.0 108052   360 ?        S    10:54   0:00 sleep 60
zzh        5185  0.0  0.0 112824   980 pts/0    S+   10:54   0:00 grep --color=auto sleep

# Send the default signal to all sleep processes
[zzh@centos7 ~ 10:54:38]$ pkill sleep
pkill: killing pid 5193 failed: Operation not permitted
[1]   Terminated              sleep 1231
[2]-  Terminated              sleep 1232
[3]+  Terminated              sleep 1233

[zzh@centos7 ~ 10:56:07]$ ps axu|grep sleep
root       5193  0.0  0.0 108052   356 ?        S    10:55   0:00 sleep 60
zzh        5202  0.0  0.0 112824   980 pts/0    S+   10:56   0:00 grep --color=auto sleep

# Match processes by user name
[zzh@centos7 ~ 10:56:22]$ ps -u zzh
   PID TTY          TIME CMD
  4925 ?        00:00:00 sshd
  4930 pts/0    00:00:00 bash
  5212 pts/0    00:00:00 ps
[laoma@centos7 ~]$ pgrep -u zzh

# Kill all processes owned by a user, which logs that user out
[zzh@centos7 ~ 10:57:40]$ pkill -u zzh
Connection closing...Socket close.

Connection closed by foreign host.

Disconnected from remote host(10.1.8.10:22) at 10:58:20.

Type `help' to learn how to use Xshell prompt.

# Match by terminal
[zzh@centos7 ~ 10:58:38]$ sleep 1231 &
[1] 5554
[zzh@centos7 ~ 10:58:50]$ sleep 1232 &
[2] 5555
[zzh@centos7 ~ 10:58:55]$ tty
/dev/pts/0
[zzh@centos7 ~ 10:59:00]$ pgrep -t pts/0 -l
5510 bash
5554 sleep
5555 sleep
[zzh@centos7 ~ 10:59:07]$ pkill -t pts/0
[1]-  Terminated              sleep 1231
[2]+  Terminated              sleep 1232

# Send the default signal to bash; an interactive bash ignores SIGTERM
[zzh@centos7 ~ 10:59:12]$ kill 5510

# Signal child processes by parent PID (PPID)
[zzh@centos7 ~ 11:01:52]$ sleep 1231 &
[1] 5642
[zzh@centos7 ~ 11:02:11]$ sleep 1232 &
[2] 5643
[zzh@centos7 ~ 11:02:17]$ ps jf
  PPID    PID   PGID    SID TTY       TPGID STAT   UID   TIME COMMAND
  5505   5510   5510   5510 pts/0      5644 Ss    1000   0:00 -bash
  5510   5642   5642   5510 pts/0      5644 S     1000   0:00  \_ sleep 1231
  5510   5643   5643   5510 pts/0      5644 S     1000   0:00  \_ sleep 1232
  5510   5644   5644   5510 pts/0      5644 R+    1000   0:00  \_ ps jf
[zzh@centos7 ~ 11:02:22]$ pkill -P 5510
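[1]-  Terminated              sleep 1231
[2]+  Terminated              sleep 1232

If pgrep cannot filter on the attribute you need, combine ps and kill instead.

# Prepare the environment
# Terminate the sleep processes whose command line contains 123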

[zzh@centos7 ~ 10:38:55]$ ps -C sleep -f 
[zzh@centos7 ~ 10:39:24]$ ps -C sleep -f | grep 123
[zzh@centos7 ~ 10:39:47]$ ps -C sleep -f | grep 123 | awk '{print $2}'
[zzh@centos7 ~ 10:41:26]$ kill $(ps -C sleep -f | grep 123 | awk '{print $2}')
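One caveat with the pipeline above: grep 123 matches the whole ps line, so a PID or start time that happens to contain 123 would match too. The sketch below, assuming the procps version of ps, prints only the PID and arguments and lets awk test the argument text alone. The function name pids_by_name_and_arg is made up for illustration.

```shell
#!/bin/bash
# pids_by_name_and_arg NAME ARG_REGEX  (hypothetical helper)
# Prints the PIDs of processes whose command name is NAME and whose
# command line (not the PID column) matches ARG_REGEX.
pids_by_name_and_arg() {
  ps -C "$1" -o pid=,args= |
    awk -v re="$2" '{ pid = $1; $1 = ""; if ($0 ~ re) print pid }'
}
```

A possible use is `kill $(pids_by_name_and_arg sleep '123[0-9]$')`, which signals only the sleep processes whose argument ends in 123 followed by one digit.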

The whoami, who, w, and last commands

# Name of the current user
[zzh@centos7 ~ 11:05:02]$ whoami
zzh

# Users currently logged in to the system
[zzh@centos7 ~ 11:05:05]$ who
zzh      pts/0        2025-07-27 10:58 (10.1.8.1)

# Users currently logged in, with details of what they are running
[zzh@centos7 ~ 11:13:43]$ w
 11:13:53 up  1:17,  2 users,  load average: 0.04, 0.03, 0.05
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
zzh      pts/0    10.1.8.1         10:58    1.00s  0.11s  0.04s w

# Login history of the system
[zzh@centos7 ~ 11:13:53]$ last
zzh      pts/0        10.1.8.1         Sun Jul 27 10:58   still logged in   
zzh      pts/1        10.1.8.1         Sun Jul 27 10:50 - 10:50  (00:00)    
zzh      pts/0        10.1.8.1         Sun Jul 27 10:44 - 10:58  (00:13)    
zzh      pts/4        10.1.8.1         Sun Jul 27 10:39 - 10:44  (00:04)    
zzh      pts/2        10.1.8.1         Sun Jul 27 10:29    gone - no logout 
zzh      pts/0        10.1.8.1         Sun Jul 27 10:28 - 10:39  (00:10)    
zzh      pts/1        10.1.8.1         Sun Jul 27 10:18 - 10:43  (00:24)    
zzh      pts/0        10.1.8.1         Sun Jul 27 09:59 - 10:18  (00:19)    
zzh      :0           :0               Sun Jul 27 09:58    gone - no logout 
reboot   system boot  3.10.0-1160.71.1 Sun Jul 27 09:56 - 11:14  (01:17)    
... ...  
reboot   system boot  3.10.0-1160.71.1 Tue Jul 22 15:23 - 15:42  (00:18)    
zzh      pts/0        10.1.8.1         Fri Jul 18 13:28 - 13:29  (00:00)    
zzh      pts/0        :0               Fri Jul 18 13:26 - 13:26  (00:00)    
zzh      :0           :0               Fri Jul 18 13:26 - down   (00:03)    
reboot   system boot  3.10.0-1160.71.1 Fri Jul 18 13:24 - 13:29  (00:04)    

wtmp begins Fri Jul 18 13:24:51 2025

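Case study: a process that will not stay killed

After a process is killed, it shows up again. How do we get rid of it for good?

Simulation:

# Prepare the program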
[zzh@centos7 ~ 11:33:33]$ mkdir bin
[zzh@centos7 ~ 11:34:25]$ vim bin/mm 
#!/bin/bash
while true
do
  md5sum /dev/zero
  sleep 1
done
[zzh@centos7 ~ 11:35:21]$ chmod +x bin/mm

# Run the program
[zzh@centos7 ~ 11:35:54]$ nohup mm &

Troubleshooting

# Find the process
[zzh@centos7 ~ 11:37:53]$ ps axo pid,%cpu,command --sort -%cpu |head -n 5
    PID %CPU COMMAND
  73712 99.7 md5sum /dev/zero
   1150  0.1 /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
   1786  0.1 /usr/libexec/packagekitd
  69280  0.1 /usr/bin/gnome-shell

# Kill the process, then check again
[zzh@centos7 ~ 11:38:25]$ kill 73712
[zzh@centos7 ~ 11:38:55]$ ps axo pid,%cpu,command --sort -%cpu |head -n 5
    PID %CPU COMMAND
  74830 99.8 md5sum /dev/zero
   1150  0.1 /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
   1786  0.1 /usr/libexec/packagekitd
  69280  0.1 /usr/bin/gnome-shell

# Approach 1: remove the md5sum program (simulated here with mv)
[root@centos7 zzh 11:39:36]# mv /bin/md5sum ./md5sum

# Approach 2: find whatever keeps respawning it, that is, the parent process
[root@centos7 zzh 11:40:14]# ps ax -f |grep -e md5sum -e PPID
UID          PID    PPID  C STIME TTY      STAT   TIME CMD
laoma      75538   73556 99 15:25 ?        R      2:12 md5sum /dev/zero
root       76070   53920  0 15:28 pts/3    R+     0:00 grep --color=auto -e md5sum -e PPID
[root@centos7 zzh 11:40:45]# kill 73556
[root@centos7 zzh 11:41:32]# kill 75538
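The parent lookup in approach 2 can be scripted. A minimal sketch, assuming the procps version of ps; the function names parent_of and ancestry are made up for illustration. It walks from a PID up through its parents so the process that keeps respawning the child becomes visible.

```shell
#!/bin/bash
# parent_of PID  (hypothetical helper): print the parent PID of PID.
parent_of() {
  ps -o ppid= -p "$1" | tr -d ' '
}

# ancestry PID  (hypothetical helper): print "PID COMMAND" for PID and each ancestor.
ancestry() {
  local pid=$1
  # Stop when the PID no longer exists (empty result) or the parent is 0.
  while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
    ps -o pid=,comm= -p "$pid"
    pid=$(parent_of "$pid")
  done
}
```

For the case above, `ancestry 75538` would list md5sum first and the mm script right after it; killing that parent first stops the respawning.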

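Special processes

Reference: source

Zombie processes

Overview

If a process vanished the instant it exited (state X, dead), its parent would have no chance to collect its exit status. So on Linux an exiting process normally does not disappear immediately. It stays in state Z, the zombie state, until the parent reads its exit status.

A zombie remains in the process table in a terminated state, waiting for its parent to read its exit code. In other words, whenever a child has exited while its parent is still running and has not yet read the child's status, the child is in state Z.

Simulating a zombie process

The zombie state can be reproduced with an experiment: fork a child, let the child exit first, and do not reap it.

Write a program in which the parent runs for 60 s and the child for 10 s. The child exits first while the parent is still running and has not collected the child's exit code, so the child becomes a zombie.

zombies.c: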

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
  pid_t id = fork();
  // fork failed
  if(id<0)
  {
    perror("fork");
    return 1;
  }
  // id > 0: parent branch; the parent sleeps for 60 s without reaping the child
  else if(id>0)
  {
    //parent
    printf("parent[%d] is sleeping ...\n",getpid());
    sleep(60);

  }

  // id == 0: child branch; the child sleeps for 10 s and then exits
  else {
    printf("child[%d] is begin Z ...\n",getpid());
    sleep(10);
    exit(EXIT_SUCCESS);
  }
  return 0;
}

[root@centos7 zzh 11:47:02]# vim zombies.c


[root@centos7 ~]# yum install -y gcc
[root@centos7 ~]# gcc zombies.c -o zombies
[root@centos7 ~]# chmod +x zombies
[root@centos7 ~]# ./zombies 
parent[1703] is sleeping ...
child[1704] is begin Z ...

# Open a new terminal and run the following to monitor

[root@centos7 ~]# while true;do ps -C zombies u;sleep 1; echo;done
......
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       1703  0.0  0.0   4216   356 pts/1    S+   00:07   0:00 ./zombies
root       1704  0.0  0.0   4216    88 pts/1    S+   00:07   0:00 ./zombies

USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       1703  0.0  0.0   4216   356 pts/1    S+   00:07   0:00 ./zombies
root       1704  0.0  0.0   4216    88 pts/1    S+   00:07   0:00 ./zombies

USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       1703  0.0  0.0   4216   356 pts/1    S+   00:07   0:00 ./zombies
root       1704  0.0  0.0      0     0 pts/1    Z+   00:07   0:00 [zombies] <defunct>

USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       1703  0.0  0.0   4216   356 pts/1    S+   00:07   0:00 ./zombies
root       1704  0.0  0.0      0     0 pts/1    Z+   00:07   0:00 [zombies] <defunct>

......

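After the child exits, its memory is released (VSZ and RSS drop to 0) and its state changes to Z (zombie).

Harm caused by zombie processes

A process's exit status has to be retained, because it tells the parent how the task it was given turned out. If the parent never reads it, the process stays in state Z.

The exit status is itself data, part of the process's basic information, so it is stored in task_struct (the PCB). As long as the Z state persists, the PCB has to be kept.

So a parent that creates many children but never reaps them wastes resources, because each of these data structures occupies memory, just as defining a struct variable in C requires space somewhere in memory.

Too many zombies also waste PIDs and can make it impossible to create new processes, because the operating system caps the total number of processes.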
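To check whether zombies are piling up, the process table can be filtered on the STAT column. A minimal sketch, assuming the procps version of ps; the function name list_zombies is made up for illustration. A zombie cannot be killed directly, so the PPID column is the useful part: it identifies the parent that has to reap its children or be terminated.

```shell
#!/bin/bash
# list_zombies  (hypothetical helper)
# Prints "PID PPID COMMAND" for every process whose state starts with Z.
list_zombies() {
  ps -eo stat=,pid=,ppid=,comm= | awk '$1 ~ /^Z/ { print $2, $3, $4 }'
}
```

Running `list_zombies | wc -l` gives a quick count, and sorting on the second column shows which parent owns the most zombies.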

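Orphan processes

Overview

If the parent exits first and the child exits later, the child is called an orphan process. When that child eventually exits and enters state Z, how does the system deal with it?

The child is adopted by process 1, systemd, which reaps it.

Simulating an orphan process

Write a program in which the child runs for 60 s and the parent for 3 s. The parent exits first and the child is adopted by process 1.

lonely.c: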

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
  pid_t id = fork();
  if(id<0)
  {
    perror("fork");
    return 1;
  }
  else if(id == 0)
  {
    // child: keeps running for 60 s after the parent has exited
    printf("I am child, pid:%d\n",getpid());
    sleep(60);
  }

  else {
    // parent 
    printf("I am parent,pid:%d\n",getpid());
    sleep(3);
  }
  return 0;
}

[root@centos7 ~]# gcc lonely.c -o lonely
[root@centos7 ~]# chmod +x lonely
[root@centos7 ~]# ./lonely 
I am parent,pid:1882
I am child, pid:1883

# Open a new terminal and run the following to monitor
[root@centos7 ~]# while true;do ps -fC lonely;sleep 1; echo;done
......
UID         PID   PPID  C STIME TTY          TIME CMD
root       1882   1223  0 00:14 pts/1    00:00:00 ./lonely
root       1883   1882  0 00:14 pts/1    00:00:00 ./lonely

UID         PID   PPID  C STIME TTY          TIME CMD
root       1883      1  0 00:14 pts/1    00:00:00 ./lonely
....

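After the parent exits, the child's parent PID changes from 1882 to 1.

Harm caused by orphan processes

Although orphan processes are managed directly by systemd, a program that keeps producing new orphans will still consume excessive system resources. Developers need to review the code to avoid this. Orphans that serve no purpose can be terminated with kill or pkill.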

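13. Monitoring System Load

System load overview

Load average: the Linux kernel expresses load as an exponential moving average of the number of active requests.

Active requests include not only running processes but also processes waiting for I/O, corresponding to states R and D. Waiting for I/O includes tasks sleeping while they wait for an expected disk or network response.
An exponential moving average is a mathematical formula that smooths out the highs and lows of trending data, represents system load over a period more accurately, and shows whether load is rising or falling over time.
The load average is recalculated every 5 seconds from the active request count across all CPUs. Accumulating these values yields the exponential moving averages for the last 1, 5, and 15 minutes.
Some UNIX systems consider only CPU utilization or run-queue length. The Linux load average also accounts for I/O, so when the load average is high but CPU activity is low, check disk and network activity.
Linux counts each physical CPU core and each hyper-thread as an independent execution unit, and each execution unit has its own request queue.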
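The smoothing described above can be tried out numerically. The sketch below is a floating-point approximation of the kernel's fixed-point calculation, assuming a sample exactly every 5 seconds and a constant number of active tasks; the function name simulate_load is made up for illustration. Each sample applies load = load * d + n * (1 - d), where d = exp(-5 / window) and the window is 60, 300, or 900 seconds.

```shell
#!/bin/bash
# simulate_load N SECONDS WINDOW  (hypothetical helper)
# Starting from a load of 0, apply the exponential moving average every
# 5 seconds for SECONDS seconds with N active tasks, then print the result.
simulate_load() {
  awk -v n="$1" -v secs="$2" -v win="$3" 'BEGIN {
    decay = exp(-5 / win)
    load = 0
    for (t = 5; t <= secs; t += 5)
      load = load * decay + n * (1 - decay)
    printf "%.2f\n", load
  }'
}
```

With two busy tasks, `simulate_load 2 30 60` prints 0.79 and `simulate_load 2 60 60` prints 1.26, so under these assumptions the 1-minute figure is still well below 2 after a full minute of constant load.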

Viewing system load

# Show CPU information
[root@centos7 zzh 11:50:16]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             2
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            25
Model:                 68
Model name:            AMD Ryzen 7 6800H with Radeon Graphics
Stepping:              1
CPU MHz:               3193.893
BogoMIPS:              6387.78
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              512K
L3 cache:              16384K
NUMA node0 CPU(s):     0,1
......

# Check the load
[root@centos7 zzh 14:06:22]# uptime
 14:06:46 up  2:34,  2 users,  load average: 0.00, 0.01, 0.05

# Add load to the system
[root@centos7 zzh 14:06:46]# md5sum /dev/zero &
[1] 7897
[root@centos7 zzh 14:07:16]# md5sum /dev/zero &
[1] 7903

# Wait about 30 seconds
[root@centos7 zzh 14:07:34]# uptime
 14:07:38 up  2:35,  2 users,  load average: 0.00, 0.01, 0.05

Interpreting load

Example: a 4-core CPU

Load averages: 2.92 4.48 5.20
Per-CPU load: 0.73 (2.92/4) 1.12 (4.48/4) 1.30 (5.20/4)

A per-CPU load of about 0.75 (75%) is a reasonable target.
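The division in the example can be scripted. A minimal sketch; the function names per_cpu_load and current_per_cpu_load are made up for illustration, and the second one assumes a Linux system with /proc/loadavg and the coreutils nproc command.

```shell
#!/bin/bash
# per_cpu_load LOAD NCPU  (hypothetical helper)
# Divide one load-average value by the CPU count, to two decimals.
per_cpu_load() {
  awk -v l="$1" -v n="$2" 'BEGIN { printf "%.2f\n", l / n }'
}

# current_per_cpu_load  (hypothetical helper)
# Print the 1-, 5-, and 15-minute per-CPU load of this machine, one per line.
current_per_cpu_load() {
  local one five fifteen rest ncpu value
  read -r one five fifteen rest < /proc/loadavg
  ncpu=$(nproc)
  for value in "$one" "$five" "$fifteen"; do
    per_cpu_load "$value" "$ncpu"
  done
}
```

For the example values, `per_cpu_load 2.92 4` prints 0.73 and `per_cpu_load 5.20 4` prints 1.30; any result above 1.00 means more runnable or I/O-blocked tasks than CPUs over that window.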

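1. CPU: utilization percentage (top)
%Cpu(s): 96.7 us, 3.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
Keeping utilization between 70% and 90% leaves the CPU short of full saturation, with some compute capacity in reserve.

1-, 5-, and 15-minute values
System load average: 3.96, 1.94, 0.78
With only 2 CPUs, the per-CPU load is: 1.98 0.97 0.39

CPUs are very fast: if one second of CPU time is split into 100 slices, each 0.01 s slice still gets a lot of work done.
The CPU switches between many programs continuously.

2. Memory: capacity (top, free)
KiB Mem : 4026124 total, 3194448 free, 483708 used, 347968 buff/cache

3. Disk: capacity, plus current IOPS (input/output operations per second) and bandwidth
Capacity: df -h; read/write speed: sar -dp
Simulate disk writes
[root@centos7 ~ 14:02:18]# while true;do dd if=/dev/zero of=/root/bigfile bs=1M count=2048;sleep 1;done
Monitor
[root@centos7 ~ 14:02:46]# sar -dp 1

4. Network: bandwidth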

The top command

Purpose: view process information dynamically, including the number of tasks in each state, CPU usage, and memory usage.

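Reading the %Cpu line:

us: CPU time spent running user-space processes
sy: CPU time spent running kernel (system) code
wa: CPU time spent idle while waiting for I/O to complete

top keyboard shortcuts

Commonly used keys: 1, P, M, k, q, h.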

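The stress tool

The stress tool on Linux is used to stress-test a system. It can simulate heavy load on CPU, memory, I/O, and disk. Options such as -c (CPU) and -m (memory) create the load, which helps uncover stability problems under pressure. It is commonly used for performance tuning and hardware validation.

# Install
[root@centos7 zzh 14:12:23]# yum install -y stress

# Help text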
[root@centos7 zzh 14:16:25]# stress --help
`stress' imposes certain types of compute stress on your system

Usage: stress [OPTION [ARG]] ...
 -?, --help         show this help statement
     --version      show version statement
 -v, --verbose      be verbose
 -q, --quiet        be quiet
 -n, --dry-run      show what would have been done
 -t, --timeout N    timeout after N seconds
     --backoff N    wait factor of N microseconds before work starts
 -c, --cpu N        spawn N workers spinning on sqrt()
 -i, --io N         spawn N workers spinning on sync()
 -m, --vm N         spawn N workers spinning on malloc()/free()
     --vm-bytes B   malloc B bytes per vm worker (default is 256MB)
     --vm-stride B  touch a byte every B bytes (default is 4096)
     --vm-hang N    sleep N secs before free (default none, 0 is inf)
     --vm-keep      redirty memory instead of freeing and reallocating
 -d, --hdd N        spawn N workers spinning on write()/unlink()
     --hdd-bytes B  write B bytes per hdd worker (default is 1GB)

Example: stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --timeout 10s

Note: Numbers may be suffixed with s,m,h,d,y (time) or B,K,M,G (size).

Stress test: CPU

# Load 2 CPUs
[root@centos7 zzh 14:36:16]# stress -c 2
stress: info: [8296] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd

# Monitor with top
top - 14:18:22 up 47 min,  2 users,  load average: 1.37, 0.50, 0.47
Tasks: 187 total,   4 running, 183 sleeping,   0 stopped,   0 zombie
%Cpu(s):100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  4026124 total,  1535676 free,   490668 used,  1999780 buff/cache
KiB Swap:  4063228 total,  4063228 free,        0 used.  3277584 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND       
  2596 root      20   0    7312    100      0 R 100.0  0.0   1:01.42 stress         
  2597 root      20   0    7312    100      0 R 100.0  0.0   1:01.25 stress         
  2594 root      20   0  162100   2320   1588 R   0.6  0.1   0:00.12 top  
......

Stress test: memory

# Memory before the test
[root@centos7 zzh 14:39:07]# free -m
              total        used        free      shared  buff/cache   available
Mem:           1980         773        1005          14         200        1031
Swap:          2047           3        2044

# Consume 1 GB of memory
[root@centos7 zzh 14:39:11]# stress -m 1 --vm-bytes 1G
stress: info: [8578] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd

# Memory after the test
[root@centos7 zzh 14:39:58]# free -m
              total        used        free      shared  buff/cache   available
Mem:           1980         775        1065          11         139        1056
Swap:          2047           7        2040

Stress test: disk

# Generate disk I/O
[root@centos7 zzh 14:40:02]# stress -d 1 --hdd-bytes 2G
stress: info: [8718] dispatching hogs: 0 cpu, 0 io, 0 vm, 1 hdd

# Watch the device busy percentage, %util
[root@centos7 zzh 14:42:45]# sar -dp 1
......
14时43分32秒       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
14时43分33秒       sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
14时43分33秒       sr0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
14时43分33秒 centos-root      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
14时43分33秒 centos-swap      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
14时43分33秒 centos-home      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

14时43分33秒       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
14时43分34秒       sda      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
14时43分34秒       sr0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
14时43分34秒 centos-root      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
14时43分34秒 centos-swap      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
14时43分34秒 centos-home      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
......

Network test

# Transfer a large file
[root@centos7 zzh 14:44:39]# wget http://192.168.43.249/%E7%B3%BB%E7%BB%9F%E9%95%9C%E5%83%8F/CentOS-7-x86_64-DVD-2207-02.iso

# Monitor bandwidth
[root@centos7 zzh 14:45:49]# sar -n DEV 1
Linux 3.10.0-1160.71.1.el7.x86_64 (centos7.zzh.cloud2) 	2025年07月27日 	_x86_64_	(2 CPU)

14时46分13秒     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
14时46分14秒        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
14时46分14秒 virbr0-nic      0.00      0.00      0.00      0.00      0.00      0.00      0.00
14时46分14秒    virbr0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
14时46分14秒     ens33      1.00      1.00      0.06      0.18      0.00      0.00      0.00
... ...

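14. Linux Service Management

systemd overview

The system startup manager

CentOS 5 boots with SysV init. It is the slowest: services start serially, whether or not they depend on each other.
CentOS 6 boots with Upstart init, which is somewhat faster: services with dependencies start in order, while unrelated services start in parallel.
CentOS 7 boots with systemd, the fastest of the three: services start in parallel regardless of dependencies (in many cases the process is not really started at first; only a marker is registered, for example a listening socket, and the process starts when it is first needed). systemd was created to address the problems above. Its goal is to provide a complete solution for system startup and management.

Basic concepts

Service: the name used from a business point of view, for example a web service or a database service.

Daemon: the background process behind a service. A web server, for example, provides the web service through its web daemon processes.

For example:

# Install the package
[root@centos7 zzh 14:46:42]# yum install -y httpd

# Start the service
[root@centos7 zzh 15:07:15]# systemctl start httpd

# Stop the firewalld service
[root@centos7 zzh 15:02:11]# systemctl stop firewalld.service

# List processes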
[root@centos7 zzh 15:07:25]# ps axf|tail
  6846 ?        Sl     0:00 /usr/libexec/evolution-addressbook-factory
  6883 ?        Sl     0:00  \_ /usr/libexec/evolution-addressbook-factory-subprocess --factory all --bus-name org.gnome.evolution.dataserver.Subprocess.Backend.AddressBookx6846x2 --own-path /org/gnome/evolution/dataserver/Subprocess/Backend/AddressBook/6846/2
  6987 ?        Sl     0:00 /usr/libexec/gvfsd-metadata
  7054 ?        Sl     0:01 /usr/bin/nautilus --gapplication-service
  9281 ?        Ss     0:00 /usr/sbin/httpd -DFOREGROUND
  9286 ?        S      0:00  \_ /usr/sbin/httpd -DFOREGROUND
  9287 ?        S      0:00  \_ /usr/sbin/httpd -DFOREGROUND
  9288 ?        S      0:00  \_ /usr/sbin/httpd -DFOREGROUND
  9289 ?        S      0:00  \_ /usr/sbin/httpd -DFOREGROUND
  9290 ?        S      0:00  \_ /usr/sbin/httpd -DFOREGROUND

The daemon processes behind the httpd service are PID 9281 (the main process) and its children 9286 to 9290.

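systemd architecture

unit types

The systemctl command manages different types of system objects, which are called units.

Service unit: defines a system service; file extension .service, for example httpd.service
Socket unit: identifies a socket used for inter-process communication; file extension .socket
Target unit: emulates "runlevels"; file extension .target
Timer unit: manages scheduled tasks; file extension .timer
Device unit: defines a device recognized by the kernel; file extension .device
Mount unit: defines a file system mount point; file extension .mount
Snapshot unit: manages system snapshots; file extension .snapshot
Swap unit: identifies a swap device; file extension .swap
Automount unit: a file system automount point; file extension .automount
Path unit: starts other services when a specific file system object changes; file extension .path
Slice unit: used for resource management; file extension .slice

Listing units

# List units in the loaded state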
[root@centos7 zzh 15:07:30]# systemctl list-units
  UNIT                                                LOAD   ACTIVE SUB       DESCRIPTION
  proc-sys-fs-binfmt_misc.automount                   loaded active waiting   Arbitrary Executable File Formats File System Automo
  sys-devices-pci0000:00-0000:00:07.1-ata2-host2-target2:0:0-2:0:0:0-block-sr0.device loaded active plugged   VMware_Virtual_IDE_C
  sys-devices-pci0000:00-0000:00:10.0-host0-target0:0:0-0:0:0:0-block-sda-sda1.device loaded active plugged   VMware_Virtual_S 1
... ...

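Columns in the systemctl list-units output:

UNIT: the unit name.
LOAD: whether systemd parsed the unit's configuration correctly and loaded the unit into memory.
ACTIVE: the high-level activation state of the unit. It shows whether the unit started successfully.
SUB: the low-level activation state of the unit. It gives more detail about the unit, and its values depend on the unit type, state, and how the unit is run.
DESCRIPTION: a short description of the unit.

# Use -t to list units of a specific type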
[zzh@centos7 ~ 15:23:04]$ systemctl list-units -t timer
UNIT                         LOAD   ACTIVE SUB     DESCRIPTION
systemd-tmpfiles-clean.timer loaded active waiting Daily Cleanup of Temporary Directories
unbound-anchor.timer         loaded active waiting daily update of the root trust anchor for DNSSEC

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

2 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

# List units of type service, both active and inactive
[root@centos7 zzh 15:24:04]# systemctl list-units --type service --all
  UNIT                                                LOAD      ACTIVE   SUB     DESCRIPTION
  abrt-ccpp.service                                   loaded    active   exited  Install ABRT coredump hook
... ...

# List all installed unit files, including units that are not loaded
[root@centos7 zzh 15:24:53]# systemctl list-unit-files

# Show failed services
[root@centos7 zzh 15:25:09]# systemctl --failed --type service

Viewing a single unit

[zzh@centos7 ~ 15:25:36]$ systemctl status sshd.service
● sshd.service - OpenSSH server daemon
   Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
   Active: active (running) since 日 2025-07-27 09:56:36 CST; 5h 29min ago
     Docs: man:sshd(8)
           man:sshd_config(5)
 Main PID: 1240 (sshd)
    Tasks: 1
   CGroup: /system.slice/sshd.service
           └─1240 /usr/sbin/sshd -D

Controlling system services

The systemctl command

# Stop the service
[root@centos7 zzh 15:41:34]# systemctl stop sshd.service
# Test by connecting from a client

# Start the service
[root@centos7 zzh 15:41:34]# systemctl start sshd.service
# Test by connecting from a client

# Restart the service, equivalent to stop followed by start
[root@centos7 zzh 15:42:18]# systemctl restart sshd.service

# Reload the service: the main process is not restarted, it only re-reads its configuration file.
# Typically used after the configuration file has changed.
[root@centos7 zzh 15:42:35]# systemctl reload sshd.service

# Disable start at boot
[root@centos7 zzh 15:42:51]# systemctl disable sshd.service
[root@centos7 zzh 15:43:07]# systemctl is-enabled sshd
disabled

# Enable start at boot
[root@centos7 zzh 15:43:14]# systemctl enable sshd.service
[root@centos7 zzh 15:46:01]# systemctl is-enabled sshd
enabled

# Mask the service. A masked service cannot be started, because its unit file is a symlink to /dev/null
[root@centos7 zzh 15:46:12]# systemctl mask sshd.service
# Unmask the service
[root@centos7 zzh 15:46:28]# systemctl unmask sshd.service

# Edit the configuration
[root@centos7 zzh 15:28:44]# vim /etc/ssh/sshd_config

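unit configuration files

/etc/systemd/system/sshd.service takes precedence. This is normally where administrators place customized configuration.
/usr/lib/systemd/system/sshd.service applies next. This is the default configuration shipped with the package.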
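On a live system, `systemctl cat sshd.service` prints the unit file that is actually in effect. The lookup order itself can be sketched as below. The function name resolve_unit_file and its optional root argument are made up for illustration, and the list also includes /run/systemd/system, which systemd checks between the two directories named above.

```shell
#!/bin/bash
# resolve_unit_file UNIT [ROOT]  (hypothetical helper)
# Print the first unit file found, searching in systemd's precedence order.
# ROOT is an optional prefix so the lookup can be tried on a scratch directory.
resolve_unit_file() {
  local unit=$1 root=${2:-}
  local dir
  for dir in /etc/systemd/system /run/systemd/system /usr/lib/systemd/system; do
    if [ -e "$root$dir/$unit" ]; then
      printf '%s\n' "$root$dir/$unit"
      return 0
    fi
  done
  return 1   # no unit file with this name in any directory
}
```

For example, `resolve_unit_file sshd.service` prints the /usr/lib path on a default install, and the /etc path once an administrator has placed a copy there.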

Building an mm service

[root@centos7 zzh 16:07:06]# cp /bin/md5sum /bin/mm
[root@centos7 zzh 16:07:28]# cp /usr/lib/systemd/system/sshd.service /etc/systemd/system/mm.service
[root@centos7 zzh 16:08:27]# vim /etc/systemd/system/mm.service
[Unit]
Description=mm server daemon

[Service]
Type=simple
ExecStart=/usr/bin/mm /dev/zero

[Install]
WantedBy=multi-user.target

[root@centos7 zzh 16:09:04]# systemctl daemon-reload
[root@centos7 zzh 16:09:45]# systemctl enable mm --now
[root@centos7 zzh 16:09:57]# systemctl status mm
