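Preface
This question grew out of an earlier post on making the kernel hang.
Here the focus is on deliberately triggering an NPE (NULL pointer dereference) inside a kernel module,
and then analyzing all of the log output that results.
Test module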
root@ubuntu:~/linux/module/kernelNpe# cat Test08KernelNpe.c
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/fdtable.h>
#include <linux/fs_struct.h>
#include <linux/mm_types.h>
#include <linux/init_task.h>
#include <linux/types.h>
#include <linux/atomic.h>

MODULE_LICENSE("GPL");

static int __init
print_pcb(void)
{
	printk("module_init ...\n");
	/* Deliberately write through a NULL pointer so that the kernel oopses. */
	char* ptr = NULL;
	/* The length 10 is intentional for this test, even though "abc" is only 4 bytes. */
	memcpy(ptr, "abc", 10);
	/* Never reached: the write above faults. */
	printk("module_init after ...\n");
	return 0;
}

static void __exit
exit_pcb(void)
{
	printk("module_exit ...\n");
}

module_init(print_pcb);
module_exit(exit_pcb);
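The module was loaded with insmod inside a virtual machine started by QEMU; the resulting output is walked through below.
Let's go through every place that produces this log output.
The "module_init ..." line comes from the printk in print_pcb of Test08KernelNpe.
The initial trigger is a segmentation fault: the code accesses an address that is not mapped. The local variables of page_fault show this; they are not reproduced here.
For background, see the "accessing a null pointer" part of "Some SegmentFault scenarios (1)".
The next three lines are printed by show_fault_oops, which is called from the main flow above.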
[ 66.880339] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 66.880552] IP: print_pcb+0x17/0x1000 [Test08KernelNpe]
[ 66.880552] PGD 0
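The IP line packs four pieces of information into one string: the symbol, the offset into it, the size recorded for the symbol, and the owning module. As a minimal user-space sketch (the struct `oops_loc` and the function `parse_oops_loc` are names made up for this illustration, not kernel APIs), the string can be split like this:

```c
#include <stdio.h>
#include <string.h>

/* Fields of a "symbol+0xoff/0xsize [module]" location string.
 * This struct is hypothetical, introduced only for this example. */
struct oops_loc {
    char symbol[64];
    unsigned long offset;   /* bytes from the start of the symbol */
    unsigned long size;     /* size recorded for the symbol */
    char module[64];        /* empty when no "[module]" suffix is present */
};

/* Returns 0 on success, -1 if the text does not have the expected shape. */
int parse_oops_loc(const char *text, struct oops_loc *out)
{
    memset(out, 0, sizeof(*out));
    /* %63[^+] reads the symbol up to '+'; %63[^]] reads the module name. */
    int n = sscanf(text, "%63[^+]+0x%lx/0x%lx [%63[^]]]",
                   out->symbol, &out->offset, &out->size, out->module);
    return n >= 3 ? 0 : -1;
}
```

For `print_pcb+0x17/0x1000 [Test08KernelNpe]` this yields symbol `print_pcb`, offset 0x17, size 0x1000 and module `Test08KernelNpe`. Call Trace entries such as `do_one_initcall+0x59/0x13b` parse the same way with an empty module field; the sketch does not strip the leading "? " that some entries carry.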
The following log then comes from the call to __die.
[ 66.880552] Oops: 0002 [#1] SMP
[ 66.880552] Modules linked in: Test08KernelNpe(OE+)
[ 66.880552] CPU: 0 PID: 258 Comm: insmod Tainted: G OE 4.10.14 #1
[ 66.880552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 66.880552] task: ffff88007fbb3500 task.stack: ffffc90000660000
[ 66.880552] RIP: 0010:print_pcb+0x17/0x1000 [Test08KernelNpe]
[ 66.880552] RSP: 0018:ffffc90000663db8 EFLAGS: 00000286
[ 66.880552] RAX: 75646f6d00636261 RBX: ffffffffa0005000 RCX: 0000000000000000
[ 66.880552] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246
[ 66.880552] RBP: ffffc90000663db8 R08: 000000000000001f R09: 2e2e2e2074696e69
[ 66.880552] R10: ffffea0001fea480 R11: 0a2e2e2e2074696e R12: ffffffffa0002000
[ 66.880552] R13: 0000000000000000 R14: 00005609ef97f26b R15: 0000000000000000
[ 66.880552] FS: 00007f4abe3e0700(0000) GS:ffff88007da00000(0000) knlGS:0000000000000000
[ 66.880552] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 66.880552] CR2: 00005575f8490018 CR3: 000000007fbaf000 CR4: 00000000000006f0
[ 66.880552] Call Trace:
[ 66.880552] do_one_initcall+0x59/0x13b
[ 66.880552] ? kmem_cache_alloc_trace+0xd3/0x171
[ 66.880552] ? do_init_module+0x27/0x1fd
[ 66.880552] do_init_module+0x5f/0x1fd
[ 66.880552] load_module+0x2d3/0x406
[ 66.880552] ? copy_module_from_user+0x8d/0x8d
[ 66.880552] SYSC_finit_module+0xc8/0xe9
[ 66.880552] SyS_finit_module+0xe/0x10
[ 66.880552] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 66.880552] RIP: 0033:0x7f4abdef55d9
[ 66.880552] RSP: 002b:00007ffe0bcf00f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
[ 66.880552] RAX: ffffffffffffffda RBX: 00007f4abe1b8b20 RCX: 00007f4abdef55d9
[ 66.880552] RDX: 0000000000000000 RSI: 00005609ef97f26b RDI: 0000000000000003
[ 66.880552] RBP: 0000000000001021 R08: 0000000000000000 R09: 00007f4abe1baea0
[ 66.880552] R10: 0000000000000003 R11: 0000000000000202 R12: 000000000000270e
[ 66.880552] R13: 00007f4abe1b8b78 R14: 00007f4abe1b8b78 R15: 00007f4abe1b8b78
[ 66.880552] Code: <48> 89 04 25 00 00 00 00 66 8b 05 17 c0 ff ff 66 89 04 25 08 00 00
[ 66.880552] RIP: print_pcb+0x17/0x1000 [Test08KernelNpe] RSP: ffffc90000663db8
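The number in the "Oops: 0002" line at the top of this block is the x86 page-fault error code, printed in hex. A small user-space sketch of decoding it follows; the constant names mirror the usual bit meanings (present, write, user, reserved, instruction fetch), and the function `describe_pf_error` is a name invented for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Bit layout of the x86 page-fault error code. */
enum {
    PF_PROT  = 1 << 0,  /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1,  /* 0: read access,      1: write access */
    PF_USER  = 1 << 2,  /* 0: kernel mode,      1: user mode */
    PF_RSVD  = 1 << 3,  /* reserved bit set in a page-table entry */
    PF_INSTR = 1 << 4   /* fault happened on an instruction fetch */
};

/* Writes a human-readable description of the error code into buf. */
const char *describe_pf_error(unsigned long code, char *buf, size_t len)
{
    snprintf(buf, len, "%s, %s, %s mode%s%s",
             (code & PF_PROT)  ? "protection violation" : "page not present",
             (code & PF_WRITE) ? "write" : "read",
             (code & PF_USER)  ? "user" : "kernel",
             (code & PF_RSVD)  ? ", reserved bit set" : "",
             (code & PF_INSTR) ? ", instruction fetch" : "");
    return buf;
}
```

For 0x0002 the description is "page not present, write, kernel mode", which fits the test module: a write through a NULL pointer while running in kernel mode.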
The following output comes from print_modules.
[ 66.880552] Modules linked in: Test08KernelNpe(OE+)
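The characters in parentheses after the module name are per-module flags. The sketch below maps the ones I am sure about to a short description; `module_flag_meaning` is a helper invented for this example, and the table is not meant to be complete.

```c
/* Maps one flag character from "Modules linked in: name(FLAGS)" to a
 * short description. Hypothetical helper, not a kernel function. */
const char *module_flag_meaning(char flag)
{
    switch (flag) {
    case 'P': return "proprietary module";
    case 'O': return "out-of-tree module";
    case 'E': return "unsigned module";
    case 'F': return "force-loaded module";
    case 'C': return "staging driver";
    case '+': return "module is being loaded";
    case '-': return "module is being unloaded";
    default:  return "unknown flag";
    }
}
```

Read this way, "Test08KernelNpe(OE+)" says the module was built out of tree, is unsigned, and was still being loaded when the oops happened, which matches a crash inside its init function.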
The following output comes from show_regs.
[ 66.880552] CPU: 0 PID: 258 Comm: insmod Tainted: G OE 4.10.14 #1
[ 66.880552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 66.880552] task: ffff88007fbb3500 task.stack: ffffc90000660000
[ 66.880552] RIP: 0010:print_pcb+0x17/0x1000 [Test08KernelNpe]
[ 66.880552] RSP: 0018:ffffc90000663db8 EFLAGS: 00000286
[ 66.880552] RAX: 75646f6d00636261 RBX: ffffffffa0005000 RCX: 0000000000000000
[ 66.880552] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246
[ 66.880552] RBP: ffffc90000663db8 R08: 000000000000001f R09: 2e2e2e2074696e69
[ 66.880552] R10: ffffea0001fea480 R11: 0a2e2e2e2074696e R12: ffffffffa0002000
[ 66.880552] R13: 0000000000000000 R14: 00005609ef97f26b R15: 0000000000000000
[ 66.880552] FS: 00007f4abe3e0700(0000) GS:ffff88007da00000(0000) knlGS:0000000000000000
[ 66.880552] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 66.880552] CR2: 00005575f8490018 CR3: 000000007fbaf000 CR4: 00000000000006f0
[ 66.880552] Call Trace:
[ 66.880552] do_one_initcall+0x59/0x13b
[ 66.880552] ? kmem_cache_alloc_trace+0xd3/0x171
[ 66.880552] ? do_init_module+0x27/0x1fd
[ 66.880552] do_init_module+0x5f/0x1fd
[ 66.880552] load_module+0x2d3/0x406
[ 66.880552] ? copy_module_from_user+0x8d/0x8d
[ 66.880552] SYSC_finit_module+0xc8/0xe9
[ 66.880552] SyS_finit_module+0xe/0x10
[ 66.880552] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 66.880552] RIP: 0033:0x7f4abdef55d9
[ 66.880552] RSP: 002b:00007ffe0bcf00f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
[ 66.880552] RAX: ffffffffffffffda RBX: 00007f4abe1b8b20 RCX: 00007f4abdef55d9
[ 66.880552] RDX: 0000000000000000 RSI: 00005609ef97f26b RDI: 0000000000000003
[ 66.880552] RBP: 0000000000001021 R08: 0000000000000000 R09: 00007f4abe1baea0
[ 66.880552] R10: 0000000000000003 R11: 0000000000000202 R12: 000000000000270e
[ 66.880552] R13: 00007f4abe1b8b78 R14: 00007f4abe1b8b78 R15: 00007f4abe1b8b78
[ 66.880552] Code: <48> 89 04 25 00 00 00 00 66 8b 05 17 c0 ff ff 66 89 04 25 08 00 00
The following output comes from __show_regs.
[ 66.880552] RIP: 0010:print_pcb+0x17/0x1000 [Test08KernelNpe]
[ 66.880552] RSP: 0018:ffffc90000663db8 EFLAGS: 00000286
[ 66.880552] RAX: 75646f6d00636261 RBX: ffffffffa0005000 RCX: 0000000000000000
[ 66.880552] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246
[ 66.880552] RBP: ffffc90000663db8 R08: 000000000000001f R09: 2e2e2e2074696e69
[ 66.880552] R10: ffffea0001fea480 R11: 0a2e2e2e2074696e R12: ffffffffa0002000
[ 66.880552] R13: 0000000000000000 R14: 00005609ef97f26b R15: 0000000000000000
[ 66.880552] FS: 00007f4abe3e0700(0000) GS:ffff88007da00000(0000) knlGS:0000000000000000
[ 66.880552] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 66.880552] CR2: 00005575f8490018 CR3: 000000007fbaf000 CR4: 00000000000006f0
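Some of the register values in this dump are plain ASCII. The sketch below (the function `reg_to_ascii` is invented for this example) prints the eight bytes of a register lowest byte first, which is the order they have in memory on little-endian x86.

```c
#include <stdint.h>

/* Renders the 8 bytes of a register value, lowest byte first, as text.
 * Non-printable bytes are shown as '.'. out must hold 9 chars. */
void reg_to_ascii(uint64_t value, char out[9])
{
    for (int i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xff);
        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
    }
    out[8] = '\0';
}
```

RAX (75646f6d00636261) comes out as "abc.modu", that is "abc", its terminating NUL, and the first four bytes of what is presumably the next string literal. My reading, not something the log states directly, is that the compiler expanded the 10-byte memcpy inline and loaded the first 8 source bytes into RAX before the faulting store. R09 (2e2e2e2074696e69) reads "init ..." and looks like a leftover from printing "module_init ...".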
The following output comes from show_trace_log_lvl.
[ 66.880552] Call Trace:
[ 66.880552] do_one_initcall+0x59/0x13b
[ 66.880552] ? kmem_cache_alloc_trace+0xd3/0x171
[ 66.880552] ? do_init_module+0x27/0x1fd
[ 66.880552] do_init_module+0x5f/0x1fd
[ 66.880552] load_module+0x2d3/0x406
[ 66.880552] ? copy_module_from_user+0x8d/0x8d
[ 66.880552] SYSC_finit_module+0xc8/0xe9
[ 66.880552] SyS_finit_module+0xe/0x10
[ 66.880552] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 66.880552] RIP: 0033:0x7f4abdef55d9
[ 66.880552] RSP: 002b:00007ffe0bcf00f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
[ 66.880552] RAX: ffffffffffffffda RBX: 00007f4abe1b8b20 RCX: 00007f4abdef55d9
[ 66.880552] RDX: 0000000000000000 RSI: 00005609ef97f26b RDI: 0000000000000003
[ 66.880552] RBP: 0000000000001021 R08: 0000000000000000 R09: 00007f4abe1baea0
[ 66.880552] R10: 0000000000000003 R11: 0000000000000202 R12: 000000000000270e
[ 66.880552] R13: 00007f4abe1b8b78 R14: 00007f4abe1b8b78 R15: 00007f4abe1b8b78
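The second register block in the trace (RIP: 0033:...) is the user-space state saved at syscall entry, and two of its fields are worth decoding. A minimal sketch, with `find_reg` as a helper name made up for this example, pulls a named register out of a log line:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Finds "NAME: <hex>" in a log line and stores the value in *out.
 * The name must start the line or follow a space, so looking for
 * "RAX" does not match inside "ORIG_RAX".
 * Returns 0 on success, -1 if the register is not present. */
int find_reg(const char *line, const char *name, uint64_t *out)
{
    size_t name_len = strlen(name);
    for (const char *p = strstr(line, name); p; p = strstr(p + 1, name)) {
        int at_word_start = (p == line) || (p[-1] == ' ');
        if (at_word_start && p[name_len] == ':') {
            unsigned long long value;
            if (sscanf(p + name_len + 1, " %llx", &value) == 1) {
                *out = (uint64_t)value;
                return 0;
            }
        }
    }
    return -1;
}
```

ORIG_RAX is 0x139, decimal 313, which on x86_64 is the syscall number of finit_module and agrees with SyS_finit_module in the trace. RAX is ffffffffffffffda, which is -38 when read as a signed 64-bit value; as far as I know that is -ENOSYS, the placeholder stored in the saved RAX slot at syscall entry before a real return value exists.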
The following output comes from print_oops_end_marker, called from oops_exit.
[ 66.916000] ---[ end trace 0653eaf8276d2795 ]---
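As for where in the code the fault happened, the RIP value gives the exact location: offset 0x17 into print_pcb.
The bytes in the Code line can then be disassembled to see the instructions around the fault; the byte in angle brackets is the one RIP points at.
Comparing these with the code in the .ko file shows that they match exactly (starting from the part highlighted in green).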
[ 66.880552] Code: <48> 89 04 25 00 00 00 00 66 8b 05 17 c0 ff ff 66 89 04 25 08 00 00
[ 66.880552] RIP: print_pcb+0x17/0x1000 [Test08KernelNpe] RSP: ffffc90000663db8
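Before the Code bytes can be handed to a disassembler they have to be turned back into raw bytes. The sketch below does only that step; `parse_code_bytes` is a name invented for this example. The kernel tree ships a script, scripts/decodecode, that covers the whole job and is the better choice in practice.

```c
#include <ctype.h>
#include <stdlib.h>

/* Parses the hex bytes that follow "Code: " in an oops.
 * Stores at most cap bytes in buf and returns how many were stored.
 * *fault_idx receives the index of the byte written as <xx>, which is
 * the byte at RIP, or -1 if no byte is marked. */
int parse_code_bytes(const char *text, unsigned char *buf, int cap,
                     int *fault_idx)
{
    int count = 0;
    *fault_idx = -1;
    while (*text && count < cap) {
        if (*text == '<') {
            *fault_idx = count;         /* next byte is the one at RIP */
            text++;
        } else if (isxdigit((unsigned char)*text)) {
            char *end;
            buf[count++] = (unsigned char)strtoul(text, &end, 16);
            text = end;
        } else {
            text++;                     /* skip spaces and '>' */
        }
    }
    return count;
}
```

For the line above this gives 22 bytes with the marked byte at index 0. The first eight bytes, 48 89 04 25 00 00 00 00, encode a 64-bit store of RAX to absolute address 0 (mov %rax,0x0 in AT&T syntax), which is the NULL write produced by the memcpy in print_pcb.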
End