Reducing the Memory Footprint of Linux Driver Modules (.ko): An Optimization Walkthrough

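A Linux driver can be compiled on its own into a .ko file. A .ko is usually only a few MB, but on a small Linux system with just a few tens of MB of total memory, that is often well worth optimizing. This article walks through a real example and shows, step by step, how to reduce the memory a .ko module uses.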
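1. Stripping the file
A .ko file is a standard ELF file, so the first idea is usually to shrink it with the strip command. strip offers the following options for a .ko:
strip --strip-all test.ko        # removes all symbols, debug info and relocation sections; the file shrinks a lot, but the module can no longer be loaded with insmod
strip --strip-debug test.ko      # removes only the debug sections; the file shrinks a little and still loads with insmod
strip --strip-unneeded test.ko   # removes symbols not needed for relocation processing; the file shrinks a little and still loads with insmod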
Resulting .ko file sizes (in bytes):
6978208 origin-test.ko* // no strip
1984856 strip-all-test.ko* // strip --strip-all
6884544 strip-debug-test.ko* // strip --strip-debug
6830704 strip-unneeded-test.ko* // strip --strip-unneeded
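The byte counts above can be put in proportion with a small calculation. This C sketch (the helper name saved_bp is illustrative only) expresses each file's saving in basis points of the original size, where 1 bp = 0.01 %, so the integer arithmetic stays exact:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes saved by a stripped file, expressed in basis points
 * (1 bp = 0.01 %) of the original size, rounded down.
 * saved_bp is a hypothetical helper, not part of any tool. */
static uint64_t saved_bp(uint64_t orig, uint64_t stripped)
{
    assert(orig > 0 && stripped <= orig);
    return (orig - stripped) * 10000 / orig;
}
```

Plugging in the sizes listed above, saved_bp(6978208, 1984856) is 7155 (about 71.6 %), whereas the two strip modes that keep the module loadable give only 134 and 211, that is about 1.3 % and 2.1 %.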
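As you can see, as long as the .ko must remain loadable, strip cannot reduce its size by much. Worse, after all that effort, the sizes of the key text/data/bss sections in the .ko did not change at all: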
$ size *.ko
text data bss dec hex filename
1697671 275791 28367 2001829 1e8ba5 origin-test.ko
1697671 275791 28367 2001829 1e8ba5 strip-all-test.ko
1697671 275791 28367 2001829 1e8ba5 strip-debug-test.ko
1697671 275791 28367 2001829 1e8ba5 strip-unneeded-test.ko
Question 1: Can strip be made to remove even more? What does strip actually do, and what exactly does it remove?
We compare the ELF section information before and after stripping:
$ readelf -S origin-test.ko
There are 48 section headers, starting at offset 0x6a6ea0:
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .note.gnu.build-i NOTE             0000000000000000  00000040
       0000000000000024  0000000000000000   A       0     0     4
  [ 2] .note.Linux       NOTE             0000000000000000  00000064
       0000000000000018  0000000000000000   A       0     0     4
  [ 3] .text             PROGBITS         0000000000000000  0000007c
       00000000001393d6  0000000000000000  AX       0     0     2
  [ 4] .rela.text        RELA             0000000000000000  003b9b90
       00000000002b7550  0000000000000018   I      45     3     8
  [ 5] .text.unlikely    PROGBITS         0000000000000000  00139452
       0000000000000d74  0000000000000000  AX       0     0     2
  [ 6] .rela.text.unlike RELA             0000000000000000  006710e0
       0000000000001950  0000000000000018   I      45     5     8
  [ 7] .init.text        PROGBITS         0000000000000000  0013a1c6
       000000000000016e  0000000000000000  AX       0     0     2
  [ 8] .rela.init.text   RELA             0000000000000000  00672a30
...
$ readelf -S strip-all-test.ko
There are 27 section headers, starting at offset 0x1e4298:
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .note.gnu.build-i NOTE             0000000000000000  00000040
       0000000000000024  0000000000000000   A       0     0     4
  [ 2] .note.Linux       NOTE             0000000000000000  00000064
       0000000000000018  0000000000000000   A       0     0     4
  [ 3] .text             PROGBITS         0000000000000000  0000007c
       00000000001393d6  0000000000000000  AX       0     0     2
  [ 4] .text.unlikely    PROGBITS         0000000000000000  00139452
       0000000000000d74  0000000000000000  AX       0     0     2
  [ 5] .init.text        PROGBITS         0000000000000000  0013a1c6
       000000000000016e  0000000000000000  AX       0     0     2
...
From this output, strip mainly removes sections whose flags include I, while sections with the A flag cannot be removed. The meaning of each section flag is explained at the end of the readelf output:
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)
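These letters are simply the individual sh_flags bits of each section header. As a minimal sketch, assuming the standard ELF flag constants from the <elf.h> header shipped with glibc or musl, the following decodes the common bits into the letters readelf prints; the helper name shflags_to_letters is hypothetical, and the rarer bits (compressed, exclude, OS- and processor-specific) are deliberately left out.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Common sh_flags bits and readelf's letter for each, listed in bit
 * order, which is also the order readelf prints them ("WA", "AX"). */
static const struct { uint64_t bit; char letter; } flag_letters[] = {
    { SHF_WRITE,            'W' },
    { SHF_ALLOC,            'A' },
    { SHF_EXECINSTR,        'X' },
    { SHF_MERGE,            'M' },
    { SHF_STRINGS,          'S' },
    { SHF_INFO_LINK,        'I' },
    { SHF_LINK_ORDER,       'L' },
    { SHF_OS_NONCONFORMING, 'O' },
    { SHF_GROUP,            'G' },
    { SHF_TLS,              'T' },
};

/* Write a NUL-terminated letter string for `flags` into out,
 * which needs room for 10 letters plus the terminator. */
static void shflags_to_letters(uint64_t flags, char out[11])
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t i = 0; i < sizeof flag_letters / sizeof flag_letters[0]; i++)
        if (flags & flag_letters[i].bit)
            out[n++] = flag_letters[i].letter;
    out[n] = '\0';
}
```

For example, .text (SHF_ALLOC | SHF_EXECINSTR) decodes to "AX" and .rela.text (SHF_INFO_LINK) decodes to "I", matching the listings above.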
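We also found that, for a .ko file, sections whose names begin with .rela must not be removed, because insmod needs them. .rela.text, for example, is very large (0x2b7550 bytes, about 2.8 MB, in this module), but it cannot simply be stripped away.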
Question 2: Do the I-flagged sections of a .ko file still occupy memory after the module has been loaded with insmod?
The main kernel code path when a .ko file is loaded dynamically with insmod:
SYSCALL_DEFINE3(finit_module) / SYSCALL_DEFINE3(init_module)
|→ load_module()
   |→ layout_and_allocate()
   |  |→ setup_load_info()    // info->index.mod = index of the .gnu.linkonce.this_module section
   |  |
   |  |→ layout_sections()    // parse the .ko ELF file and total up the sections that must be loaded into memory,
   |  |                       // accumulating their sizes into mod->core_layout.size and mod->init_layout.size
   |  |
   |  |→ layout_symtab()      // parse the .ko ELF file and total up the symbol table that must be kept in memory,
   |  |                       // accumulating its size into mod->core_layout.size
   |  |
   |  |→ move_module()        // allocate memory with vmalloc according to mod->core_layout.size and
   |                          // mod->init_layout.size, then copy the corresponding sections into it
   |
   |→ apply_relocations()     // apply relocations to the sections now loaded in memory
   |
   |→ do_init_module()        // run the driver's module_init() function, then free the mod->init_layout.size memory
Looking at the code in detail, only sections with the alloc attribute (the A flag) are counted and copied into memory when a module is loaded:
static void layout_sections(struct module *mod, struct load_info *info)
{
    /* (1) Only sections carrying SHF_ALLOC are recognized */
    static unsigned long const masks[][2] = {
        /* NOTE: all executable code must be the first section
         * in this array; otherwise modify the text_size
         * finder in the two loops below */
        { SHF_EXECINSTR | SHF_ALLOC, ARCH_SHF_SMALL },
        { SHF_ALLOC, SHF_WRITE | ARCH_SHF_SMALL },
        { SHF_RO_AFTER_INIT | SHF_ALLOC, ARCH_SHF_SMALL },
        { SHF_WRITE | SHF_ALLOC, ARCH_SHF_SMALL },
        { ARCH_SHF_SMALL | SHF_ALLOC, 0 }
    };
    unsigned int m, i;

    for (i = 0; i < info->hdr->e_shnum; i++)
        info->sechdrs[i].sh_entsize = ~0UL;

    /* (2) Walk the sections of the .ko file and, using the masks above,
     *     add every alloc section that is not an init section
     *     into mod->core_layout.size */
    pr_debug("Core section allocation order:\n");
    for (m = 0; m < ARRAY_SIZE(masks); ++m) {
        for (i = 0; i < info->hdr->e_shnum; ++i) {
            Elf_Shdr *s = &info->sechdrs[i];
            const char *sname = info->secstrings + s->sh_name;

            if ((s->sh_flags & masks[m][0]) != masks[m][0]
                || (s->sh_flags & masks[m][1])
                || s->sh_entsize != ~0UL
                || module_init_section(sname))
                continue;
            s->sh_entsize = get_offset(mod, &mod->core_layout.size, s, i);
            pr_debug("\t%s\n", sname);
        }
        /* ... per-pass size bookkeeping omitted ... */
    }

    /* (3) Walk the sections again and add every alloc section whose
     *     name starts with ".init" into mod->init_layout.size */
    pr_debug("Init section allocation order:\n");
    for (m = 0; m < ARRAY_SIZE(masks); ++m) {
        for (i = 0; i < info->hdr->e_shnum; ++i) {
            Elf_Shdr *s = &info->sechdrs[i];
            const char *sname = info->secstrings + s->sh_name;

            if ((s->sh_flags & masks[m][0]) != masks[m][0]
                || (s->sh_flags & masks[m][1])
                || s->sh_entsize != ~0UL
                || !module_init_section(sname))
                continue;
            s->sh_entsize = (get_offset(mod, &mod->init_layout.size, s, i)
                             | INIT_OFFSET_MASK);
            pr_debug("\t%s\n", sname);
        }
        /* ... per-pass size bookkeeping omitted ... */
    }
}
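To see what these two passes compute, here is a small userspace model of the same logic. It is a sketch under stated assumptions, not kernel code: struct model_sec and the model_* functions are hypothetical, ARCH_SHF_SMALL is taken as 0 (its value on most architectures), the init-section test only checks for a ".init" prefix as described above, and the page alignment the kernel applies between passes on some configurations is not modelled.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARCH_SHF_SMALL    0           /* assumption: 0 on most architectures */
#define SHF_RO_AFTER_INIT 0x00200000  /* kernel-specific flag, not in <elf.h> */

/* Hypothetical userspace stand-in for one ELF section header. */
struct model_sec {
    const char *name;
    uint64_t flags;
    uint64_t size;
    uint64_t align;
    bool placed;   /* plays the role of sh_entsize != ~0UL in the kernel */
};

static uint64_t align_up(uint64_t v, uint64_t a)
{
    if (a == 0)
        a = 1;                     /* like "sh_addralign ?: 1" */
    assert((a & (a - 1)) == 0);    /* ELF alignments are powers of two */
    return (v + a - 1) & ~(a - 1);
}

/* Simplified init-section test: the name starts with ".init". */
static bool model_init_section(const char *name)
{
    return strncmp(name, ".init", 5) == 0;
}

/* Pass 0 sums non-init SHF_ALLOC sections into *core, pass 1 sums the
 * ".init*" ones into *init, using the same masks table and order. */
static void model_layout_sections(struct model_sec *s, size_t n,
                                  uint64_t *core, uint64_t *init)
{
    static const uint64_t masks[][2] = {
        { SHF_EXECINSTR | SHF_ALLOC, ARCH_SHF_SMALL },
        { SHF_ALLOC, SHF_WRITE | ARCH_SHF_SMALL },
        { SHF_RO_AFTER_INIT | SHF_ALLOC, ARCH_SHF_SMALL },
        { SHF_WRITE | SHF_ALLOC, ARCH_SHF_SMALL },
        { ARCH_SHF_SMALL | SHF_ALLOC, 0 },
    };

    *core = 0;
    *init = 0;
    for (size_t i = 0; i < n; i++)
        s[i].placed = false;

    for (int pass = 0; pass < 2; pass++) {
        bool want_init = (pass == 1);
        uint64_t *total = want_init ? init : core;

        for (size_t m = 0; m < sizeof masks / sizeof masks[0]; m++) {
            for (size_t i = 0; i < n; i++) {
                if ((s[i].flags & masks[m][0]) != masks[m][0] ||
                    (s[i].flags & masks[m][1]) ||
                    s[i].placed ||
                    model_init_section(s[i].name) != want_init)
                    continue;
                /* like get_offset(): align the running total, add the size */
                *total = align_up(*total, s[i].align) + s[i].size;
                s[i].placed = true;
            }
        }
    }
}
```

Fed with sections [3] to [7] from the readelf listing above, the model places .text and .text.unlikely in the core area (0x13a14a bytes), .init.text in the init area (0x16e bytes), and never counts .rela.text, which carries only the I flag.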
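Sections with the I flag only supply information to apply_relocations() during relocation; they do not stay resident in memory.
Conclusion: stripping a .ko file only removes a few I-flagged sections. The file gets slightly smaller, but the memory footprint after the module is loaded is not affected at all.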
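The same split can be measured directly instead of reading it off the readelf listing. A minimal sketch, assuming a 64-bit ELF image already read into memory in the host's byte order: split_sections and struct alloc_split are hypothetical names, and extended section numbering (e_shnum == 0) is not handled. It totals the SHF_ALLOC sections separately from everything else.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct alloc_split {
    uint64_t alloc_bytes; /* SHF_ALLOC sections: what the loader may copy in */
    uint64_t other_bytes; /* .symtab, .strtab, .rela.*, debug info, ... */
};

/* Returns 0 on success, -1 if buf does not look like a usable ELF64 image. */
static int split_sections(const unsigned char *buf, size_t len,
                          struct alloc_split *out)
{
    Elf64_Ehdr eh;

    assert(buf != NULL && out != NULL);
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);          /* memcpy avoids unaligned reads */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr))
        return -1;
    if (eh.e_shoff > len ||
        (uint64_t)eh.e_shnum * sizeof(Elf64_Shdr) > len - eh.e_shoff)
        return -1;

    out->alloc_bytes = 0;
    out->other_bytes = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;

        memcpy(&sh, buf + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_type == SHT_NULL)
            continue;
        if (sh.sh_flags & SHF_ALLOC)
            out->alloc_bytes += sh.sh_size;
        else
            out->other_bytes += sh.sh_size;
    }
    return 0;
}
```

The alloc total only approximates what layout_sections() counts: it does not model the few sections the kernel treats specially, nor the symbol table space added by layout_symtab(). other_bytes is roughly the part strip can work on without touching runtime memory.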
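2. Runtime memory footprint
Still, life goes on and the optimization has to come from somewhere. Let's look closely at how much memory the key text/data/bss sections take up while the module is being loaded.
Before loading: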
$ size test.ko
text data bss dec hex filename
1697671 275791 28367 2001829 1e8ba5 test.ko
After insmod, the module's memory is allocated with vmalloc(), so its footprint can be checked through /proc/vmallocinfo:
# cat /sys/module/test/coresize
4203425
# cat /sys/module/test/initsize
0
# cat /proc/vmallocinfo
// core_layout.size occupies about 4.2 MB
0x00000000fd4ec521-0x000000007ff17966 4210688 load_module+0x1b86/0x1c8e pages=1027 vmalloc vpages
0x000000007ff17966-0x000000004e29ad2e 16384 load_module+0x1b86/0x1c8e pages=3 vmalloc
So before loading, the text/data/bss sections of test.ko total about 2 MB, yet after loading the module occupies about 4.2 MB.
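The coresize value and the vmallocinfo entry can be reconciled with simple page arithmetic. A minimal sketch, assuming 4 KiB pages and that the size column of /proc/vmallocinfo includes one trailing guard page; that second point is an assumption, but it matches both sets of numbers in this article. The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Pages needed to back a module area of `bytes` bytes, rounded up. */
static uint64_t pages_for(uint64_t bytes)
{
    return (bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* Size vmallocinfo would report: backing pages plus one guard page. */
static uint64_t vmalloc_area_size(uint64_t bytes)
{
    assert(bytes > 0);
    return (pages_for(bytes) + 1) * MODEL_PAGE_SIZE;
}
```

With coresize = 4203425, pages_for() gives 1027 and vmalloc_area_size() gives 4210688, matching pages=1027 and the size column above. The 2092876-byte coresize measured later maps the same way to 511 pages and 2097152 bytes.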
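Question 3: Why does the module take up extra memory once it is loaded?
We added debug output to the kernel to track how mod->core_layout.size changes, and this finally revealed the cause: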
SYSCALL_DEFINE3(finit_module) / SYSCALL_DEFINE3(init_module)
|→ load_module()
   |→ layout_and_allocate()
   |  |→ setup_load_info()    // mod->core_layout.size = 0x0
   |  |
   |  |→ layout_sections()    // mod->core_layout.size = 0x1f8390
   |  |
   |  |→ layout_symtab()      // mod->core_layout.size = 0x4023a1
   |  |
   |  |→ move_module()        // allocates memory according to mod->core_layout.size and mod->init_layout.size
The extra size is added in layout_symtab(). That function only does real work when CONFIG_KALLSYMS is enabled, and it reserves space for the driver module's symbol table.
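The two debug values give the symbol table's share directly. As a worked check in C, with the constants copied from the trace above and an illustrative helper name:

```c
#include <assert.h>
#include <stdint.h>

/* mod->core_layout.size after layout_sections() and after
 * layout_symtab(), copied from the debug trace above. */
#define CORE_AFTER_SECTIONS 0x1f8390u
#define CORE_AFTER_SYMTAB   0x4023a1u

/* Bytes that layout_symtab() appended to the core area. */
static uint32_t symtab_core_bytes(void)
{
    assert(CORE_AFTER_SYMTAB >= CORE_AFTER_SECTIONS);
    return CORE_AFTER_SYMTAB - CORE_AFTER_SECTIONS;
}
```

This evaluates to 0x20a011 = 2138129 bytes, a little over half of the final 4203425-byte core area (0x4023a1 is exactly the coresize shown earlier).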
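We usually do not need the module symbol table, so we can disable the kernel's CONFIG_KALLSYMS option and check the memory footprint again. Note that this is a kernel-wide option, not a per-module one, and that without it kernel oops backtraces show raw addresses instead of symbol names: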
# cat /sys/module/test/coresize
2092876
# cat /sys/module/test/initsize
0
# cat /proc/vmallocinfo
// core_layout.size now occupies about 2.0 MB
0x000000009e1c62e8-0x000000001024ef17 2097152 0xffffffff8006f3de pages=511 vmalloc
0x000000004070c817-0x00000000cc1b6736 28672 0xffffffff41534922 pages=6 vmalloc
The extra memory is eliminated entirely: coresize drops from 4203425 to 2092876 bytes, a saving of about 2.1 MB.
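Comparing before/after snapshots like these is easier with a small parser for /proc/vmallocinfo lines. This sketch assumes the line layout shown above: address range, size in bytes, caller, then optional fields such as pages=. That layout is not a stable ABI, and parse_vmalloc_line and struct vm_entry are hypothetical names. As the second listing shows, once CONFIG_KALLSYMS is disabled the caller column becomes a raw address rather than load_module+..., so filtering entries by caller name would miss them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct vm_entry {
    unsigned long long start, end;
    unsigned long long size;   /* bytes, as printed (guard page included) */
    unsigned long long pages;  /* 0 if the line has no pages= field */
    char caller[128];
};

/* Parse one /proc/vmallocinfo line. Returns 0 on success, -1 otherwise. */
static int parse_vmalloc_line(const char *line, struct vm_entry *e)
{
    const char *p;

    assert(line != NULL && e != NULL);
    /* %llx accepts the leading "0x" of each address */
    if (sscanf(line, "%llx-%llx %llu %127s",
               &e->start, &e->end, &e->size, e->caller) != 4)
        return -1;
    e->pages = 0;
    p = strstr(line, " pages=");
    if (p != NULL && sscanf(p + 7, "%llu", &e->pages) != 1)
        return -1;
    return 0;
}
```

For example, the first load_module line in the earlier listing parses to size 4210688, pages 1027 and caller "load_module+0x1b86/0x1c8e".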
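Note, however, that this only reduces the module's static memory footprint. Memory the driver allocates dynamically can only be reduced by analyzing and optimizing its code logic.
Conclusion: disabling CONFIG_KALLSYMS eliminates the memory used by the .ko module's symbol table, and the savings are substantial.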
