This document explains the concept of "filtering" in UPX. Filtering is
a data preprocessing method that can improve the compression ratio of
the files UPX processes.

Currently all the filters UPX uses are based on one very special
algorithm that works well on ix86 executable files. This is what UPX
calls the "naive" implementation. There is also a "clever" method,
which works only with 32-bit executable file formats and was first
implemented in UPX.
Let's start with an example (from this point on I assume a 32-bit file
format). Consider this code fragment:

  00025970: E877410600     calln   FatalError
  00025975: 8B414C         mov     eax,[ecx+4C]
  00025978: 85C0           test    eax,eax
  0002597A: 7419           je      file:00025995
  0002597C: 85F6           test    esi,esi
  0002597E: 7504           jne     file:00025984
  00025980: 89C6           mov     esi,eax
  00025982: EB11           jmps    file:00025995
  00025984: 39C6           cmp     esi,eax
  00025986: 740D           je      file:00025995
  00025988: 83C4F4         add (d) esp,F4
  0002598B: 68A0A91608     push    0816A9A0
  00025990: E857410600     calln   FatalError
  00025995: FF45F4         inc     [ebp-0C]
Here you can find two calls to a function called "FatalError". As you
probably know, the compression ratio is better if the compressor engine
finds longer sequences of repeated strings. In this case the engine
sees the following two byte sequences:

  E877 410600 8B    and
  E857 410600 FF

So it can find a 3-byte-long match.
Now comes the trick. On ix86, near calls are encoded as 0xE8 followed
by a 32-bit relative offset to the destination address. Let's see what
happens if the position of the call is added to that offset:

  0x64177 + 0x25970 = 0x89AE7
  0x64157 + 0x25990 = 0x89AE7

  E8 E79A0800 8B
  E8 E79A0800 FF

As you can see, the compressor engine now finds a 5-byte-long match,
which means that we've just saved 2 bytes of compressed data. Not bad.
So this is the basic idea (the "naive" implementation). All we have to
do is to "filter" the uncompressed data using this method before
compression, and "unfilter" it after decompression. Simply go over the
memory, find 0xE8 bytes and process the next 4 bytes as specified
above.
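
The scan described above can be sketched in a few lines of Python. This
is only an illustration, not the UPX code: the function names and the
"base" parameter (the address of the first byte of the buffer) are made
up for this example, and I assume that both directions skip the 4
operand bytes after each 0xE8 so that they always visit the same
positions.

```python
import struct

CALL_OPCODE = 0xE8  # ix86 near call, followed by a 32-bit little-endian offset


def naive_filter(buf: bytearray, base: int = 0) -> None:
    """Add the position of every 0xE8 byte to the 4 bytes that follow it."""
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == CALL_OPCODE:
            (rel,) = struct.unpack_from("<I", buf, i + 1)
            struct.pack_into("<I", buf, i + 1, (rel + base + i) & 0xFFFFFFFF)
            i += 5  # skip the operand so rewritten bytes are never rescanned
        else:
            i += 1


def naive_unfilter(buf: bytearray, base: int = 0) -> None:
    """Reverse naive_filter by subtracting the position again."""
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == CALL_OPCODE:
            (val,) = struct.unpack_from("<I", buf, i + 1)
            struct.pack_into("<I", buf, i + 1, (val - base - i) & 0xFFFFFFFF)
            i += 5
        else:
            i += 1


# The first call from the fragment above, located at 0x25970.
code = bytearray.fromhex("E877410600 8B414C")
naive_filter(code, base=0x25970)
filtered_hex = code.hex()
naive_unfilter(code, base=0x25970)
restored_hex = code.hex()
```

Because only operand bytes are rewritten and the 0xE8 bytes themselves
stay in place, the unfilter finds exactly the same calls as the filter
did.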
Of course there are several ways in which this scheme could be
improved. First, not only calls could be handled this way - near jumps
(0xE9 + 32-bit offset) could work similarly.
A second improvement is to limit this filtering to the area occupied by
real code - there is no point in messing with general data.
Another improvement comes if the byte order of the 32-bit offset is
reversed. Why? Here is another call which follows the above fragment:

  000261FA: E8C9390600     calln   ErrorF

  0x639C9 + 0x261FA = 0x89BC3

  E8 C39B 0800      compare this with
  E8 E79A 0800

As you can see, these two functions are quite close together, but the
compressor is not able to utilize this information (2-byte-long matches
are usually not useful) unless the byte order of the offsets is
reversed. In this case:

  E8 0008 9AE7
  E8 0008 9BC3

So the compressor engine finds a 3-byte-long match here. This is a
nice improvement - now the engine utilizes the similarity of nearby
destinations too.
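
To see the effect of the byte order, here is a small Python sketch that
encodes the two filtered calls both ways and measures how many leading
bytes they share. The helper names are invented for this example; a
real compressor looks for matches anywhere, so the shared prefix is
only a rough stand-in for what the engine can exploit.

```python
import struct


def encode_call(target: int, big_endian: bool) -> bytes:
    """Return 0xE8 followed by the filtered 32-bit value in the chosen byte order."""
    fmt = ">I" if big_endian else "<I"
    return b"\xE8" + struct.pack(fmt, target)


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Count how many leading bytes of a and b are identical."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


FATAL_ERROR = 0x89AE7  # filtered value of the calls to FatalError
ERROR_F = 0x89BC3      # filtered value of the call to ErrorF

little = common_prefix_len(encode_call(FATAL_ERROR, False),
                           encode_call(ERROR_F, False))
big = common_prefix_len(encode_call(FATAL_ERROR, True),
                        encode_call(ERROR_F, True))
```

With the original byte order only the 0xE8 itself is shared at the
start; with the reversed order the shared prefix grows to 3 bytes.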
This is nice, but what happens when we find a "fake" call - i.e. a 0xE8
which is part of another instruction? Like this:

  0002A3B1: C745 E8 00000000     mov [ebp-18],00000000

In this case those nice 0x00 bytes are overwritten with some less
compressible data. This is the disadvantage of the "naive"
implementation.
So let's be clever and try to detect and process only "real" calls. In
UPX a simple method is used to find these calls. We simply check that
the destinations of these calls are inside the same area as the calls
themselves (so the above code is still a false positive, but it helps
generally). A better method would be to actually disassemble the code -
contributions are welcome :-)
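
A minimal Python sketch of this range check, under some assumptions:
positions are counted from the start of the filtered area, the
destination is computed as position + offset as in the examples above,
and the function name is made up. The exact comparison UPX performs may
differ in detail.

```python
import struct


def looks_like_real_call(buf: bytes, pos: int) -> bool:
    """Accept a 0xE8 at pos only if position + offset lands inside buf."""
    if pos + 5 > len(buf) or buf[pos] != 0xE8:
        return False
    (rel,) = struct.unpack_from("<I", buf, pos + 1)
    target = (pos + rel) & 0xFFFFFFFF
    return target < len(buf)


# mov [ebp-18],00000000 - the 0xE8 at index 2 is not a call at all,
# but offset 0 points inside the area, so the check accepts it.
fake_zero = bytes.fromhex("C745E800000000")
accepted_fake = looks_like_real_call(fake_zero, 2)

# A 0xE8 followed by a huge value points far outside the area.
far_away = bytes.fromhex("E8FFFFFF7F")
accepted_far = looks_like_real_call(far_away, 0)
```

The first case shows why the check is only a heuristic: the fake call
from the example still passes, while the second one is rejected.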
But this is only half of the job. We cannot simply process one call
and then skip another one - the unfiltering process needs some
information to be able to reverse the filtering.
UPX uses the following idea, which works nicely. First we assume that
the size of the area that should be filtered is less than 16 MiB. Then
UPX scans over this area and keeps a record of the bytes that are
following the 0xE8 bytes. If we are lucky, there will be bytes that
were not found following 0xE8. These bytes are our candidates to be
used as markers.
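
The marker search can be sketched like this in Python. The function
name is invented for this example, and the sketch simply looks at every
0xE8 in the buffer, whether or not it is a real call.

```python
from typing import List


def marker_candidates(buf: bytes) -> List[int]:
    """Return all byte values that never directly follow a 0xE8 in buf."""
    seen = [False] * 256
    for i in range(len(buf) - 1):
        if buf[i] == 0xE8:
            seen[buf[i + 1]] = True
    return [value for value in range(256) if not seen[value]]


# 0xE8 is followed by 0x77, 0x00, 0xE8 and 0x01 in this sample.
sample = bytes.fromhex("E877410600 C745E800000000 E8E8 01")
candidates = marker_candidates(sample)
```

If the returned list is empty we are out of luck, and this scheme
cannot be used for that area.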
Do you still remember that we assumed that the size of the scanned area
is less than 16 MiB? Well, this means that when we process a real call,
the resulting offset will be at most 0x00FFFFFF too. So the MSB is
always 0x00, which is a nice place to store our marker. Of course we
should reverse the byte order in the resulting offset - so this marker
will appear just after the 0xE8 byte and not 4 bytes after it.
That's all. Just go over the memory area, identify the "real" calls,
and use this method to mark them. Then the job of the unfilter is very
easy - it just searches for a 0xE8 + marker sequence and does the
unfiltering if it finds one. It's clever, isn't it? :)
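
Here is a simplified Python sketch of the whole round trip. It is not
the UPX implementation: the function names are made up, the marker is
passed in by the caller and is assumed to be a byte that never follows
a 0xE8 in the unfiltered data, "real" calls are detected with the plain
position + offset range check, and "add_value" is ignored.

```python
import struct

CALL_OPCODE = 0xE8
LIMIT = 1 << 24  # the filtered area is assumed to be at most 16 MiB


def clever_filter(buf: bytearray, marker: int) -> None:
    """Rewrite real calls as 0xE8, marker, 24-bit big-endian destination."""
    size = len(buf)
    assert size <= LIMIT
    i = 0
    while i + 5 <= size:
        if buf[i] == CALL_OPCODE:
            (rel,) = struct.unpack_from("<I", buf, i + 1)
            target = (i + rel) & 0xFFFFFFFF
            if target < size:  # destination inside the area: treat as real
                buf[i + 1] = marker
                buf[i + 2:i + 5] = target.to_bytes(3, "big")
                i += 5
                continue
        i += 1


def clever_unfilter(buf: bytearray, marker: int) -> None:
    """Restore every 0xE8 + marker sequence to a little-endian offset."""
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == CALL_OPCODE and buf[i + 1] == marker:
            target = int.from_bytes(buf[i + 2:i + 5], "big")
            struct.pack_into("<I", buf, i + 1, (target - i) & 0xFFFFFFFF)
            i += 5
        else:
            i += 1


# One real call (destination 0x10) and one fake 0xE8 inside a mov.
original = bytearray.fromhex("E810000000 C745E8FFFFFF7F") + bytearray(20)
MARKER = 0x01  # never follows 0xE8 in the data above

work = bytearray(original)
clever_filter(work, MARKER)
filtered = bytes(work)
clever_unfilter(work, MARKER)
```

The fake call is left alone because its "destination" is far outside
the 32-byte area, and the unfilter ignores it because the byte after
its 0xE8 is not the marker.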
To tell you the truth it's not this simple in UPX. It can use an
additional parameter ("add_value") which makes things a little bit more
complicated (for example it can happen that a found marker is proven to
be unusable because of some overflow during an addition).
And the whole algorithm is optimized for simplicity on the unfiltering
side (as short and as fast assembly as possible - see stub/macros.ash),
which makes the filtering process a little more difficult (fcto_ml.ch,
fcto_ml2.ch, filteri.cpp).
As can be seen in filteri.cpp, there are lots of variants of this
filtering implemented - naive/clever, calls/jumps/calls&jumps,
reversed/unreversed offsets - a total of 18 slightly different filters
(and another 9 variants for 16-bit programs).
You can select one of them using the command line parameter "--filter="
or try most of them with "--all-filters". Or just let upx use the one
we defined as the default for that executable format.
EOF