数据库最终都是持久化存储的(除了内存数据库等),持久化就要将数据从内存Buffer落盘到外存。这里分析一下PostgreSQL中外存管理部分的内容。源码在src/backend/storage/smgr
这一部分。
README
建议首先阅读一下src/backend/storage/smgr/README
里的内容。中文翻译可参考文章postgres外存管理之smgr
src/backend/storage/smgr/README
Storage Managers
================
In the original Berkeley Postgres system, there were several storage managers,
of which only the "magnetic disk" manager remains.
The "magnetic disk" manager is itselfseriously misnamed, because actually
it supports any kind of device for which the operating system provides standard filesystem operations; which
these days is pretty much everything of interest. However, we retain the
notion of a storage manager switch in case anyone ever wants to reintroduce
other kinds of storage managers. Removing the switch layer would save
nothing noticeable anyway, since storage-access operations are surely far
more expensive than one extra layer of C function calls.
In Berkeley Postgres each relation was tagged with the ID of the storage
manager to use for it. This is gone. It would be probably more reasonable
to associate storage managers with tablespaces, should we ever re-introduce
multiple storage managers into the system catalogs.
The files in this directory, and their contents, are
smgr.c The storage manager switch dispatch code. The routines in
this file call the appropriate storage manager to do storage
accesses requested by higher-level code. smgr.c also manages
the file handle cache (SMgrRelation table).
md.c The "magnetic disk" storage manager, which is really just
an interface to the kernel's filesystem operations.
Note that md.c in turn relies on src/backend/storage/file/fd.c.
Relation Forks
==============
Since 8.4, a single smgr relation can be comprised of multiple physical
files, called relation forks. This allows storing additional metadata like
Free Space information in additional forks, which can be grown and truncated
independently of the main data file, while still treating it all as a single
physical relation in system catalogs.
It is assumed that the main fork, fork number 0 or MAIN_FORKNUM, always
exists. Fork numbers are assigned in src/include/common/relpath.h.
Functions in smgr.c and md.c take an extra fork number argument, in addition
to relfilenode and block number, to identify which relation fork you want to
access. Since most code wants to access the main fork, a shortcut version of
ReadBuffer that accesses MAIN_FORKNUM is provided in the buffer manager for
convenience.
截取README中比较重要的两句:
- 磁盘管理器不仅限于管理磁盘,实际上它支持任何社设备,只要操作系统为该设备实现了标准文件系统操作接口。
- 虽然PG存储管理器目前仅有磁盘管理器,但依然保留了存储管理器(smgr)这个中间层,以便引入其他类型的存储管理器。
存储管理器
实现了存储管理器分发调度接口,相当于是存储管理的一层抽象。所有对文件系统的操作都是由这里进行分发。我们看一下smgr.h中的函数声明:
extern void smgrinit(void);
extern SMgrRelation smgropen(RelFileNode rnode, BackendId backend);
extern bool smgrexists(SMgrRelation reln, ForkNumber forknum);
extern void smgrsetowner(SMgrRelation *owner, SMgrRelation reln);
extern void smgrclearowner(SMgrRelation *owner, SMgrRelation reln);
extern void smgrclose(SMgrRelation reln);
extern void smgrcloseall(void);
extern void smgrclosenode(RelFileNodeBackend rnode);
extern void smgrrelease(SMgrRelation reln);
extern void smgrreleaseall(void);
extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
extern void smgrdosyncall(SMgrRelation *rels, int nrels);
extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
BlockNumber blocknum, char *buffer, bool skipFsync);
extern bool smgrprefetch(SMgrRelation reln, ForkNumber forknum,
BlockNumber blocknum);
extern void smgrread(SMgrRelation reln, ForkNumber forknum,
BlockNumber blocknum, char *buffer);
extern void smgrwrite(SMgrRelation reln, ForkNumber forknum,
BlockNumber blocknum, char *buffer, bool skipFsync);
extern void smgrwriteback(SMgrRelation reln, ForkNumber forknum,
BlockNumber blocknum, BlockNumber nblocks);
extern BlockNumber smgrnblocks(SMgrRelation reln, ForkNumber forknum);
extern BlockNumber smgrnblocks_cached(SMgrRelation reln, ForkNumber forknum);
extern void smgrtruncate(SMgrRelation reln, ForkNumber *forknum,
int nforks, BlockNumber *nblocks);
extern void smgrimmedsync(SMgrRelation reln, ForkNumber forknum);
extern void AtEOXact_SMgr(void);
extern bool ProcessBarrierSmgrRelease(void);
也就是说数据库与外存进行交互,都是通过这些接口实