Re: [COMMITTERS] pgsql: In COPY, insert tuples to the heap in batches.

From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: In COPY, insert tuples to the heap in batches.
Date: 2011-11-09 09:06:59
Message-ID: [email protected]
Lists: pgsql-committers pgsql-hackers

In COPY, insert tuples to the heap in batches.

This greatly reduces the WAL volume, especially when the table is narrow.
The overhead of locking the heap page is also reduced. Reduced WAL traffic
also makes it scale a lot better, if you run multiple COPY processes at
the same time.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/d326d9e8ea1d690cf6d968000efaa5121206d231

Modified Files
--------------
src/backend/access/heap/heapam.c | 484 ++++++++++++++++++++++++++++++++++----
src/backend/commands/copy.c | 166 ++++++++++++-
src/backend/postmaster/pgstat.c | 6 +-
src/include/access/heapam.h | 2 +
src/include/access/htup.h | 31 +++
src/include/pgstat.h | 2 +-
6 files changed, 629 insertions(+), 62 deletions(-)
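
The claim that batching helps most on narrow tables can be made concrete with a toy model. The sketch below is not PostgreSQL code and does not reproduce the real WAL record layout. The byte counts (RECORD_OVERHEAD, PER_TUPLE_OVERHEAD) and the fixed batch size are assumptions chosen for illustration, and the function names are hypothetical. It only shows why a fixed per-record cost dominates when the tuples themselves are small.

```python
# Toy model of WAL volume for COPY: one record per tuple vs. one record per batch.
# All sizes are illustrative assumptions, not PostgreSQL's real record layout.

RECORD_OVERHEAD = 32    # assumed fixed bytes paid once per WAL record
PER_TUPLE_OVERHEAD = 4  # assumed bookkeeping bytes per tuple inside a batch record


def wal_bytes_single(tuple_sizes):
    """Each tuple gets its own record, so the fixed overhead is paid every time."""
    return sum(RECORD_OVERHEAD + size for size in tuple_sizes)


def wal_bytes_batched(tuple_sizes, batch_size):
    """Tuples are grouped; each group pays the fixed overhead only once."""
    total = 0
    for start in range(0, len(tuple_sizes), batch_size):
        batch = tuple_sizes[start:start + batch_size]
        total += RECORD_OVERHEAD + sum(PER_TUPLE_OVERHEAD + size for size in batch)
    return total


def savings_ratio(tuple_size, n_tuples=1000, batch_size=100):
    """Batched volume divided by per-tuple volume; smaller means a bigger saving."""
    sizes = [tuple_size] * n_tuples
    return wal_bytes_batched(sizes, batch_size) / wal_bytes_single(sizes)


if __name__ == "__main__":
    for width in (8, 400):
        print(f"tuple width {width:>3} bytes: ratio {savings_ratio(width):.3f}")
```

Under these assumed numbers the ratio is much lower for 8-byte tuples than for 400-byte tuples, which matches the direction of the claim in the commit message. The absolute figures carry no meaning for a real server.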


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: In COPY, insert tuples to the heap in batches.
Date: 2011-11-09 13:25:46
Message-ID: CA+U5nMLeWrdDJK32AhCzdpshrHhDNdew1ppBiu+AOpwnCnNBPw@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Wed, Nov 9, 2011 at 9:06 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)iki(dot)fi> wrote:
> In COPY, insert tuples to the heap in batches.
>
> This greatly reduces the WAL volume, especially when the table is narrow.
> The overhead of locking the heap page is also reduced. Reduced WAL traffic
> also makes it scale a lot better, if you run multiple COPY processes at
> the same time.

Sounds good.

I can't see where this applies backup blocks. If it does, can you
document why/where/how it differs from other WAL records?

There's no need for conflict processing on replay with this new WAL
record type. But you should document that and alter the comments that
say it is necessary. Search "conflict".

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: In COPY, insert tuples to the heap in batches.
Date: 2011-11-09 18:42:53
Message-ID: [email protected]
Lists: pgsql-committers pgsql-hackers

On 09.11.2011 15:25, Simon Riggs wrote:
> On Wed, Nov 9, 2011 at 9:06 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)iki(dot)fi> wrote:
>> In COPY, insert tuples to the heap in batches.
>>
>> This greatly reduces the WAL volume, especially when the table is narrow.
>> The overhead of locking the heap page is also reduced. Reduced WAL traffic
>> also makes it scale a lot better, if you run multiple COPY processes at
>> the same time.
>
> Sounds good.
>
> I can't see where this applies backup blocks. If it does, can you
> document why/where/how it differs from other WAL records?

Good catch, I missed that. I copied the redo function from normal
insertion, but missed that heap_redo() takes care of backup blocks for
you, while heap2_redo() does not.

I'll go fix that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
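
The bug pattern described in the reply above (a handler copied from a dispatcher that restores backup blocks centrally into one that expects each handler to do so) can be sketched with a toy model. This is not PostgreSQL code. Every name below is hypothetical, and the "fixed" handler shows only one possible repair, not necessarily the one that was committed.

```python
# Toy model of two redo dispatchers. Names and structure are hypothetical;
# this only illustrates the bug pattern, not PostgreSQL's implementation.

def restore_backup_blocks(record, log):
    """Stand-in for restoring the full-page images attached to a WAL record."""
    log.append(("restore", record["type"]))


def apply_insert(record, log):
    """Handler written for a dispatcher that restores backup blocks itself."""
    log.append(("apply", record["type"]))


def apply_multi_insert_buggy(record, log):
    """Same body as apply_insert, moved unchanged to the second dispatcher."""
    log.append(("apply", record["type"]))


def apply_multi_insert_fixed(record, log):
    """One possible fix (an assumption): the handler restores blocks itself."""
    restore_backup_blocks(record, log)
    log.append(("apply", record["type"]))


def redo_central(record, log, handlers):
    """Style described for heap_redo(): restore first, then dispatch."""
    restore_backup_blocks(record, log)
    handlers[record["type"]](record, log)


def redo_per_handler(record, log, handlers):
    """Style described for heap2_redo(): each handler is responsible for restoring."""
    handlers[record["type"]](record, log)


def replay(dispatcher, handler, record_type):
    """Replay one record through a dispatcher and return the steps taken."""
    log = []
    dispatcher({"type": record_type}, log, {record_type: handler})
    return log


if __name__ == "__main__":
    print(replay(redo_central, apply_insert, "insert"))
    print(replay(redo_per_handler, apply_multi_insert_buggy, "multi_insert"))
    print(replay(redo_per_handler, apply_multi_insert_fixed, "multi_insert"))
```

In this model the buggy handler replays without a restore step, which is the omission Simon's question pointed at.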