Lists: | pgsql-hackers |
---|
From: | Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | gokul007(at)gmail(dot)com |
Subject: | Extension of Thick Indexes |
Date: | 2009-01-22 09:16:13 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
We are extending Gokul's Thick index functionality (
http://archives.postgresql.org/pgsql-patches/2008-01/msg00092.php) on 8.4
code base to
- fix existing defects
- Extend thick index usage for group by, count, nested queries, and
"delete/update with join" queries.
- Improve cost estimates for using thick indexes, and
- Support OR condition queries using thick indexes (which involves
computation of bitmap-or)
Details of the above features can be seen at
http://aurora.regenstrief.org/postgresql/report/6
Please let me know if you have any comments or suggestions.
Thanks,
Amit
Persistent Systems
From: | Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | gokul007(at)gmail(dot)com |
Subject: | Re: Extension of Thick Indexes |
Date: | 2009-02-06 10:17:28 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
We are plainning to use the attached result for estimating cost of
scanning thick indexes.
Regards,
Amit
Persistent Systems
www.persistentsys.com
On 1/22/09, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com> wrote:
> Hi,
>
> We are extending Gokul's Thick index functionality
> (http://archives.postgresql.org/pgsql-patches/2008-01/msg00092.php)
> on 8.4 code base to
> - fix existing defects
> - Extend thick index usage for group by, count, nested queries, and
> "delete/update with join" queries.
> - Improve cost estimates for using thick indexes, and
> - Support OR condition queries using thick indexes (which involves
> computation of bitmap-or)
>
> Details of the above features can be seen at
> http://aurora.regenstrief.org/postgresql/report/6
>
> Please let me know if you have any comments or suggestions.
>
> Thanks,
> Amit
> Persistent Systems
>
>
Attachment | Content-Type | Size |
---|---|---|
Index_Cost_Expr.pdf | application/pdf | 58.3 KB |
From: | "Shrish Purohit" <shrish_purohit(at)persistent(dot)co(dot)in> |
---|---|
To: | "'Amit Gupta'" <amit(dot)pc(dot)gupta(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Cc: | <gokul007(at)gmail(dot)com> |
Subject: | Re: Extension of Thick Indexes |
Date: | 2009-03-19 09:54:38 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi All,
We are extending Gokul's Thick index functionality.
I am attaching design doc and patch here with it.
Thick index improvements
1. Fix existing defects
2. Extend thick index usage for group by, count, nested queries, and
"delete/update with join" queries.
3. Improve cost estimates
Pending Features
1. Expression based thick index.
2. Support for Or conditions.
3. Optimization for update/delete queries.
Details can be found in design doc. We are using anoncvs checkout dated 1
dec 08. Please take a look at it.
Thanks,
Shrish Purohit | Software Engineer | Persistent Systems
shrish_purohit(at)persistent(dot)co(dot)in | Cell: +91 98509 59940 | Tel: +91 (20)
3023 4493
Innovation in software product design, development and delivery-
www.persistentsys.com
-----Original Message-----
From: Amit Gupta [mailto:amit(dot)pc(dot)gupta(at)gmail(dot)com]
Sent: Friday, February 06, 2009 3:47 PM
To: pgsql-hackers(at)postgresql(dot)org
Cc: gokul007(at)gmail(dot)com
Subject: Re: Extension of Thick Indexes
We are plainning to use the attached result for estimating cost of
scanning thick indexes.
Regards,
Amit
Persistent Systems
www.persistentsys.com
On 1/22/09, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com> wrote:
> Hi,
>
> We are extending Gokul's Thick index functionality
> (http://archives.postgresql.org/pgsql-patches/2008-01/msg00092.php)
> on 8.4 code base to
> - fix existing defects
> - Extend thick index usage for group by, count, nested queries, and
> "delete/update with join" queries.
> - Improve cost estimates for using thick indexes, and
> - Support OR condition queries using thick indexes (which involves
> computation of bitmap-or)
>
> Details of the above features can be seen at
> http://aurora.regenstrief.org/postgresql/report/6
>
> Please let me know if you have any comments or suggestions.
>
> Thanks,
> Amit
> Persistent Systems
>
>
DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
Attachment | Content-Type | Size |
---|---|---|
Thick_index.zip | application/octet-stream | 89.7 KB |
From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | Shrish Purohit <shrish_purohit(at)persistent(dot)co(dot)in> |
Cc: | Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, gokul007(at)gmail(dot)com |
Subject: | Re: Extension of Thick Indexes |
Date: | 2009-03-19 19:49:23 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Shrish,
It's too late to get this into 8.4 at this point.
Want to start a pgFoundry project around it, so people can play with it
for 8.5?
Also, for people not familiar with Gokul's work, can you give us an
explanation of the theory and implementation for this?
--Josh
From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | Shrish Purohit <shrish_purohit(at)persistent(dot)co(dot)in>, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, gokul007(at)gmail(dot)com |
Subject: | Re: Extension of Thick Indexes |
Date: | 2009-03-19 20:36:27 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Josh Berkus escribió:
> Also, for people not familiar with Gokul's work, can you give us an
> explanation of the theory and implementation for this?
It would be helpful to explain how this solves the lack of atomicity of
visibility data updates, which last time I checked was the killer
problem for this feature.
(It'd be good to do this in good ol' plain text email without resorting
to compressed files on random text formats)
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
From: | Gokulakannan Somasundaram <gokul007(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Cc: | Josh Berkus <josh(at)agliodbs(dot)com>, Shrish Purohit <shrish_purohit(at)persistent(dot)co(dot)in>, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Extension of Thick Indexes |
Date: | 2009-03-20 06:46:42 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
> It would be helpful to explain how this solves the lack of atomicity of
> visibility data updates, which last time I checked was the killer
> problem for this feature.
>
Hmmm... To put it more clearly, this problem occurs when there is a thick
index on a mutable function(marked as immutable). In order to avoid the
problem, i wrote the code that would not support functional indexes, it
would only support the normal ones. I think the main argument against Thick
Index was
- Visibility Map, which supports "Index only Scans" partially but by
occupying lesser space and doesn't have the functional index issue. Since
the main advantage of Thick Index was Index Only Scans, the community
preferred to wait for Visibility map
Heikki is working on the Visibility map and i think his observations might
help on Thick Index project.
Thanks,
Gokul.
From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Gokulakannan Somasundaram <gokul007(at)gmail(dot)com> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Shrish Purohit <shrish_purohit(at)persistent(dot)co(dot)in>, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Extension of Thick Indexes |
Date: | 2009-03-20 08:31:27 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Gokulakannan Somasundaram wrote:
>> It would be helpful to explain how this solves the lack of atomicity of
>> visibility data updates, which last time I checked was the killer
>> problem for this feature.
>
> Hmmm... To put it more clearly, this problem occurs when there is a thick
> index on a mutable function(marked as immutable). In order to avoid the
> problem, i wrote the code that would not support functional indexes, it
> would only support the normal ones. I think the main argument against Thick
> Index was
> - Visibility Map, which supports "Index only Scans" partially but by
> occupying lesser space and doesn't have the functional index issue. Since
> the main advantage of Thick Index was Index Only Scans, the community
> preferred to wait for Visibility map
>
> Heikki is working on the Visibility map and i think his observations might
> help on Thick Index project.
The common ground between thick indexes and visibility map based
index-only-scans is the method to pass Datums from the index to the
executor, and all the executor and planner changes to take advantage of
that. In fact, even without thick indexes or visibility map, that would
provide some gain: we could use the data from the index to filter rows
before going to the heap to check visibility. For example if you have a
wide table with a text column, and there's an index on the text column,
it would be faster to do a full scan on the index for a predicate like
"textcol LIKE '%foobar%'", than a sequential scan of the heap, assuming
there's only few matches.
The thick index approach has a lot of limitations compared to using
visibility map: How to deal with functional indexes? How to deal with
datatypes with broken comparison functions? And it needs support from
each indexam separately, and is outright impossible in something like an
bitmap index. These have all been discussed before, but I believe the
executor and indexam API changes required are similar to that with the
visibility map. That part of the patch I'm very interested in, though I
haven't looked at it at all yet.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From: | "Shrish Purohit" <shrish_purohit(at)persistent(dot)co(dot)in> |
---|---|
To: | "'Heikki Linnakangas'" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "'Gokulakannan Somasundaram'" <gokul007(at)gmail(dot)com> |
Cc: | "'Alvaro Herrera'" <alvherre(at)commandprompt(dot)com>, "'Josh Berkus'" <josh(at)agliodbs(dot)com>, "'Amit Gupta'" <amit(dot)pc(dot)gupta(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Extension of Thick Indexes |
Date: | 2009-03-20 09:37:55 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi All,
Some brief information about the thick index patch.
The patch adds snapshot (MVCC) information to the indexes to enable them
being used independently. With this information, the indexes need not refer
to the heap data to check an index key's visibility. Various functions such
as IndexTupleSatisfiesMVCC() are uesd to verify if the MVCC satisfies a
snapshot.
The following data structure is added to the thick index key data structure:
struct SnapshotFieldsData
{
TransactionId t_xmin;
TransactionId t_xmax;
TransactionId t_cid;
uint16 infomask;
};
Thus the pg_index catalog is added with flag to indicate thick index.
(FormData_pg_index ) is modified accordingly.
The IndexScanDescData structure, which stores information about how an index
can be scanned, was modified to include:
SnapshotIndexParams sindex_params;
// params of thick index that can be used for scanning
void *opaque;
The following data data-structure is used by the database execution engine
to scan thick index:
typedef struct SnapshotIndexParamsData
{
bool isIndexOnlyScan;
// true if index should be used without accessing the table.
bool is_minmax_scan;
// true if this the query contains min/max aggregate
functions.
bool is_count_only_scan;
// if the query contains only count(), then true.
// If true, then index rows are not cached.
bool cache_stack;
// If true, remembers path from root to leaf in the stack (a
member
// var defined below). Used For optimization of updates of a row, which
// requires delete and insert.
// While deleting a key from index, its stack is tracked, and the same stack
// is used for inserting the updated key.
CmdType operation;
// insert, update, delete, .
HeapTuple htup;
// reference to the heap tuple that's being operated
ScanKey insertion_skey; /* Insertion Scan Key */
void* stack;
// reference to the stack mentioned in comments of cache_stack.
TupleTableSlot* slot;
// stores heap data structures containing index keys
} SnapshotIndexParamsData;
IndexScan data structure, which stores information about how an index is
scanned, is updated to include following members:
Bool indexOnlyScan;
Bool is_minmax_scan;
Bool is_count_only_scan;
The Postgres optimizer figures out various ways to access each relation used
by the input SQL statement in set_plain_rel_pathlist() function, which
indirectly invokes create_index_path(). Create_index_path() does the
following:
Identifies index path to access a table, figures out if index covers all the
columns used by the SQL statement to initialize index_covers_all_clauses
sets the cost of accessing table using index (or just using index only scan)
by invoking cost_index() function.
Functions index_covers_rel_clauses, index_covers_having_quals,
index_covers_target_list are used to set flag index_covers_all_clauses.
index_covers_all_clauses flag indicates that index covers all clauses and
can be used In IndexOnlyScan mode.
--How updates and deletes are performed?
find_old_slot(estate) identifies the old_slot information from estates'
ExprContexts. ExecUpdate uses oldslot, newslot and function
ExecUpdateIndexTuples is used to update the index. ExecUpdateIndexTuples
prepares iscan IndexScanDescriptor which is used to scan index.
Index is scanded by index_getnext(iscan,ForwardScanDirection).for b-tree
index it internall calls _bt_readpage. for update and delete operations
_bt_readpage calls SnapshotIndexDeleteTuple which updates mvcc information
of the index tuple based on MVCC information includeed in the slot.
ExecUpdateIndexTuples then inserts the new index tuple.
FormIndexDatum stores the MVCC information along with values [].
index_form_tuple is also modified accordingly to store MVCC information.
Functions StoreMinimalIndexTuple and GetNextMinimalIndexTuple are used to
store index key column values into mintup and to get values from mintup
respectively.
For countOnlyScans we count same static slot, while for minmax_scan first
non-null tuple as the tuple with minium/maximum value.
Thanks,
Shrish Purohit | Software Engineer | Persistent Systems
shrish_purohit(at)persistent(dot)co(dot)in | Cell: +91 98509 59940 | Tel: +91 (20)
3023 4493
Innovation in software product design, development and delivery-
www.persistentsys.com
-----Original Message-----
From: Heikki Linnakangas [mailto:heikki(dot)linnakangas(at)enterprisedb(dot)com]
Sent: Friday, March 20, 2009 2:01 PM
To: Gokulakannan Somasundaram
Cc: Alvaro Herrera; Josh Berkus; Shrish Purohit; Amit Gupta;
pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Extension of Thick Indexes
Gokulakannan Somasundaram wrote:
>> It would be helpful to explain how this solves the lack of atomicity of
>> visibility data updates, which last time I checked was the killer
>> problem for this feature.
>
> Hmmm... To put it more clearly, this problem occurs when there is a thick
> index on a mutable function(marked as immutable). In order to avoid the
> problem, i wrote the code that would not support functional indexes, it
> would only support the normal ones. I think the main argument against
Thick
> Index was
> - Visibility Map, which supports "Index only Scans" partially but by
> occupying lesser space and doesn't have the functional index issue. Since
> the main advantage of Thick Index was Index Only Scans, the community
> preferred to wait for Visibility map
>
> Heikki is working on the Visibility map and i think his observations might
> help on Thick Index project.
The common ground between thick indexes and visibility map based
index-only-scans is the method to pass Datums from the index to the
executor, and all the executor and planner changes to take advantage of
that. In fact, even without thick indexes or visibility map, that would
provide some gain: we could use the data from the index to filter rows
before going to the heap to check visibility. For example if you have a
wide table with a text column, and there's an index on the text column,
it would be faster to do a full scan on the index for a predicate like
"textcol LIKE '%foobar%'", than a sequential scan of the heap, assuming
there's only few matches.
The thick index approach has a lot of limitations compared to using
visibility map: How to deal with functional indexes? How to deal with
datatypes with broken comparison functions? And it needs support from
each indexam separately, and is outright impossible in something like an
bitmap index. These have all been discussed before, but I believe the
executor and indexam API changes required are similar to that with the
visibility map. That part of the patch I'm very interested in, though I
haven't looked at it at all yet.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.