Lists: | pgsql-hackers |
---|
From: | Peter Griggs <petergriggs33(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | [QUESTION/PROPOSAL] loose quadtree in spgist |
Date: | 2020-01-07 16:33:31 |
Message-ID: | CACEwj4p2C9Nhi=r10O98ms6st8kNewt+Nbx5rn-Z_E8A-gMguQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hello, I wanted some guidance/suggestions about creating an spgist
extension. For context, i am a grad student doing research that involves
comparing the performance of different indexes for spatial data. We've
built a system that uses Postgres and one of the data structures we want to
use is a loose quadtree, but there is no implementation of this data
structure in spgist. The reason why I think this is pretty do-able is that
it is quite similar to a quadtree on boxes, which is implemented in
src/backend/utils/adt/geo_spgist.c.
Additionally, I found by grepping through the repo for the existing
functions in spgist/box_ops operator class that several catalog files need
to be updated to reflect a new operator class in spgist. The files that I
believe need to be changed to create a new
spgist_loose_box_ops operator class are:
src/include/catalog/pg_amop.dat
src/include/catalog/pg_amproc.dat
src/include/catalog/pg_opclass.dat
src/include/catalog/pg_opfamily.dat
I've poked around quite a bit in the spgist code and have tried making
minimal changes to geo_spgist.c, but I haven't done any development on
postgres before, so i'm running into some issues that I couldn't find help
with on the postgres slack, by searching the mailing list, or by scouring
the development wikis. For example, I wanted to just print out some data to
see what quadrant a box is being placed into in the geo_spgist.c code. I
understand that printing to stdout won't work in postgres, but I thought
that I could possibly write some data to the logfile. I tried updating a
function to use both elog and ereport and re-built the code. However, I
can't get anything to print out to the logfile no matter what I try. Does
anyone have tips for printing out and debugging in general for postgres
development?
Any tips or guidance would be much appreciated. Also, if there's a
different route I should go to turn this into a proposal for a patch please
let me know. I'm new to postgres dev.
Best,
Peter
From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Peter Griggs <petergriggs33(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [QUESTION/PROPOSAL] loose quadtree in spgist |
Date: | 2020-01-07 22:56:34 |
Message-ID: | 20200107225634.ffeerzx2gme7x5im@development |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, Jan 07, 2020 at 11:33:31AM -0500, Peter Griggs wrote:
>Hello, I wanted some guidance/suggestions about creating an spgist
>extension. For context, i am a grad student doing research that involves
>comparing the performance of different indexes for spatial data. We've
>built a system that uses Postgres and one of the data structures we want to
>use is a loose quadtree, but there is no implementation of this data
>structure in spgist. The reason why I think this is pretty do-able is that
>it is quite similar to a quadtree on boxes, which is implemented in
>src/backend/utils/adt/geo_spgist.c.
>
>Additionally, I found by grepping through the repo for the existing
>functions in spgist/box_ops operator class that several catalog files need
>to be updated to reflect a new operator class in spgist. The files that I
>believe need to be changed to create a new
>spgist_loose_box_ops operator class are:
>
>src/include/catalog/pg_amop.dat
>src/include/catalog/pg_amproc.dat
>src/include/catalog/pg_opclass.dat
>src/include/catalog/pg_opfamily.dat
>
You should probably try using CREATE OPERATOR CLASS command [1], not
modify the catalogs directly. That's only necessary for built-in index
types (i.e. available right after initdb). But you mentioned you're
working on an extension, so the command is the right thing to do (after
all, you don't know OIDs of objects from the extension).
>
>I've poked around quite a bit in the spgist code and have tried making
>minimal changes to geo_spgist.c, but I haven't done any development on
>postgres before, so i'm running into some issues that I couldn't find help
>with on the postgres slack, by searching the mailing list, or by scouring
>the development wikis.
Well, learning the ropes may take a bit of time, and pgsql-hackers is
probably the right place to ask ...
>For example, I wanted to just print out some data to
>see what quadrant a box is being placed into in the geo_spgist.c code. I
>understand that printing to stdout won't work in postgres, but I thought
>that I could possibly write some data to the logfile. I tried updating a
>function to use both elog and ereport and re-built the code. However, I
>can't get anything to print out to the logfile no matter what I try. Does
>anyone have tips for printing out and debugging in general for postgres
>development?
>
Well, elog/ereport are the easiest approach (it's what I'd do), and they
do about the same thing. The main difference is that ereport allows
translations of messages to other languages, while elog is for internal
things that should not happen (unexpected errors, ...). For debugging
just use elog(), I guess.
It's hard to say why you're not getting anything logged, because you
haven't shown us any code. My guess is that you're uring log level that
is not high enough to make it into the log file.
The default config in postgresql.conf says
log_min_messages = warning
which means the level has to be at least WARNING to make it into the
file. So either WARNING, ERROR, LOG, FATAL, PANIC. So for example
elog(INFO, "test message");
won't do anything, but
elog(LOG, "test message");
will write stuff to the log file. If you use WARNING, you'll actually
get the message on the client console (well, there's client_min_messages
but you get the idea).
>
>Any tips or guidance would be much appreciated. Also, if there's a
>different route I should go to turn this into a proposal for a patch
>please let me know. I'm new to postgres dev.
>
A general recommendation is to show snippets of code, so that people on
this list actually can help without too much guessing what you're doing.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Peter Griggs <petergriggs33(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [QUESTION/PROPOSAL] loose quadtree in spgist |
Date: | 2020-01-08 19:36:14 |
Message-ID: | CACEwj4oA3VsJjzKrVqwzNyAd7Fad5p97uUX6qe8oTvn8g=1nJA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Thank you for the tips Tomas, I really appreciate it. You're definitely
right that I should include code snippets, so here's the code i'm trying to
change.
In the getQuadrant function in the file src/backend/utils/adt/geo_spgist.c,
I only added some elog statements to see the quadrant that a box is placed
into using the current code. getQuadrant is called several times by the
spg_box_quad_picksplit function, which is used when inserting into the
quadtree. With this change, I can still build postgres but when I try to
trigger the code, nothing gets printed to my logfile. Here's my process for
trying to trigger this code:
1. delete the current postgres installation by removing /usr/local/pgsql
2. re-build from source by following documentation
3. create a database with a table that has two columns: (id int, b box)
4. insert some boxes into the table and build an index on it using "CREATE
INDEX box_quad_idx ON quad USING spgist(b);"
And here's the function I modified:
/* * Calculate the quadrant * * The quadrant is 8 bit unsigned
integer with 4 least bits in use. * This function accepts BOXes as
input. They are not casted to * RangeBoxes, yet. All 4 bits are set
by comparing a corner of the box. * This makes 16 quadrants in total.
*/static uint8
getQuadrant(BOX *centroid, BOX *inBox){
uint8 quadrant = 0;
elog(LOG, "BOX (minx, miny) = (%d, %d)\n", centroid->low.x, centroid->low.y);
elog(LOG, "BOX (maxx, maxy) = (%d, %d)\n", centroid->high.x, centroid->high.y);
if (inBox->low.x > centroid->low.x)
quadrant |= 0x8;
if (inBox->high.x > centroid->high.x)
quadrant |= 0x4;
if (inBox->low.y > centroid->low.y)
quadrant |= 0x2;
if (inBox->high.y > centroid->high.y)
quadrant |= 0x1;
elog(LOG, "Quadrant bitvector value is: %d\n", quadrant);
return quadrant;}
On Tue, Jan 7, 2020 at 5:56 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:
> On Tue, Jan 07, 2020 at 11:33:31AM -0500, Peter Griggs wrote:
> >Hello, I wanted some guidance/suggestions about creating an spgist
> >extension. For context, i am a grad student doing research that involves
> >comparing the performance of different indexes for spatial data. We've
> >built a system that uses Postgres and one of the data structures we want
> to
> >use is a loose quadtree, but there is no implementation of this data
> >structure in spgist. The reason why I think this is pretty do-able is that
> >it is quite similar to a quadtree on boxes, which is implemented in
> >src/backend/utils/adt/geo_spgist.c.
> >
> >Additionally, I found by grepping through the repo for the existing
> >functions in spgist/box_ops operator class that several catalog files need
> >to be updated to reflect a new operator class in spgist. The files that I
> >believe need to be changed to create a new
> >spgist_loose_box_ops operator class are:
> >
> >src/include/catalog/pg_amop.dat
> >src/include/catalog/pg_amproc.dat
> >src/include/catalog/pg_opclass.dat
> >src/include/catalog/pg_opfamily.dat
> >
>
> You should probably try using CREATE OPERATOR CLASS command [1], not
> modify the catalogs directly. That's only necessary for built-in index
> types (i.e. available right after initdb). But you mentioned you're
> working on an extension, so the command is the right thing to do (after
> all, you don't know OIDs of objects from the extension).
>
> [1] https://www.postgresql.org/docs/current/sql-createopclass.html
>
> >
> >I've poked around quite a bit in the spgist code and have tried making
> >minimal changes to geo_spgist.c, but I haven't done any development on
> >postgres before, so i'm running into some issues that I couldn't find help
> >with on the postgres slack, by searching the mailing list, or by scouring
> >the development wikis.
>
> Well, learning the ropes may take a bit of time, and pgsql-hackers is
> probably the right place to ask ...
>
> >For example, I wanted to just print out some data to
> >see what quadrant a box is being placed into in the geo_spgist.c code. I
> >understand that printing to stdout won't work in postgres, but I thought
> >that I could possibly write some data to the logfile. I tried updating a
> >function to use both elog and ereport and re-built the code. However, I
> >can't get anything to print out to the logfile no matter what I try. Does
> >anyone have tips for printing out and debugging in general for postgres
> >development?
> >
>
> Well, elog/ereport are the easiest approach (it's what I'd do), and they
> do about the same thing. The main difference is that ereport allows
> translations of messages to other languages, while elog is for internal
> things that should not happen (unexpected errors, ...). For debugging
> just use elog(), I guess.
>
> It's hard to say why you're not getting anything logged, because you
> haven't shown us any code. My guess is that you're uring log level that
> is not high enough to make it into the log file.
>
> The default config in postgresql.conf says
>
> log_min_messages = warning
>
> which means the level has to be at least WARNING to make it into the
> file. So either WARNING, ERROR, LOG, FATAL, PANIC. So for example
>
> elog(INFO, "test message");
>
> won't do anything, but
>
> elog(LOG, "test message");
>
> will write stuff to the log file. If you use WARNING, you'll actually
> get the message on the client console (well, there's client_min_messages
> but you get the idea).
>
> >
> >Any tips or guidance would be much appreciated. Also, if there's a
> >different route I should go to turn this into a proposal for a patch
> >please let me know. I'm new to postgres dev.
> >
>
> A general recommendation is to show snippets of code, so that people on
> this list actually can help without too much guessing what you're doing.
>
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
--
Peter Griggs
Masters of Engineering (Meng) in Computer Science
Massachusetts Institute of Technology | 2020
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Peter Griggs <petergriggs33(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [QUESTION/PROPOSAL] loose quadtree in spgist |
Date: | 2020-01-08 20:07:34 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Peter Griggs <petergriggs33(at)gmail(dot)com> writes:
> In the getQuadrant function in the file src/backend/utils/adt/geo_spgist.c,
> I only added some elog statements to see the quadrant that a box is placed
> into using the current code. getQuadrant is called several times by the
> spg_box_quad_picksplit function, which is used when inserting into the
> quadtree. With this change, I can still build postgres but when I try to
> trigger the code, nothing gets printed to my logfile.
Perhaps you're looking in the wrong logfile. elog(LOG) should definitely
produce output unless you're using very strange settings.
Another possibility is that the specific test case you're using doesn't
actually reach this function. I'm not totally sure, but I think that
SPGiST might not call the datatype-specific choose or picksplit functions
until it's got more than one index page's worth of data.
> elog(LOG, "BOX (minx, miny) = (%d, %d)\n", centroid->low.x, centroid->low.y);
A couple thoughts here:
* From memory, the x and y values of a BOX are float8, so don't you want
to use %g or %f instead of %d?
* You don't want to end an elog with \n, that'll just add extra blank
lines to the log.
regards, tom lane
From: | Peter Griggs <petergriggs33(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [QUESTION/PROPOSAL] loose quadtree in spgist |
Date: | 2020-01-16 05:32:34 |
Message-ID: | CACEwj4rq3X2PjV5_6gveAhj0QSso=C64kEwprzg--Vq7OCf0nw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
As an update, you were totally right Tom, SPGIST loads all of the tuples
onto the root page which doesn't call picksplit until there's a page's
worth of data. I just wasn't inserting enough tuples to see my elog values
appear in the log, but now I am and they do!
The hint from Tomas before to use the CREATE OPERATOR CLASS command was
spot on. That documentation lead me to this page (
https://www.postgresql.org/docs/11/xindex.html) which looked like the sql
I need to include in the extension to get the new loose quadtree index to
build. What I did was create a "loose_quadtree" folder for my extension in
the /contrib folder and followed the format of another extension by
including a Makefile, loose_quadtree.control file, loose_quadtree.sql, and
my loose_quadtree.c file, in which I just put copies of the box quadtree
functions from geo_spgist.c with a bit of extra logging code. Now, I can
build the extension with make and then run 'CREATE EXTENSION
loose_quadtree;', then index using 'spgist(b loose_quadtree_ops)' operator
class! Now, I have to actually figure out how to change the logic within
the functions from geo_spgist.c.
I was wondering if you had advice on how to handle implementing insertion
for the loose quadtree index. For some background, a loose quadtree is
similar to a quadtree over boxes, except that the length of a side becomes
k*L where k>1. Throughout this, I assume that our space is a square (take
the minimum bounding square of all of the boxes). Usually, a value of K=2
is used. Since, each loose quadtree cell is 2x its normal size, a given
level can hold any object that has a radius of <=1/4 of the cell side
length, regardless of the object's position. We can do a bit of math and
figure out what level an object should be inserted into the tree in O(1)
time. I'm including a picture of the level selection algorithm below, but
its just a re-formulation of what i've said above. My overall question is
how to do this in spgist. From what I understand in the spgist insertion
algorithm, the level selection would should done in the choose() function
because choose() is called when we are trying to insert a leaf tuple into a
inner tuple that has one or more levels under it. Currently, it seems like
the spg_box_quad_choose() function descends recursively into a quadtree
node. What I would like to do is to have it jump straight to the level it
wants to insert into using the loose quadtree level selection algorithm and
then find which cell it should add to by comparing its center coordinates.
[Following image from: Thatcher Ulrich. Loose octrees. In Mark DeLoura,
editor, Game Programming Gems, pages 444–453. Charles River Media, 2000]
[image: Screen Shot 2020-01-14 at 6.15.09 PM.png]
Best,
Peter
On Wed, Jan 8, 2020 at 3:07 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Peter Griggs <petergriggs33(at)gmail(dot)com> writes:
> > In the getQuadrant function in the file
> src/backend/utils/adt/geo_spgist.c,
> > I only added some elog statements to see the quadrant that a box is
> placed
> > into using the current code. getQuadrant is called several times by the
> > spg_box_quad_picksplit function, which is used when inserting into the
> > quadtree. With this change, I can still build postgres but when I try to
> > trigger the code, nothing gets printed to my logfile.
>
> Perhaps you're looking in the wrong logfile. elog(LOG) should definitely
> produce output unless you're using very strange settings.
>
> Another possibility is that the specific test case you're using doesn't
> actually reach this function. I'm not totally sure, but I think that
> SPGiST might not call the datatype-specific choose or picksplit functions
> until it's got more than one index page's worth of data.
>
> > elog(LOG, "BOX (minx, miny) = (%d, %d)\n", centroid->low.x,
> centroid->low.y);
>
> A couple thoughts here:
>
> * From memory, the x and y values of a BOX are float8, so don't you want
> to use %g or %f instead of %d?
>
> * You don't want to end an elog with \n, that'll just add extra blank
> lines to the log.
>
> regards, tom lane
>
--
Peter Griggs
Masters of Engineering (Meng) in Computer Science
Massachusetts Institute of Technology | 2020
From: | Peter Griggs <petergriggs33(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [QUESTION/PROPOSAL] loose quadtree in spgist |
Date: | 2020-01-28 06:23:33 |
Message-ID: | CACEwj4q91T-Z3Cc=9-Z3EwnPSacvAponrwX3tBG_6aefKFTrJw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi, I was wondering if someone could help me understand what the
"allTheSame" attribute in the spgChooseIn struct is.
Does it mean that the inner tuple contains either only inner tuples or only
leaf nodes? Or is it saying that the tuples in an inner tuple are all in
the same quadrant?
This is a code snippet from /src/include/access/spgist.h:
/*
* Argument structs for spg_choose method
*/
typedef struct spgChooseIn
{
Datum datum; /* original datum to be indexed */
Datum leafDatum; /* current datum to be stored at leaf */
int level; /* current level (counting from zero) */
/* Data from current inner tuple */
bool allTheSame; /* tuple is marked all-the-same? */
bool hasPrefix; /* tuple has a prefix? */
Datum prefixDatum; /* if so, the prefix value */
int nNodes; /* number of nodes in the inner tuple */
Datum *nodeLabels; /* node label values (NULL if none) */
} spgChooseIn;
On Thu, Jan 16, 2020 at 12:32 AM Peter Griggs <petergriggs33(at)gmail(dot)com>
wrote:
> As an update, you were totally right Tom, SPGIST loads all of the tuples
> onto the root page which doesn't call picksplit until there's a page's
> worth of data. I just wasn't inserting enough tuples to see my elog values
> appear in the log, but now I am and they do!
>
> The hint from Tomas before to use the CREATE OPERATOR CLASS command was
> spot on. That documentation lead me to this page (
> https://www.postgresql.org/docs/11/xindex.html) which looked like the
> sql I need to include in the extension to get the new loose quadtree index
> to build. What I did was create a "loose_quadtree" folder for my extension
> in the /contrib folder and followed the format of another extension by
> including a Makefile, loose_quadtree.control file, loose_quadtree.sql, and
> my loose_quadtree.c file, in which I just put copies of the box quadtree
> functions from geo_spgist.c with a bit of extra logging code. Now, I can
> build the extension with make and then run 'CREATE EXTENSION
> loose_quadtree;', then index using 'spgist(b loose_quadtree_ops)' operator
> class! Now, I have to actually figure out how to change the logic within
> the functions from geo_spgist.c.
>
> I was wondering if you had advice on how to handle implementing insertion
> for the loose quadtree index. For some background, a loose quadtree is
> similar to a quadtree over boxes, except that the length of a side becomes
> k*L where k>1. Throughout this, I assume that our space is a square (take
> the minimum bounding square of all of the boxes). Usually, a value of K=2
> is used. Since, each loose quadtree cell is 2x its normal size, a given
> level can hold any object that has a radius of <=1/4 of the cell side
> length, regardless of the object's position. We can do a bit of math and
> figure out what level an object should be inserted into the tree in O(1)
> time. I'm including a picture of the level selection algorithm below, but
> its just a re-formulation of what i've said above. My overall question is
> how to do this in spgist. From what I understand in the spgist insertion
> algorithm, the level selection would should done in the choose() function
> because choose() is called when we are trying to insert a leaf tuple into a
> inner tuple that has one or more levels under it. Currently, it seems like
> the spg_box_quad_choose() function descends recursively into a quadtree
> node. What I would like to do is to have it jump straight to the level it
> wants to insert into using the loose quadtree level selection algorithm and
> then find which cell it should add to by comparing its center coordinates.
>
> [Following image from: Thatcher Ulrich. Loose octrees. In Mark DeLoura,
> editor, Game Programming Gems, pages 444–453. Charles River Media, 2000]
>
> [image: Screen Shot 2020-01-14 at 6.15.09 PM.png]
>
> Best,
> Peter
>
>
>
>
>
> On Wed, Jan 8, 2020 at 3:07 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Peter Griggs <petergriggs33(at)gmail(dot)com> writes:
>> > In the getQuadrant function in the file
>> src/backend/utils/adt/geo_spgist.c,
>> > I only added some elog statements to see the quadrant that a box is
>> placed
>> > into using the current code. getQuadrant is called several times by the
>> > spg_box_quad_picksplit function, which is used when inserting into the
>> > quadtree. With this change, I can still build postgres but when I try to
>> > trigger the code, nothing gets printed to my logfile.
>>
>> Perhaps you're looking in the wrong logfile. elog(LOG) should definitely
>> produce output unless you're using very strange settings.
>>
>> Another possibility is that the specific test case you're using doesn't
>> actually reach this function. I'm not totally sure, but I think that
>> SPGiST might not call the datatype-specific choose or picksplit functions
>> until it's got more than one index page's worth of data.
>>
>> > elog(LOG, "BOX (minx, miny) = (%d, %d)\n", centroid->low.x,
>> centroid->low.y);
>>
>> A couple thoughts here:
>>
>> * From memory, the x and y values of a BOX are float8, so don't you want
>> to use %g or %f instead of %d?
>>
>> * You don't want to end an elog with \n, that'll just add extra blank
>> lines to the log.
>>
>> regards, tom lane
>>
>
>
> --
> Peter Griggs
> Masters of Engineering (Meng) in Computer Science
> Massachusetts Institute of Technology | 2020
>
--
Peter Griggs
Masters of Engineering (Meng) in Computer Science
Massachusetts Institute of Technology | 2020