Re: <indexterm> with <secondary>

Lists: pgsql-docs
From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Pg Docs <pgsql-docs(at)postgresql(dot)org>
Subject: <indexterm> with <secondary>
Date: 2017-03-15 16:05:25

While reviewing Tomas' extended statistics patch I noticed that the new
docbook toolchain produces additional links for each indexterm, based on
the <secondary> tags there are. For instance, in 9.5 I see this:

statistics, Aggregate Functions, The Statistics Collector
of the planner, Statistics Used by the Planner, Updating Planner Statistics
https://www.postgresql.org/docs/9.5/static/bookindex.html#AEN186176

while in 10 I see this for the same source:

statistics, Aggregate Functions, Statistics Used by the Planner, Updating Planner Statistics, The Statistics Collector
of the planner, Statistics Used by the Planner, Updating Planner Statistics
https://www.postgresql.org/docs/devel/static/bookindex.html#indexdiv-S

Note that the links for entries on the <secondary>of the planner</> also
show up in the first list of links.

Is this intended? As we get more <secondary> entries, this gets busy
real quick, for no gain that I can see. For instance, I added this
entry:

<indexterm zone="extended-statistics">
<primary>statistics</primary>
<secondary>of the planner</secondary>
<tertiary>extended</tertiary>
</indexterm>

and this results in:

statistics, Aggregate Functions, Statistics Used by the Planner, Updating Planner Statistics, The Statistics Collector, Extended Statistics
of the planner, Statistics Used by the Planner, Updating Planner Statistics, Extended Statistics
extended, Extended Statistics

which seems altogether excessive. Perhaps this is a bug in the index
generation?

FWIW I'm leaning towards removing the <tertiary> in the new entry, which
results in this:

statistics, Aggregate Functions, Statistics Used by the Planner, Updating Planner Statistics, The Statistics Collector, Extended Statistics
of the planner, Statistics Used by the Planner, Updating Planner Statistics, Extended Statistics

I think this would be better:

statistics, Aggregate Functions, The Statistics Collector
of the planner, Statistics Used by the Planner, Updating Planner Statistics, Extended Statistics

--
Álvaro Herrera Developer, https://www.PostgreSQL.org/
Licensee shall have no right to use the Licensed Software
for productive or commercial use. (Licencia de StarOffice 6.0 beta)


From: Alexander Law <exclusion(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Docs <pgsql-docs(at)postgresql(dot)org>
Subject: Re: <indexterm> with <secondary>
Date: 2017-03-16 14:44:01
Lists: pgsql-docs

Hello Alvaro,

These duplicate entries are caused by the zone attribute. If you remove
it, you'll get only a single entry (in the secondary line).
It seems that the following DocBook bugfix introduced the bug you
encountered:
https://github.com/docbook/xslt10-stylesheets/commit/f555842b
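
The effect described above can be sketched with a toy model. This is not the DocBook algorithm, only an approximation under stated assumptions: an entry is listed on the primary line when it has no <secondary>, and, with the zone check in place, also whenever it carries a non-empty zone. The entry data (in particular the zone ids) and the function name primary_links are hypothetical.

```shell
# Toy model only. Fields: primary|secondary|zone id|section title.
# The zone ids below are illustrative assumptions, not taken from the sources.
entries='statistics||aggregate-functions|Aggregate Functions
statistics|of the planner|planner-stats|Statistics Used by the Planner
statistics|of the planner|vacuum-for-statistics|Updating Planner Statistics
statistics||monitoring-stats|The Statistics Collector'

# Print the links shown on the primary line.
# mode "zone-check":    entries with a zone are listed even if they have a secondary.
# mode "no-zone-check": only entries without a secondary are listed.
primary_links() {
  awk -F'|' -v mode="$1" '
    $2 == "" || (mode == "zone-check" && $3 != "") {
      out = out (out ? ", " : "") $4
    }
    END { print out }
  '
}

with_check=$(printf '%s\n' "$entries" | primary_links zone-check)
without_check=$(printf '%s\n' "$entries" | primary_links no-zone-check)
printf 'zone check:    %s\n' "$with_check"
printf 'no zone check: %s\n' "$without_check"
```

Under these assumptions the first line reproduces the crowded primary line seen in the version 10 index, and the second reproduces the 9.5 one.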

I see the following solutions for the issue:
1. To customize this xsl template (remove @zone check) for our docs.
2. To remove @zone specification from indexterm's.
Are there any reasons to specify it explicitly?
<sect2 id="vacuum-for-statistics">
<title>Updating Planner Statistics</title>

<indexterm zone="vacuum-for-statistics">
<primary>statistics</primary>
<secondary>of the planner</secondary>
</indexterm>
(The DocBook documentation <http://tdg.docbook.org/tdg/4.5/indexterm.html>
says: "Zone holds the IDs of the elements to which it applies."
So here "zone" specifies a single point. But to mark a single point we
don't need to specify a zone: "A single point is marked with an
IndexTerm placed in the text at the point of reference.")
(We have about 800 "zone" specifications, but as the issue arises only
with <secondary>, it decreases to about 160 entries.)
3. To report an issue to docbook and get some feedback (but I'm afraid
it can take months).
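
A rough way to size option 2 is to count the indexterms that carry a zone attribute, and how many of those also have a <secondary>. The sketch below is only an illustration: the sample file stands in for the real SGML sources, its ids are hypothetical, and count_zone_indexterms is a made-up helper name. It assumes each tag sits on its own line, as in the examples in this thread.

```shell
# Hypothetical sample standing in for the SGML sources.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<indexterm zone="vacuum-for-statistics">
 <primary>statistics</primary>
 <secondary>of the planner</secondary>
</indexterm>
<indexterm zone="monitoring-stats">
 <primary>statistics</primary>
</indexterm>
<indexterm>
 <primary>vacuum</primary>
 <secondary>statistics</secondary>
</indexterm>
EOF

# Print "<zone indexterms> <zone indexterms that also contain <secondary>>".
count_zone_indexterms() {
  awk '
    /<indexterm[^>]*zone=/   { in_zone = 1; zone++ }
    in_zone && /<secondary>/ { with_secondary++; in_zone = 0 }
    /<\/indexterm>/          { in_zone = 0 }
    END { printf "%d %d\n", zone, with_secondary }
  ' "$@"
}

counts=$(count_zone_indexterms "$sample")
echo "$counts"
rm -f "$sample"
```

Run against the real source files instead of the sample, the second number would be the set of entries that option 2 actually needs to touch.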

Best regards,
Alexander



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Law <exclusion(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Docs <pgsql-docs(at)postgresql(dot)org>
Subject: Re: <indexterm> with <secondary>
Date: 2017-10-25 20:23:20
Lists: pgsql-docs

Alexander Law <exclusion(at)gmail(dot)com> writes:
> These duplicate entries are caused by the zone attribute. If you remove
> it, you'll get only a single entry (in the secondary line).
> It seems that the following DocBook bugfix introduced the bug you
> encountered:
> https://github.com/docbook/xslt10-stylesheets/commit/f555842b

> I see the following solutions for the issue:
> 1. To customize this xsl template (remove @zone check) for our docs.
> 2. To remove @zone specification from indexterm's.
> Are there any reasons to specify it explicitly?

According to my understanding, the use of "zone" means that the
indexterm entry refers to the whole section named by the "zone" label,
not only the physical point the entry is at. So I think our usage
is correct in principle; for example, this would allow an index entry
to say something like "Section 4.2" not just "page 435". In practice,
though, the distinction seems merely pedantic: AFAICS it does not
affect either HTML output (which always gives you a hyperlink to the
containing section) or PDF output (which always gives you a page number).

So we could run around and remove the zone tags, but that still seems
like rather a grotty answer --- maybe someday we'd want them back,
if the doc toolchain were ever improved to make effective use of them.

How complicated is the "customize the xsl template" solution?

(BTW, I do not see the extra-entries bug at all in PDF output.)

regards, tom lane


From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Docs <pgsql-docs(at)postgresql(dot)org>
Subject: Re: <indexterm> with <secondary>
Date: 2017-10-26 06:40:11
Lists: pgsql-docs

Hello Tom,
25.10.2017 23:23, Tom Lane writes:
>
>> 2. To remove @zone specification from indexterm's.
>> Are there any reasons to specify it explicitly?
> According to my understanding, the use of "zone" means that the
> indexterm entry refers to the whole section named by the "zone" label,
> not only the physical point the entry is at. So I think our usage
> is correct in principle; for example, this would allow an index entry
> to say something like "Section 4.2" not just "page 435". In practice,
> though, the distinction seems merely pedantic: AFAICS it does not
> affect either HTML output (which always gives you a hyperlink to the
> containing section) or PDF output (which always gives you a page number).
There is also the DocBook parameter "index.links.to.section"
(http://docbook.sourceforge.net/release/xsl/1.79.1/doc/html/index.links.to.section.html),
which affects how pedantic the distinction is.
> So we could run around and remove the zone tags, but that still seems
> like rather a grotty answer --- maybe someday we'd want them back,
> if the doc toolchain were ever improved to make effective use of them.
>
> How complicated is the "customize the xsl template" solution?
>
> (BTW, I do not see the extra-entries bug at all in PDF output.)
Yes, the issue is with html/autoidx.xsl only.
I would choose to customize the XSL for now, though the corresponding
DocBook templates are not very human-friendly (for xhtml/autoidx.xsl
they are autogenerated from html/autoidx.xsl). See the attached patch.
(Maybe we could reformat it slightly.)
The commands used to generate the patch are simpler:

# Extract the three index templates from the upstream XHTML stylesheet,
# drop the @zone test, and append them to our customization layer.
wget http://docbook.sourceforge.net/release/xsl/1.79.1/xhtml/autoidx.xsl -O - |
  grep -Pzo '(?s)<xsl:template match="indexterm" mode="index-(primary|secondary|tertiary)">(.*?)</xsl:template>\s' |
  sed -e "s/\x0//" -e "s/\$refs\[@zone != '' or generate-id()/\$refs[generate-id()/" >> stylesheet-html-common.xsl
# Move the closing </xsl:stylesheet> tag back to the end of the file.
perl -0777 -i -pe 's#\n</xsl:stylesheet>##' stylesheet-html-common.xsl
echo -e '\n</xsl:stylesheet>' >> stylesheet-html-common.xsl

Nonetheless, it might be the better solution, as we can leave all the zone
references intact and preserve the freedom to change our mind later.
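
To see what the sed substitution in the commands above changes, it can be run on a single line. The input below is a hypothetical, shortened stand-in for a predicate in autoidx.xsl (the real template text is longer and may differ), and strip_zone_check is a made-up wrapper name.

```shell
# Same substitution as in the commands above, wrapped in a function.
strip_zone_check() {
  sed -e "s/\$refs\[@zone != '' or generate-id()/\$refs[generate-id()/"
}

# Hypothetical stand-in for a line from autoidx.xsl.
patched=$(strip_zone_check <<'EOF'
<xsl:apply-templates select="$refs[@zone != '' or generate-id() = generate-id(.)]"/>
EOF
)
printf '%s\n' "$patched"
```

Only the "@zone != '' or" part of the predicate is dropped. The rest of the template is left as upstream wrote it, which keeps the customization small.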

Best regards,
------
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment: fix-html-autoidx.patch (text/x-patch, 50.9 KB)