FOP - Dev

Initial soft hyphen support

by Manuel Mall

Just committed the initial support for the soft hyphen.

As two of us were in favour of having the SHY always produce a break
opportunity and only one was against, that's the route I took.

I had no luck with giving the SHY a reduced penalty so that the Knuth
algorithm favours it over normal hyphenation breaks. Even with a
penalty value of 1, FOP still chooses the hyphenation break with a
penalty of 50. Either I am doing something wrong or I misunderstand how
the Knuth breaking calculation is supposed to work. Maybe one of the
Knuth experts can have a look at this, PLEASE.
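
One possible explanation, not confirmed anywhere in this thread: the algorithm minimises total demerits, and the penalty is only one term next to the badness of the line. The sketch below uses the demerits formula from the Knuth-Plass paper for non-negative penalties and leaves out the extra term for consecutive flagged breaks. It is not taken from FOP's sources, and the class name, method names, and adjustment ratios are made up for illustration.

```java
/** Textbook Knuth-Plass demerits; an illustration, NOT FOP's code. */
final class DemeritsSketch {

    /** Badness of a line whose glue is stretched or shrunk by the given ratio. */
    static double badness(double ratio) {
        double r = Math.abs(ratio);
        return 100.0 * r * r * r;
    }

    /** Demerits for a break at a non-negative penalty: (1 + badness + penalty)^2. */
    static double demerits(double ratio, double penalty) {
        double d = 1.0 + badness(ratio) + penalty;
        return d * d;
    }

    public static void main(String[] args) {
        // Hypothetical case: breaking at the SHY (penalty 1) leaves a loose line.
        System.out.println("SHY break:        " + demerits(0.8, 1));
        // Hypothetical case: the hyphenation break (penalty 50) fits much better.
        System.out.println("hyphenation break: " + demerits(0.2, 50));
    }
}
```

With these invented ratios the penalty-50 break ends up with fewer demerits than the penalty-1 break, so a low penalty alone does not guarantee that the SHY is preferred.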

Also not working correctly (yet) is the IPD calculation when kerning and
a SHY break are involved. But maybe that's a more general issue.

For those looking closer at the commit: the area handling within the text
layout manager has changed a bit. Before this patch the assumption was
made that the sequence of characters given to the LM will be fully
output to the area tree. Now we have for the first time the case that
characters (the SHY) can be dropped. This led to changes with respect
to certain indexing loops.
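
To illustrate why dropped characters affect indexing, here is a minimal sketch, assuming a plain copy loop rather than the real text layout manager code. The class and method names are hypothetical. Once a SHY can be skipped, an index into the input no longer equals an index into the output.

```java
/** Illustration only: copying text while dropping soft hyphens. */
final class ShyDropSketch {
    static final char SOFT_HYPHEN = '\u00AD';

    /** Copies text[start..end) to the output, skipping every soft hyphen. */
    static String emit(char[] text, int start, int end) {
        StringBuilder out = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            if (text[i] != SOFT_HYPHEN) {
                out.append(text[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char[] text = "hy\u00ADphen".toCharArray();
        String emitted = emit(text, 0, text.length);
        // The input has one more character than the output.
        System.out.println(text.length + " chars in, " + emitted.length() + " chars out");
    }
}
```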

Manuel

Re: Initial soft hyphen support

by Andreas L Delmelle

On Jan 13, 2007, at 10:31, Manuel Mall wrote:

Hi Manuel,

> Just committed the initial support for the soft hyphen.

Nice job, thanks!

> As two of us were in favour of having the SHY always produce a break
> opportunity and only one was against, that's the route I took.
>
> I had no luck with giving the SHY a reduced penalty so that the Knuth
> algorithm favours it over normal hyphenation breaks. Even with a
> penalty value of 1, FOP still chooses the hyphenation break with a
> penalty of 50. Either I am doing something wrong or I misunderstand how
> the Knuth breaking calculation is supposed to work. Maybe one of the
> Knuth experts can have a look at this, PLEASE.

Well, I'm still not really an expert, but as I'm beginning to  
understand more and more, what you altered was the base Knuth element  
generation, right?

IIUC, a possible solution may be to treat SHY as special *only* if  
hyphenation is turned off.
The reasoning being that, if hyphenate is true, then handling the SHY  
becomes the hyphenator's job. The SHY character will be presented to  
the hyphenator simply as a character of the word it appears in. The  
hyphenator should then be smart enough to recognize this as a special  
character, and do something like: create a hyphenation point for the  
SHY, and try to hyphenate the parts before and after the SHY as  
separate words...
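
A minimal sketch of that idea, assuming a stand-in `partHyphenator` function in place of the real hyphenator (whose API is not shown in this thread). All names here are hypothetical. It also assumes each SHY sits inside the word and that no two are adjacent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Illustration only: treat each SHY as a break and hyphenate the parts separately. */
final class ShyHyphenationSketch {
    static final char SOFT_HYPHEN = '\u00AD';

    /**
     * Returns break offsets relative to the word with its soft hyphens removed.
     * partHyphenator is a hypothetical stand-in that maps a SHY-free part
     * to break offsets within that part.
     */
    static List<Integer> hyphenate(String word,
                                   Function<String, List<Integer>> partHyphenator) {
        List<Integer> points = new ArrayList<>();
        int base = 0;       // offset of the current part in the SHY-free word
        int partStart = 0;  // index of the current part in the original word
        for (int i = 0; i <= word.length(); i++) {
            boolean atEnd = i == word.length();
            if (atEnd || word.charAt(i) == SOFT_HYPHEN) {
                String part = word.substring(partStart, i);
                for (int p : partHyphenator.apply(part)) {
                    points.add(base + p);
                }
                base += part.length();
                if (!atEnd) {
                    points.add(base); // the SHY itself becomes a hyphenation point
                }
                partStart = i + 1;
            }
        }
        return points;
    }
}
```

A real implementation would also have to decide how the SHY break should rank against the pattern-based breaks, which this sketch leaves open.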


HTH!

Andreas

Re: Initial soft hyphen support

by J.Pietschmann

Andreas L Delmelle wrote:
> The SHY character will be presented to the
> hyphenator simply as a character of the word it appears in. The
> hyphenator should then be smart enough to recognize this as a special
> character, and do something like: create a hyphenation point for the
> SHY, ...

Unfortunately, the hyphenator currently isn't nearly that smart,
and it's a major job to push it in this direction. E.g. it means
major API changes.

J.Pietschmann

Re: Initial soft hyphen support

by Andreas L Delmelle

On Jan 14, 2007, at 23:11, J.Pietschmann wrote:

> Andreas L Delmelle wrote:
>> The SHY character will be presented to the hyphenator simply as a  
>> character of the word it appears in. The hyphenator should then be  
>> smart enough to recognize this as a special character, and do  
>> something like: create a hyphenation point for the SHY, ...
>
> Unfortunately, the hyphenator currently isn't nearly that smart,
> and it's a major job to push it in this direction. E.g. it means
> major API changes.

Unfortunate indeed :(

BTW: I took a very quick look, and does anyone know if there is a  
good reason why Hyphenation.word is a String? I mean, everything that  
comes from FOText and passes through TextLM is already char[]. The  
Hyphenation constructor takes a String parameter, so I guess  
somewhere --haven't looked yet-- a String is constructed from the  
portion of char[] that is to be hyphenated. If you then look at  
HyphenationTree, it says word.toCharArray()...


Cheers,

Andreas


Re: Initial soft hyphen support

by J.Pietschmann

Andreas L Delmelle wrote:
> BTW: I took a very quick look, and does anyone know if there is a good
> reason why Hyphenation.word is a String?

The hyphenator interface goes through several wrapping layers,
probably due to the usual "take working code and wrap it to fit
the caller" method.
This has always seemed overly complicated to me. I tried
to come up with a comprehensive API for hyphenation (which would
also be applicable to spelling and other similar tasks). Unfortunately,
there doesn't seem to be any usable standard; all APIs I've seen
are very specific or simply horrible. Any simplification is certainly
welcome.

J.Pietschmann

Re: Initial soft hyphen support

by Andreas L Delmelle

On Jan 15, 2007, at 21:25, J.Pietschmann wrote:

> Andreas L Delmelle wrote:
>> BTW: I took a very quick look, and does anyone know if there is a  
>> good reason why Hyphenation.word is a String?
>
> The hyphenator interface goes through several wrapping layers,
> probably due to the usual "take working code and wrap it to fit
> the caller" method.

Looks that way...
Traced it down, and in TextLM.getWordChars() we get

   sbChars.append(new String(textArray, ai.iStartIndex,
                           ai.iBreakIndex - ai.iStartIndex));


Not really sure what would be most efficient:
- a void method appending to a parameter StringBuffer
- a method returning a copy of the char[] from index to index...

Seeing that every String ultimately has a backing char[](*) anyway, I'd
say that we can safely return the copy, and remove the overhead of

StringBuffer.append(new String(char[])).toString().toCharArray()

Hmmm... Put it like that, and this would almost be one for the Daily  
WTF! 8-)
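
For comparison, a minimal sketch of the two options, assuming simplified signatures rather than the actual TextLM methods. The class and method names are hypothetical. `System.arraycopy` is available on Java 1.3, so the direct copy would not need a newer JDK.

```java
/** Illustration only: direct range copy versus the String round trip. */
final class WordCharsSketch {

    /** Returns a copy of text[start..end) using a single array copy. */
    static char[] copyRange(char[] text, int start, int end) {
        char[] copy = new char[end - start];
        System.arraycopy(text, start, copy, 0, copy.length);
        return copy;
    }

    /** The round trip criticised above: char[] to String to StringBuffer to String to char[]. */
    static char[] viaStringBuffer(char[] text, int start, int end) {
        StringBuffer sb = new StringBuffer();
        sb.append(new String(text, start, end - start));
        return sb.toString().toCharArray();
    }

    public static void main(String[] args) {
        char[] text = "soft hyphen".toCharArray();
        // Both calls produce the same characters; the first allocates far less.
        System.out.println(new String(copyRange(text, 5, 11)));
        System.out.println(new String(viaStringBuffer(text, 5, 11)));
    }
}
```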

(*) which, BTW, answers the question about the char[] instances being
twice the number of text-nodes in the document in the snapshot posted
by Richard earlier on in the thread about memory issues. Sure, there
are some 39K text-nodes in the document, but there are most likely at
least as many non-interned property values (cf. the number of
String instances)...

> This has always seemed overly complicated to me. I tried
> to come up with a comprehensive API for hyphenation (which would
> also be applicable to spelling and other similar tasks).
> Unfortunately,
> there doesn't seem to be any usable standard; all APIs I've seen
> are very specific or simply horrible. Any simplification is certainly
> welcome.

A quick-and-dirty hack to make the Hyphenator return a Hyphenation as  
I described earlier on --hyph-point for the SHY and the rest as two  
separate hyphenated words-- doesn't seem too hard to pull off, but it  
would be an exception for the SHY only. For a more comprehensive  
approach, I currently don't know enough about hyphenation basics, I'm  
afraid...


Cheers,

Andreas

Re: Initial soft hyphen support

by Andreas L Delmelle

On Jan 15, 2007, at 22:20, Andreas L Delmelle wrote:

> <snip />

> Not really sure what would be most efficient:
> - a void method appending to a parameter StringBuffer
> - a method returning a copy of the char[] from index to index...
>
> Seeing that every String ultimately has a backing char[](*) anyway,
> I'd say that we can safely return the copy, and remove the overhead of
>
> StringBuffer.append(new String(char[])).toString().toCharArray()

Looked a bit deeper, and there is apparently a good reason to use a  
StringBuffer: the char[] from one FOText might need to be appended to  
that of a previous one (see TextLM.findHyphenationPoints()).

I guess it would be a bad idea to replace this with arrays, since  
they're not so straightforward to concatenate (requires copying into  
a new array).

Too bad we're still targeting 1.3, else we might consider switching  
to a java.nio.CharBuffer...
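
A minimal sketch of what the CharBuffer route could look like on Java 1.4 or later. The class and method names are hypothetical and nothing here comes from FOP. `CharBuffer.wrap` gives a view of a slice without copying, but joining the text of two FOText nodes still copies into a new buffer, so the concatenation cost does not disappear.

```java
import java.nio.CharBuffer;

/** Illustration only: slices as views, concatenation as an explicit copy. */
final class CharBufferSketch {

    /** Wraps text[start..end) without copying; the buffer shares the array. */
    static CharBuffer view(char[] text, int start, int end) {
        return CharBuffer.wrap(text, start, end - start);
    }

    /** Concatenates the remaining chars of both buffers into a new buffer. */
    static CharBuffer concat(CharBuffer first, CharBuffer second) {
        CharBuffer joined = CharBuffer.allocate(first.remaining() + second.remaining());
        // duplicate() keeps the callers' positions untouched.
        joined.put(first.duplicate());
        joined.put(second.duplicate());
        joined.flip();
        return joined;
    }

    public static void main(String[] args) {
        CharBuffer a = view("hy".toCharArray(), 0, 2);
        CharBuffer b = view("phen".toCharArray(), 0, 4);
        System.out.println(concat(a, b));
    }
}
```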


Cheers,

Andreas


Re: Initial soft hyphen support

by Vincent Hennebert-3

Andreas L Delmelle wrote:
> On Jan 15, 2007, at 22:20, Andreas L Delmelle wrote:
>
>> <snip />
>
>> Not really sure what would be most efficient:
>> - a void method appending to a parameter StringBuffer
>> - a method returning a copy of the char[] from index to index...
>>
>> Seeing that every String ultimately has a backing char[](*) anyway, I'd
>> say that we can safely return the copy, and remove the overhead of
>>
>> StringBuffer.append(new String(char[])).toString().toCharArray()
>
> Looked a bit deeper, and there is apparently a good reason to use a
> StringBuffer: the char[] from one FOText might need to be appended to
> that of a previous one (see TextLM.findHyphenationPoints()).
>
> I guess it would be a bad idea to replace this with arrays, since
> they're not so straightforward to concatenate (requires copying into a
> new array).
>
> Too bad we're still targeting 1.3, else we might consider switching to a
> java.nio.CharBuffer...

Hadn't we agreed on raising the minimum Java version to 1.4? Or at
least on running a poll on fop-user to see whether that would create any
problems? If it depended only on me, we would already be using all the
nice Java 1.5 features ;-)

Vincent

Re: Initial soft hyphen support

by Clay Leeds

On Jan 16, 2007, at 12:25 PM, Vincent Hennebert wrote:
> Hadn't we agreed on raising the minimum Java version to 1.4? Or at
> least on running a poll on fop-user to see whether that would create any
> problems? If it depended only on me, we would already be using all the
> nice Java 1.5 features ;-)
>
> Vincent

As I recall, we're targeting JDK 1.4 for 0.93+ and leaving JDK 1.3
to 0.20.5 (since it's not changing anyway). I think the thought was
that anyone who needs to stay on JDK 1.3 (AIX 4.x and others locked
into IBM Java 1.3, etc.) can continue using fop-0.20.5.

Web Maestro Clay