
|
Initial soft hyphen support
Just committed the initial support for the soft hyphen.
As we had two in favour of having the SHY always produce a break
opportunity and only one against, that's the route I took.
I had no luck with giving the SHY a reduced penalty so that the Knuth
algorithm favours it over normal hyphenation breaks. Even with a
penalty value of 1, FOP still chooses the hyphenation break with a
penalty of 50. Either I am doing something wrong or I misunderstand how
the Knuth breaking calculation is supposed to work. Maybe one of the
Knuth experts can have a look at this, PLEASE.
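One possible explanation, offered only as a sketch: in the TeX flavour of the Knuth-Plass algorithm, the demerits of a line are (linepenalty + badness)^2 + penalty^2 for a non-negative penalty, with badness being 100 times the cube of the adjustment ratio. The penalty is therefore only one term, and a break with a higher penalty can still win if its line needs less stretching. The class name, the adjustment ratios and the use of TeX's default line penalty of 10 below are assumptions for illustration; FOP's own breaking code may weight things differently.

```java
// Sketch of the TeX-style demerits formula, NOT FOP's actual code.
final class DemeritsSketch {
    // TeX's default \linepenalty; FOP may use another value (assumption).
    static final double LINE_PENALTY = 10;

    // Badness grows with the cube of the adjustment ratio.
    static double badness(double adjustmentRatio) {
        return 100 * Math.pow(Math.abs(adjustmentRatio), 3);
    }

    // Demerits of a line ending at a break with a non-negative penalty.
    static double demerits(double adjustmentRatio, double penalty) {
        double base = LINE_PENALTY + badness(adjustmentRatio);
        return base * base + penalty * penalty;
    }

    public static void main(String[] args) {
        // Hypothetical line: breaking at the SHY leaves more to stretch.
        double shyBreak = demerits(0.9, 1);
        // Breaking at the hyphenation point gives a better fitting line.
        double hyphenationBreak = demerits(0.5, 50);
        System.out.println("SHY break (penalty 1):          " + shyBreak);
        System.out.println("hyphenation break (penalty 50): " + hyphenationBreak);
    }
}
```

With these made-up ratios the penalty-50 break has the lower demerits. The total is also minimised over the whole paragraph, so a break may be chosen because of its effect on later lines.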
Also not working correctly (yet) is the ipd calculation when kerning and
a SHY break are involved. But maybe that's a more general issue.
For those looking more closely at the commit: the area handling within
the text layout manager has changed a bit. Before this patch, the code
assumed that the sequence of characters given to the LM would be fully
output to the area tree. Now, for the first time, we have the case that
characters (the SHY) can be dropped. This led to changes in certain
indexing loops.
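To illustrate why dropped characters affect index handling, here is a minimal sketch with hypothetical names, not the code from the commit. Once the SHY is skipped, the number of characters output for a range no longer equals the length of that range, so loops cannot assume the two stay in step.

```java
// Sketch only: renderRange is a hypothetical helper, not a FOP method.
final class DroppedShySketch {
    static final char SHY = '\u00AD';

    // Copies text[start..end) to the output, dropping soft hyphens.
    static String renderRange(char[] text, int start, int end) {
        StringBuffer out = new StringBuffer(end - start);
        for (int i = start; i < end; i++) {
            if (text[i] != SHY) {
                out.append(text[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char[] text = "co\u00ADop".toCharArray();
        String rendered = renderRange(text, 0, text.length);
        // The input range and the rendered text differ in length.
        System.out.println(text.length + " chars in, " + rendered.length() + " chars out");
    }
}
```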
Manuel
|

|
Re: Initial soft hyphen support
On Jan 13, 2007, at 10:31, Manuel Mall wrote:
Hi Manuel,
> Just committed the initial support for the soft hyphen.
Nice job, thanks!
> As we had two in favour of having the SHY always produce a break
> opportunity and only one against, that's the route I took.
>
> I had no luck with giving the SHY a reduced penalty so that the Knuth
> algorithm favours it over normal hyphenation breaks. Even with a
> penalty value of 1, FOP still chooses the hyphenation break with a
> penalty of 50. Either I am doing something wrong or I misunderstand how
> the Knuth breaking calculation is supposed to work. Maybe one of the
> Knuth experts can have a look at this, PLEASE.
Well, I'm still not really an expert, but as I'm beginning to
understand more and more, what you altered was the base Knuth element
generation, right?
IIUC, a possible solution may be to treat SHY as special *only* if
hyphenation is turned off.
The reasoning being that, if hyphenate is true, then handling the SHY
becomes the hyphenator's job. The SHY character will be presented to
the hyphenator simply as a character of the word it appears in. The
hyphenator should then be smart enough to recognize this as a special
character, and do something like: create a hyphenation point for the
SHY, and try to hyphenate the parts before and after the SHY as
separate words...
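The idea above could be sketched roughly as follows. Everything here is hypothetical: `PartHyphenator` stands in for whatever pattern-based hyphenator is available, and the method is not part of FOP's hyphenation API. Break offsets are reported relative to the word with the SHYs removed.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch only: splits a word at each SHY, hyphenates the parts
// separately and adds a hyphenation point for the SHY itself.
final class ShyHyphenationSketch {
    static final char SHY = '\u00AD';

    // Hypothetical stand-in for a pattern-based hyphenator.
    interface PartHyphenator {
        int[] hyphenate(String part); // break offsets within 'part'
    }

    static int[] hyphenationPoints(String word, PartHyphenator partHyphenator) {
        SortedSet<Integer> points = new TreeSet<>();
        int outPos = 0; // position in the SHY-free word
        int start = 0;  // start of the current part in 'word'
        for (int i = 0; i <= word.length(); i++) {
            boolean atEnd = (i == word.length());
            if (atEnd || word.charAt(i) == SHY) {
                String part = word.substring(start, i);
                for (int p : partHyphenator.hyphenate(part)) {
                    points.add(outPos + p);
                }
                outPos += part.length();
                if (!atEnd) {
                    points.add(outPos); // the SHY is a break opportunity
                }
                start = i + 1;
            }
        }
        // Discard breaks at the very start or end of the word.
        final int total = outPos;
        points.removeIf(p -> p <= 0 || p >= total);
        int[] result = new int[points.size()];
        int n = 0;
        for (int p : points) {
            result[n++] = p;
        }
        return result;
    }

    public static void main(String[] args) {
        // Stub hyphenator that only knows "phe-na-tion".
        PartHyphenator stub =
            part -> part.equals("phenation") ? new int[] {3, 5} : new int[0];
        System.out.println(Arrays.toString(hyphenationPoints("hy\u00ADphenation", stub)));
    }
}
```

With the stub above, the points come out as offsets 2, 5 and 7 in "hyphenation", i.e. hy-phe-na-tion, where the first break comes from the SHY and the other two from the stub.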
HTH!
Andreas
|

|
Re: Initial soft hyphen support
Andreas L Delmelle wrote:
> The SHY character will be presented to the
> hyphenator simply as a character of the word it appears in. The
> hyphenator should then be smart enough to recognize this as a special
> character, and do something like: create a hyphenation point for the
> SHY, ...
Unfortunately, the hyphenator currently isn't nearly that smart,
and it's a major job to push it in this direction. For example, it
would mean major API changes.
J.Pietschmann
|

|
Re: Initial soft hyphen support
On Jan 14, 2007, at 23:11, J.Pietschmann wrote:
> Andreas L Delmelle wrote:
>> The SHY character will be presented to the hyphenator simply as a
>> character of the word it appears in. The hyphenator should then be
>> smart enough to recognize this as a special character, and do
>> something like: create a hyphenation point for the SHY, ...
>
> Unfortunately, the hyphenator currently isn't nearly that smart,
> and it's a major job to push it in this direction. For example, it
> would mean major API changes.
Unfortunate indeed :(
BTW: I took a very quick look, and does anyone know if there is a
good reason why Hyphenation.word is a String? I mean, everything that
comes from FOText and passes through TextLM is already char[]. The
Hyphenation constructor takes a String parameter, so I guess
somewhere --haven't looked yet-- a String is constructed from the
portion of char[] that is to be hyphenated. If you then look at
HyphenationTree, it says word.toCharArray()...
Cheers,
Andreas
|

|
Re: Initial soft hyphen support
Andreas L Delmelle wrote:
> BTW: I took a very quick look, and does anyone know if there is a good
> reason why Hyphenation.word is a String?
The hyphenator interface goes through several wrapping layers,
probably due to the usual "take working code and wrap it to fit
the caller" method.
This always seemed overly complicated to me. I tried
to come up with a comprehensive API for hyphenation (which would
also be applicable to spelling and other similar tasks). Unfortunately,
there doesn't seem to be any usable standard; all APIs I've seen
are very specific or simply horrible. Any simplification is certainly
welcome.
J.Pietschmann
|

|
Re: Initial soft hyphen support
On Jan 15, 2007, at 21:25, J.Pietschmann wrote:
> Andreas L Delmelle wrote:
>> BTW: I took a very quick look, and does anyone know if there is a
>> good reason why Hyphenation.word is a String?
>
> The hyphenator interface goes through several wrapping layers,
> probably due to the usual "take working code and wrap it to fit
> the caller" method.
Looks that way...
Traced it down, and in TextLM.getWordChars() we get
    sbChars.append(new String(textArray, ai.iStartIndex,
            ai.iBreakIndex - ai.iStartIndex));
Not really sure what would be most efficient:
- a void method appending to a parameter StringBuffer
- a method returning a copy of the char[] from index to index...
Seeing that every String ultimately has a backing char[](*) anyway, I'd
say that we can safely return the copy, and remove the overhead of
    StringBuffer.append(new String(char[])).toString().toCharArray()
Hmmm... Put it like that, and this would almost be one for the Daily
WTF! 8-)
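For comparison, a minimal sketch of the two routes from a range of a char[] to a standalone char[]. The class and method names are made up for illustration and do not exist in FOP; only the shape of the round trip is taken from the chain quoted above.

```java
// Sketch only: hypothetical helpers contrasting the two approaches.
final class WordCharsSketch {
    // Round trip: char[] -> String -> StringBuffer -> String -> char[].
    static char[] viaStringBuffer(char[] text, int start, int end) {
        StringBuffer sb = new StringBuffer();
        sb.append(new String(text, start, end - start));
        return sb.toString().toCharArray();
    }

    // Direct copy of text[start..end) into a new array.
    static char[] viaArrayCopy(char[] text, int start, int end) {
        char[] copy = new char[end - start];
        System.arraycopy(text, start, copy, 0, copy.length);
        return copy;
    }

    public static void main(String[] args) {
        char[] text = "soft hyphen support".toCharArray();
        System.out.println(new String(viaStringBuffer(text, 5, 11)));
        System.out.println(new String(viaArrayCopy(text, 5, 11)));
    }
}
```

Both methods return the same characters; the second one skips the intermediate String and StringBuffer objects.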
(*) which, BTW, answers the question of why there are twice as many
char[] instances as text-nodes in the document in the snapshot posted
by Richard earlier on in the thread about memory issues. Sure, there
are some 39K text-nodes in the document, but there are most likely at
least as many non-interned property values (cf. the number of
String instances)...
> This always seemed overly complicated to me. I tried
> to come up with a comprehensive API for hyphenation (which would
> also be applicable to spelling and other similar tasks).
> Unfortunately, there doesn't seem to be any usable standard; all
> APIs I've seen are very specific or simply horrible. Any
> simplification is certainly welcome.
A quick-and-dirty hack to make the Hyphenator return a Hyphenation as
I described earlier on --hyph-point for the SHY and the rest as two
separate hyphenated words-- doesn't seem too hard to pull off, but it
would be an exception for the SHY only. For a more comprehensive
approach, I currently don't know enough about hyphenation basics, I'm
afraid...
Cheers,
Andreas
|

|
Re: Initial soft hyphen support
On Jan 15, 2007, at 22:20, Andreas L Delmelle wrote:
> <snip />
> Not really sure what would be most efficient:
> - a void method appending to a parameter StringBuffer
> - a method returning a copy of the char[] from index to index...
>
> Seeing that every String ultimately has a backing char[](*) anyway,
> I'd say that we can safely return the copy, and remove the overhead of
>
>     StringBuffer.append(new String(char[])).toString().toCharArray()
Looked a bit deeper, and there is apparently a good reason to use a
StringBuffer: the char[] from one FOText might need to be appended to
that of a previous one (see TextLM.findHyphenationPoints()).
I guess it would be a bad idea to replace this with arrays, since
they're not so straightforward to concatenate (requires copying into
a new array).
Too bad we're still targeting 1.3, else we might consider switching
to a java.nio.CharBuffer...
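A possible middle ground, sketched with hypothetical names: StringBuffer.append(char[], int, int) accepts an array range directly, and getChars() copies the result out, so the intermediate String objects can be avoided while keeping the StringBuffer. The second method shows the same accumulation with java.nio.CharBuffer, which needs Java 1.4 and a capacity known up front.

```java
import java.nio.CharBuffer;

// Sketch only: joins ranges of two char[] (e.g. from two adjacent
// FOText nodes) into a single word. Not actual FOP code.
final class WordAccumulatorSketch {
    static char[] withStringBuffer(char[] a, int aStart, int aEnd,
                                   char[] b, int bStart, int bEnd) {
        StringBuffer sb = new StringBuffer();
        sb.append(a, aStart, aEnd - aStart); // no intermediate String
        sb.append(b, bStart, bEnd - bStart);
        char[] word = new char[sb.length()];
        sb.getChars(0, sb.length(), word, 0);
        return word;
    }

    static char[] withCharBuffer(char[] a, int aStart, int aEnd,
                                 char[] b, int bStart, int bEnd) {
        // The buffer has a fixed capacity, so the total length must be known.
        CharBuffer buf = CharBuffer.allocate((aEnd - aStart) + (bEnd - bStart));
        buf.put(a, aStart, aEnd - aStart);
        buf.put(b, bStart, bEnd - bStart);
        return buf.array(); // backing array, filled exactly to capacity
    }

    public static void main(String[] args) {
        char[] first = "soft ".toCharArray();
        char[] second = "hyphen".toCharArray();
        System.out.println(new String(withStringBuffer(first, 0, 4, second, 0, 6)));
        System.out.println(new String(withCharBuffer(first, 0, 4, second, 0, 6)));
    }
}
```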
Cheers,
Andreas
|

|
Re: Initial soft hyphen support
Andreas L Delmelle wrote:
> On Jan 15, 2007, at 22:20, Andreas L Delmelle wrote:
>
>> <snip />
>
>> Not really sure what would be most efficient:
>> - a void method appending to a parameter StringBuffer
>> - a method returning a copy of the char[] from index to index...
>>
>> Seeing that every String ultimately has a backing char[](*) anyway, I'd
>> say that we can safely return the copy, and remove the overhead of
>>
>>     StringBuffer.append(new String(char[])).toString().toCharArray()
>
> Looked a bit deeper, and there is apparently a good reason to use a
> StringBuffer: the char[] from one FOText might need to be appended to
> that of a previous one (see TextLM.findHyphenationPoints()).
>
> I guess it would be a bad idea to replace this with arrays, since
> they're not so straightforward to concatenate (requires copying into a
> new array).
>
> Too bad we're still targeting 1.3, else we might consider switching to a
> java.nio.CharBuffer...
Hadn't we agreed upon raising the minimum Java version to 1.4? Or at
least on running a poll on fop-user to see whether that would create
any problems. If it depended only on me, we would already be using all
the nice Java 1.5 features ;-)
Vincent
|

|
Re: Initial soft hyphen support
On Jan 16, 2007, at 12:25 PM, Vincent Hennebert wrote:
> Hadn't we agreed upon raising the minimum Java version to 1.4? Or at
> least on running a poll on fop-user to see whether that would create
> any problems. If it depended only on me, we would already be using all
> the nice Java 1.5 features ;-)
>
> Vincent
As I recall, we're targeting JDK 1.4 for 0.93+ and leaving JDK 1.3
for 0.20.5 (since it's not changing anyway). I think the thought was
that anyone who needs to stay on JDK 1.3 (AIX 4.x and others locked
into IBM Java 1.3, etc.) can continue using fop-0.20.5.
Web Maestro Clay
|