From: champion.is.acmilan@...
Date: 2015-12-22T08:16:37+00:00
Subject: [ruby-dev:49454] [Ruby trunk - Bug #11859] [Open] Regexp matching with \p{Upper} and \p{Lower} for EUC-JP doesn’t work.

Issue #11859 has been reported by Kimihito Matsui.

----------------------------------------
Bug #11859: Regexp matching with \p{Upper} and \p{Lower} for EUC-JP doesn’t work.
https://bugs.ruby-lang.org/issues/11859

* Author: Kimihito Matsui
* Status: Open
* Priority: Normal
* Assignee:
* ruby -v: ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
U+FF21 (Ａ, FULLWIDTH LATIN CAPITAL LETTER A) and U+00C0 (À, LATIN CAPITAL LETTER A WITH GRAVE) are both @Uppercase_Letter@, so each of the following matches should succeed at the first character and return 0. Instead, they return 1:

ruby -e 'puts "\uFF21A".encode("EUC-JP") =~ Regexp.compile("\\\p{Upper}".encode("EUC-JP"))' # => 1
ruby -e 'puts "\u00C0A".encode("EUC-JP") =~ Regexp.compile("\\\p{Upper}".encode("EUC-JP"))' # => 1

This also happens with lowercase matching.

ruby -e 'puts "\uFF41a".encode("EUC-JP") =~ Regexp.compile("\\\p{Lower}".encode("EUC-JP"))' # => 1

With a Unicode encoding, the match works as expected:

ruby -e 'puts "\uFF21A" =~ Regexp.compile("\\\p{Upper}")' # => 0

It looks like @\p{Upper}@ and @\p{Lower}@ in EUC-JP regexps only match ASCII characters.

--
https://bugs.ruby-lang.org/
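
For anyone reproducing this, the one-liners above can be folded into a single script that runs the same comparison under both encodings. This is only a minimal sketch: the helper name `first_match_index` and the `SAMPLES` table are hypothetical and not part of the report. The results are printed rather than asserted, since the report only covers ruby 2.2.2p95.

```ruby
# Hypothetical helper (not from the report): transcode both the sample string
# and the \p{...} pattern to the named encoding, then return the index of the
# first match (or nil if nothing matches).
def first_match_index(sample, property, encoding_name)
  subject = sample.encode(encoding_name)
  pattern = Regexp.new("\\p{#{property}}".encode(encoding_name))
  subject =~ pattern
end

# A fullwidth letter followed by its ASCII counterpart, as in the report.
# Index 0 means the fullwidth letter matched; index 1 means only the
# ASCII letter did.
SAMPLES = { "Upper" => "\uFF21A", "Lower" => "\uFF41a" }

%w[UTF-8 EUC-JP].each do |encoding_name|
  SAMPLES.each do |property, sample|
    index = first_match_index(sample, property, encoding_name)
    puts "#{encoding_name} \\p{#{property}}: #{index.inspect}"
  end
end
```

On an affected build, the EUC-JP lines would be expected to print 1 where the UTF-8 lines print 0.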