From: "shyouhei (Shyouhei Urabe)"
Date: 2012-11-03T04:18:16+09:00
Subject: [ruby-core:48766] [ruby-trunk - Bug #7267] Dir.glob on Mac OS X returns unexpected string encodings for unicode file names

Issue #7267 has been updated by shyouhei (Shyouhei Urabe).

meta (mathew murphy) wrote:
> Seems to me Ruby should pick one of the standard normalization forms

No, Ruby's M17N design is that it never picks one standard form of strings. Strings have various encodings, and it's you, not Ruby, who should pick the right one.

I admit this is not the only solution for this mess. Other languages like Perl have different opinions. But this is the way Ruby works. So please, do it yourself.

And I admit it should be much easier for you to make that choice. We need some brush-up.

----------------------------------------
Bug #7267: Dir.glob on Mac OS X returns unexpected string encodings for unicode file names
https://bugs.ruby-lang.org/issues/7267#change-32249

Author: kennygrant (Kenny Grant)
Status: Open
Priority: Normal
Assignee:
Category:
Target version: 2.0.0
ruby -v: ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.0]

Tested on Ruby 1.9.3-p194 and ruby-2.0.0-preview1 on Mac OS X 10.7.5

When calling file system methods with Ruby on Mac OS X, it is not possible to manipulate the resulting file name as a normal UTF-8 string, even though it reports the encoding as UTF-8. It seems to be a UTF-8-MAC string, even when the default encoding is set to UTF-8. This leads to confusion, as the string can be manipulated normally except for any unicode characters, which seem to be decomposed. So a regexp using UTF-8 characters won't work on the string unless it is first converted from UTF-8-MAC.

I'd expect the string encoding to be UTF-8, or at least to report that it is not a normal UTF-8 string if it has to be UTF-8-MAC for some reason.
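The decomposition described above can be reproduced without touching the file system. The following is only a minimal sketch under stated assumptions: "é" is a stand-in for the non-ASCII character in the file name, and recompose is a hypothetical helper, not something from this report or from Ruby itself. It builds the same name in precomposed and decomposed form with explicit escapes, then applies the same UTF-8-MAC transcoding workaround.

```ruby
# Hypothetical helper (not part of the report): recompose a file name that
# came back from the file system in decomposed form.
def recompose(name)
  # Transcoding from UTF-8-MAC is the workaround used in the report.
  # On Ruby 2.2 or later, name.unicode_normalize(:nfc) is an alternative.
  name.encode('UTF-8', 'UTF-8-MAC')
end

# Assumption: "é" stands in for the non-ASCII character in the file name.
composed   = "Test\u00E9.txt"   # precomposed: U+00E9
decomposed = "Teste\u0301.txt"  # decomposed: "e" + U+0301 COMBINING ACUTE ACCENT

pattern = /\u00E9/

puts composed.encoding == decomposed.encoding     # both are tagged UTF-8
puts composed == decomposed                       # yet they are not equal
puts composed.gsub(pattern, 'TEST')               # pattern matches
puts decomposed.gsub(pattern, 'TEST')             # no match, string unchanged
puts recompose(decomposed).gsub(pattern, 'TEST')  # matches after recomposing
```

Both strings carry the same encoding tag, so the encoding alone cannot tell the two forms apart; only the code points differ.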
Example, run with a file called Test��.txt in the same folder:

    def transform_string s
      puts "Testing string #{s}"
      puts s.gsub(/��/,'TEST')
    end

    Dir.glob("./*.txt").each do |f|
      puts "Inline string works as expected"
      s = "./Test��.txt"
      puts transform_string s

      puts "File name from Dir.glob does not"
      puts transform_string f

      puts "Encoded file name works as expected, though it is reported as UTF-8, not UTF-8-MAC"
      f.encode!('UTF-8','UTF-8-MAC')
      puts transform_string f
    end

-- 
http://bugs.ruby-lang.org/