From: "MartinBosslet (Martin Bosslet)"
Date: 2013-04-19T01:44:35+09:00
Subject: [ruby-core:54433] [ruby-trunk - Bug #8286] Can't decode non-MIME Base64

Issue #8286 has been updated by MartinBosslet (Martin Bosslet).

Excuse the shameless plug, but I thought it might help Alan: in krypt[1], we follow the lenient parsing/strict encoding principle.

    require 'krypt'

    decoded1 = Krypt::Base64.decode("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4")
    decoded2 = Krypt::Base64.decode("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4=")
    decoded3 = Krypt::Base64.decode("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC5=")

    puts decoded1
    puts decoded2
    puts decoded3
    puts decoded1 == decoded2 # => true
    puts decoded2 == decoded3 # => true

Even if the input is not strictly by the (RFC) book, krypt will still try to make sense of it. This is possible because of how Base64 decoding works internally: some of the input bits are simply irrelevant to the decoding process, so you can flip them and still get the correct answer.

When encoding, however, it will always produce the canonical form. By default, it won't generate any line breaks, but you may tell it to produce a line break after every n-th character by passing n as an optional second argument:

    plain_text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
    p Krypt::Base64.encode(plain_text)    # with the '=' at the end
    p Krypt::Base64.encode(plain_text, 4) # produces \r\n after every fourth character

If you are dealing with large inputs, there is also a streaming version[2] for encoding and decoding.
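
To make the "irrelevant bits" point concrete, here is a small sketch that uses Ruby's standard Base64 module rather than krypt, so it says nothing about krypt's internals. ALPHABET and the sextet helper are made up for illustration only. The sample text is 56 bytes long, so the last group holds 2 bytes (16 bits) spread over three characters (18 bits); the low 2 bits of the third character are filler, which is why "4" and "5" are interchangeable in that position for a lenient decoder.

```ruby
require 'base64'

# RFC 4648 alphabet: the index in this array is the 6-bit value of the character.
ALPHABET = [*'A'..'Z', *'a'..'z', *'0'..'9', '+', '/'].freeze

# Illustrative helper (not part of any library): 6-bit pattern of one character.
def sextet(char)
  ALPHABET.index(char).to_s(2).rjust(6, '0')
end

plain = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

canonical = Base64.strict_encode64(plain)   # canonical form, ends in "dC4="
unpadded  = canonical.delete('=')           # same data without the padding
flipped   = canonical.sub(/4=\z/, '5=')     # lowest filler bit flipped

puts sextet('4')   # 111000
puts sextet('5')   # 111001 (differs only in the discarded low bits)

# Base64.decode64 is the lenient stdlib decoder; check that all three
# spellings decode back to the original text.
results = [canonical, unpadded, flipped].map { |s| Base64.decode64(s) }
puts results.all? { |r| r == plain }
```

The lenient Base64.decode64 is used here on purpose; Base64.strict_decode64 rejects the unpadded form with an ArgumentError.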

[1] https://github.com/krypt/krypt
[2] https://github.com/krypt/krypt/blob/master/lib/krypt/codec/base64.rb

----------------------------------------
Bug #8286: Can't decode non-MIME Base64
https://bugs.ruby-lang.org/issues/8286#change-38711

Author: adacosta (Alan Da Costa)
Status: Closed
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: 2.0.0-p0
Backport: 1.9.3: UNKNOWN, 2.0.0: UNKNOWN

=begin
In https://github.com/ruby/ruby/blob/trunk/lib/base64.rb#L42 , RFC 2045 is mentioned for encode64/decode64 support, which is the MIME RFC. I don't believe this is the correct RFC to reference, as RFC 4648 is the correct RFC for Base64. Further, RFC 4648 has an explicit section about Line Feeds in Encoded Data, http://tools.ietf.org/html/rfc4648#section-3.1 . This section states:

   MIME [4] is often used as a reference for base 64 encoding. However,
   MIME does not define "base 64" per se, but rather a "base 64
   Content-Transfer-Encoding" for use within MIME. As such, MIME
   enforces a limit on line length of base 64-encoded data to 76
   characters. MIME inherits the encoding from Privacy Enhanced Mail
   (PEM) [3], stating that it is "virtually identical"; however, PEM
   uses a line length of 64 characters. The MIME and PEM limits are
   both due to limits within SMTP.

   Implementations MUST NOT add line feeds to base-encoded data unless
   the specification referring to this document explicitly directs base
   encoders to add line feeds after a specific number of characters.

In my case, I have a separate implementation that has not added line feeds to the Base64 (non-MIME) and as a result, Base64.decode64 can not decode the non-MIME encoded data. I believe this also indicates Base64#encode64 has the wrong behavior of MIME encoding Base64.
I have an example of the issue at https://github.com/adacosta/base64_compatible/blob/master/test/test_coding.rb#LC25 .
=end

-- 
http://bugs.ruby-lang.org/