From: "MartinBosslet (Martin Bosslet)" <Martin.Bosslet@...>
Date: 2013-04-19T01:44:35+09:00
Subject: [ruby-core:54433] [ruby-trunk - Bug #8286] Can't decode non-MIME Base64


Issue #8286 has been updated by MartinBosslet (Martin Bosslet).


Apologies for the shameless plug, but I thought it might help Alan:

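In krypt[1], we follow the principle of lenient parsing and strict encoding. For example, all three of the following inputs decode to the same string, although only the second one is canonical: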

    require 'krypt'

    decoded1 = Krypt::Base64.decode("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4")
    decoded2 = Krypt::Base64.decode("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4=")
    decoded3 = Krypt::Base64.decode("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC5=")

    puts decoded1
    puts decoded2
    puts decoded3
    puts decoded1 == decoded2 # => true
    puts decoded2 == decoded3 # => true
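krypt's own decoder is not reproduced here. As a rough point of comparison, the sketch below uses Ruby's core `String#unpack1` instead of krypt: the `"m"` directive decodes leniently, while `"m0"` is strict and raises `ArgumentError` on non-canonical input. The three strings are the tails of the inputs above (the Base64 of "elit."); the variable names are only illustrative.

```ruby
# Comparison sketch using Ruby's core String#unpack1, not krypt.
# "m"  = lenient decoding, "m0" = strict (canonical-only) decoding.
canonical = "ZWxpdC4="   # Base64 of "elit."
unpadded  = "ZWxpdC4"    # same data with the '=' padding omitted
tweaked   = "ZWxpdC5="   # '5' differs from '4' only in a discarded bit

lenient = [canonical, unpadded, tweaked].map { |s| s.unpack1("m") }
p lenient # all three decode to "elit."

[canonical, unpadded, tweaked].each do |s|
  begin
    p s.unpack1("m0")
  rescue ArgumentError => e
    puts "strict decoding rejected #{s.inspect}: #{e.message}"
  end
end
```

The strict variant accepts only the canonical string, which is the behavior the "strict encoding" half of the principle guarantees on output.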

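Even if the input does not follow the RFC to the letter, krypt still tries to make sense of it. This is possible because of how Base64 decoding works: the '=' padding carries no information that the length of the input does not already imply, and when the final group is padded, the low-order bits of its last character are discarded. In the third example, '5' (0b111001) differs from '4' (0b111000) only in such a discarded bit, so flipping it does not change the result.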
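To make the "irrelevant bits" concrete, here is an illustrative sketch that decodes a single final quantum by hand. `ALPHABET` and `decode_final_quantum` are hypothetical names, not part of krypt or Ruby's base64 library.

```ruby
# Illustrative sketch: decode one final Base64 quantum by hand to show
# which input bits are thrown away. Not krypt's implementation.
ALPHABET = ("A".."Z").to_a + ("a".."z").to_a + ("0".."9").to_a + ["+", "/"]

def decode_final_quantum(quantum)
  chars = quantum.delete("=").chars
  # Every Base64 character carries 6 bits; pack them into one integer.
  bits = chars.reduce(0) do |acc, ch|
    index = ALPHABET.index(ch) or raise ArgumentError, "invalid character #{ch.inspect}"
    (acc << 6) | index
  end
  total_bits = chars.size * 6
  byte_count = total_bits / 8              # 2 chars -> 1 byte, 3 chars -> 2 bytes
  dropped    = total_bits - byte_count * 8 # 4, 2 or 0 leftover low-order bits
  value = bits >> dropped                  # the leftover bits are simply discarded
  bytes = (0...byte_count).map { |i| (value >> (8 * (byte_count - 1 - i))) & 0xFF }
  bytes.pack("C*")
end

p decode_final_quantum("dC4=") # '4' is 0b111000
p decode_final_quantum("dC5=") # '5' is 0b111001: only a dropped bit differs
```

Both calls yield the same two bytes, because the last two bits of the third character never reach the output.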

When encoding, however, it always produces the canonical form. By default it does not insert any line breaks, but you can make it insert a line break after every n characters by passing n as an optional second argument:

    plain_text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
    p Krypt::Base64.encode(plain_text)    # no line breaks, '=' padding at the end
    p Krypt::Base64.encode(plain_text, 4) # inserts \r\n after every fourth character
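For readers without krypt installed, the same output shape can be approximated with Ruby's core `Array#pack("m0")`, which emits canonical Base64 with no line breaks. `encode_wrapped` is a hypothetical helper, not krypt's API, and whether krypt appends a trailing `\r\n` after the last line is an assumption left out here.

```ruby
# Hypothetical helper, not krypt's API: canonical Base64 via Ruby's core
# Array#pack("m0") (no line breaks), optionally wrapped with CRLF.
def encode_wrapped(data, line_length = nil)
  encoded = [data].pack("m0") # canonical: '=' padding, zero pad bits
  return encoded if line_length.nil?
  raise ArgumentError, "line_length must be positive" unless line_length.positive?
  # Split into lines of at most line_length chars; no trailing CRLF (assumption).
  encoded.scan(/.{1,#{line_length}}/).join("\r\n")
end

plain_text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
p encode_wrapped(plain_text)
p encode_wrapped(plain_text, 4)
```

Because the wrapping only inserts separators, deleting the `\r\n` pairs recovers exactly the unwrapped canonical string.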

If you are dealing with large inputs, there is also a streaming version[2] for encoding and decoding.
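The general idea behind streaming Base64 encoding can be sketched with Ruby's core `pack("m0")`; this is not krypt's codec API, and `stream_encode` and `CHUNK_SIZE` are hypothetical names. Reading the input in chunks whose size is a multiple of 3 bytes means only the final chunk can need '=' padding, so the encoded pieces concatenate into the same output as encoding everything at once.

```ruby
require "stringio"

# Sketch of streaming Base64 encoding, not krypt's codec API.
# A multiple of 3 keeps every chunk except the last free of padding.
CHUNK_SIZE = 3 * 1024

def stream_encode(input, output)
  # IO#read(n) returns up to n bytes, and nil once the input is exhausted.
  while (chunk = input.read(CHUNK_SIZE))
    output.write([chunk].pack("m0"))
  end
  output
end

data = "Lorem ipsum dolor sit amet, consectetur adipiscing elit." * 200
encoded = stream_encode(StringIO.new(data), StringIO.new).string
p encoded.end_with?("=")
```

Streaming decoding works the same way in reverse, reading chunks whose length is a multiple of 4 characters, provided the encoded data contains no line breaks.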

[1] https://github.com/krypt/krypt
[2] https://github.com/krypt/krypt/blob/master/lib/krypt/codec/base64.rb
----------------------------------------
Bug #8286: Can't decode non-MIME Base64
https://bugs.ruby-lang.org/issues/8286#change-38711

Author: adacosta (Alan Da Costa)
Status: Closed
Priority: Normal
Assignee: 
Category: 
Target version: 
ruby -v: 2.0.0-p0
Backport: 1.9.3: UNKNOWN, 2.0.0: UNKNOWN


=begin
In https://github.com/ruby/ruby/blob/trunk/lib/base64.rb#L42, RFC 2045 is cited for encode64/decode64 support, but RFC 2045 is the MIME RFC. I don't believe this is the correct RFC to reference; RFC 4648 is the correct RFC for Base64. Further, RFC 4648 has an explicit section about line feeds in encoded data, http://tools.ietf.org/html/rfc4648#section-3.1. This section states:

   MIME [4] is often used as a reference for base 64 encoding.  However,
   MIME does not define "base 64" per se, but rather a "base 64 Content-
   Transfer-Encoding" for use within MIME.  As such, MIME enforces a
   limit on line length of base 64-encoded data to 76 characters.  MIME
   inherits the encoding from Privacy Enhanced Mail (PEM) [3], stating
   that it is "virtually identical"; however, PEM uses a line length of
   64 characters.  The MIME and PEM limits are both due to limits within
   SMTP.

   Implementations MUST NOT add line feeds to base-encoded data unless
   the specification referring to this document explicitly directs base
   encoders to add line feeds after a specific number of characters.


In my case, I have a separate implementation that does not add line feeds to the Base64 (non-MIME) output, and as a result, Base64.decode64 cannot decode the non-MIME encoded data. I believe this also indicates that Base64.encode64 wrongly applies MIME-style encoding (with line feeds) to Base64.

I have an example of the issue at https://github.com/adacosta/base64_compatible/blob/master/test/test_coding.rb#LC25.
=end


-- 
http://bugs.ruby-lang.org/