From: vmakarov@...
Date: 2016-12-04T05:37:23+00:00
Subject: [ruby-core:78484] [Ruby trunk Bug#13002] Hash calculations no longer using universal hashing

Issue #13002 has been updated by Vladimir Makarov.

Nobuyoshi Nakada wrote:
> 
> That means the hash function must be stronger when `strong_p`, doesn't it?
> `Integer#hash` calls `rb_any_hash`, not `rb_any_hash_weak`, so its result should be strong but it isn't now.

Sorry if I misunderstand the Ruby documentation, but it (http://ruby-doc.org/core-2.3.3/Object.html#method-i-hash) says "The hash value for an object may not be identical across invocations or implementations of Ruby". I read this as saying that the same value is not guaranteed, but the values can still be the same. I also did not find anything in the documentation saying that the hash is a strong one.

But if I understand it wrongly, or if the randomized behaviour is really desirable because some applications assume it, then your patch makes sense. The MRI documentation should probably be changed too, since you are adding tests that check the randomness (which means you expect this behaviour).

> 
> ```diff
> diff --git c/hash.c w/hash.c
> index b4c74ed..a5f660e 100644
> --- c/hash.c
> +++ w/hash.c
> @@ -147,24 +147,19 @@ long rb_objid_hash(st_index_t index);
>  
>  long
> -rb_dbl_long_hash(double d)
> +rb_dbl_long_hash(double d, int strong_p)
>  {
> +    union {double d; st_index_t i;} u;
> +    st_index_t hnum;
>      /* normalize -0.0 to 0.0 */
>      if (d == 0.0) d = 0.0;
> -#if SIZEOF_INT == SIZEOF_VOIDP
> -    return rb_memhash(&d, sizeof(d));
> -#else
> -    {
> -        union {double d; uint64_t i;} u;
> -
> -        u.d = d;
> -        return rb_objid_hash(u.i);
> -    }
> -#endif
> +    u.d = d;
> +    hnum = strong_p ? rb_hash_start(u.i) : u.i;
> +    return rb_objid_hash(hnum);
>  }
> 

I don't like that the same code is now used for 32-bit and 64-bit targets. For ILP32 targets, st_index_t is 32-bit and basically half of the double is thrown away.
This decreases hash function quality considerably. Depending on target endianness, the result can be pretty bad in the most widely used cases. So I would still use rb_memhash for ILP32, and rb_hash_start(rb_memhash(...)) for strong_p on ILP32.

----------------------------------------
Bug #13002: Hash calculations no longer using universal hashing
https://bugs.ruby-lang.org/issues/13002#change-61862

* Author: Martin Dürst
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: 
* Backport: 2.1: DONTNEED, 2.2: DONTNEED, 2.3: DONTNEED
----------------------------------------
When preparing for my lecture on hash tables last week, I found that Ruby trunk doesn't do universal hashing anymore. See http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf for background. I contacted security@ruby-lang.org, but was told by Shugo that because trunk is not a published version, we can talk about it publicly. Shugo also said that the change was introduced in r56650.
Following is some output from two different versions of Ruby that shows the problem:

On Ruby 2.2.3, a different hash value for the same number every time Ruby is restarted:

C:\Users\duerst>ruby -v
ruby 2.2.3p173 (2015-08-18 revision 51636) [i386-mingw32]

C:\Users\duerst>ruby -e 'puts 12345678.hash'
611647260

C:\Users\duerst>ruby -e 'puts 12345678.hash'
-844752827

C:\Users\duerst>ruby -e 'puts 12345678.hash'
387106497

On Ruby trunk, always the same value:

duerst@Arnisee /cygdrive/c/Data/ruby
$ ruby -v
ruby 2.4.0dev (2016-12-02 trunk 56965) [x86_64-cygwin]

duerst@Arnisee /cygdrive/c/Data/ruby
$ ruby -e 'puts 12345678.hash'
1846311797112760547

duerst@Arnisee /cygdrive/c/Data/ruby
$ ruby -e 'puts 12345678.hash'
1846311797112760547

duerst@Arnisee /cygdrive/c/Data/ruby
$ ruby -e 'puts 12345678.hash'
1846311797112760547

-- 
https://bugs.ruby-lang.org/