From: "Martin J. Dürst" Date: 2016-12-05T19:22:22+09:00 Subject: [ruby-core:78493] Re: [Ruby trunk Bug#13002] Hash calculations no longer using universal hashing Hello Victor, others, On 2016/12/04 15:28, vmakarov@redhat.com wrote: > Issue #13002 has been updated by Vladimir Makarov. > Nobuyoshi Nakada wrote: >> `test_wrapper_of_special_const` failed, that is, it's impossible to emulate switching weak and strong versions under the hood. >> I think this behavior is quirky. > > You are right, the behavior with strong/weak hashes is wrong for Ruby. This test is not guaranteed to work. After reading the documentation thoroughly, I got a conclusion that hash value used for the hash tables should be *always* the same. So the hash tables can not use two different hash functions at all. Over the weekend, I started to think about this issue some more and to write an email. First, I should apologize for not having remembered the strong/weak hash proposal, because I had read it. But I came up with a very simple example where I had doubts. After Nobu's mail, I saw that this is very similar to the example in Bug #9381. > Although strong/weak hash approach gave about 5-6% improvement out of 45% on Ruby hash table benchmarks, I think we should not use it for Ruby. I will provide a patch to get rid of it in 2 days. I think it may still be somewhat too early to completely give up on the strong/weak distinction. I really like using CPU cycles only when necessary, and that's exactly what this proposal is about. I think we have to distinguish three different cases: 1) Calls to #hash inside a class (as e.g. in Bug #9381): This always has to use strong (in the sense of universal, not necessarily in the sense of cryptographically strong) hashing. 2) Calls to #hash for general objects/classes from inside st.c: These have to always use strong hashing, because the hash is defined as a method on the class, and can't be switched between weak and strong. 3) Calculations of hash for special objects such as String, Symbol, Integer,... from directly inside st.c: In this case, I think it would be possible to use the weak/strong distinction. I also think that the majority, probably even a big majority, of hash keys are Strings, Symbols, and Integers, so that would keep a big part of the savings. I haven't studied the code in detail, but I could imagine that special-casing for String, Symbol, and (the old Fixnum part of) Integer e.g. in do_hash could do the job. Something along the lines of the following pseudo-code: /* The reserved hash value and its substitution. */ #define RESERVED_HASH_VAL (~(st_hash_t) 0) #define RESERVED_HASH_SUBSTITUTION_VAL ((st_hash_t) 0) /* Return hash value of KEY for table TAB. */ static inline st_hash_t do_hash(st_data_t key, st_table *tab) { st_index_t h; switch (key.class) { case String: case Symbol: case Fixnum: # use weak or strong for basic types st_index_t h = (st_index_t)(tab->curr_hash)(key); break; default: # always use strong for all other types st_index_t h = (st_index_t)(strong_hash)(key); } #### aside: Don't see why we need conditional compilation for the #### following 5 lines. #if SIZEOF_INT == SIZEOF_VOIDP st_hash_t hash = h; #else st_hash_t hash = h; #endif /* RESERVED_HASH_VAL is used for a deleted entry. Map it into another value. Such mapping should be extremely rare. */ return hash == RESERVED_HASH_VAL ? RESERVED_HASH_SUBSTITUTION_VAL : hash; } I understand that we don't want to 'pollute' st.c with Ruby-specific stuff, so this is just pseudocode. It adds a switch, but we can maybe balance that by reducing the switch in any_hash_general in hash.c. Even if not, the switch may still be faster than a complicated hash calculation on a string or symbol, which will loop per character. Hope this helps, Martin. Unsubscribe: