Bug #21396
openSet#initialize should call Set#add on items passed in
Description
class Foo < Set
def add(item) = super(item.bytesize)
end
x = Foo.new(["foo"])
p x
p x.include?(3)
On Ruby 3.4 the output is this:
> ruby -v test.rb
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
#<Foo: {3}>
true
On Ruby master the output is this:
> make run
./miniruby -I./lib -I. -I.ext/common -r./arm64-darwin24-fake ./test.rb
#<Set: {"foo"}>
false
The bug is that initialize
is not calling add
for the elements passed in, so the subclass doesn't get a chance to change them.
I've sent a PR here: https://github.com/ruby/ruby/pull/13518
Updated by jeremyevans0 (Jeremy Evans) about 1 month ago
- Related to Bug #21375: Set[] does not call #initialize added
Updated by jeremyevans0 (Jeremy Evans) about 1 month ago
This is not a bug, IMO. Using underlying functions instead of calling methods was one of the deliberate design decisions for core Set (see #21216), and how other core collection classes work. Array.new(1, true)
does not call Array#[]=
, it calls rb_ary_store
.
If we want to do this for Set#initialize
and Set.[]
for backwards compatibility, we should be consistent and call methods instead of underlying functions for every case where the methods were called in stdlib set. That will make it slower, though.
FWIW, in the code path you are using in stdlib Set, Set#add
is not called directly, you are relying on Set#merge
calling it. So this is at least a request to have Set#initialize
call Set#merge
and to have Set#merge
call Set#add
for every element.
Updated by tenderlovemaking (Aaron Patterson) about 1 month ago
jeremyevans0 (Jeremy Evans) wrote in #note-2:
This is not a bug, IMO. Using underlying functions instead of calling methods was one of the deliberate design decisions for core Set (see #21216), and how other core collection classes work.
Array.new(1, true)
does not callArray#[]=
, it callsrb_ary_store
.If we want to do this for
Set#initialize
andSet.[]
for backwards compatibility, we should be consistent and call methods instead of underlying functions for every case where the methods were called in stdlib set. That will make it slower, though.FWIW, in the code path you are using in stdlib Set,
Set#add
is not called directly, you are relying onSet#merge
calling it. So this is at least a request to haveSet#initialize
callSet#merge
and to haveSet#merge
callSet#add
for every element.
I don't have an opinion, really. But this change in behavior is breaking our existing code, and I suspect we're not the only ones.
Updated by ko1 (Koichi Sasada) about 1 month ago
How about to redfine initialize
on subclass of Set
to call #add
?
Updated by Eregon (Benoit Daloze) about 1 month ago
Is there any public code in some gem or so that relies on this? (the example is rather synthetic)
Updated by knu (Akinori MUSHA) about 1 month ago
I've always created custom variants of Set too, and I don't think it's rare to find these in in-house codebases.
Updated by tenderlovemaking (Aaron Patterson) about 1 month ago
ko1 (Koichi Sasada) wrote in #note-4:
How about to redfine
initialize
on subclass ofSet
to call#add
?
I think we could, but that means people have to change their code when upgrading.
Eregon (Benoit Daloze) wrote in #note-5:
Is there any public code in some gem or so that relies on this? (the example is rather synthetic)
It's kind of hard to search for this on GitHub, but I was able to find two similar-ish examples:
This one would not raise an exception in the same way, and this one could potentially be missing entries from its @hash_lookup
instance variable.
Maybe we could publish a gem that just has a copy of the current set.rb
file, but with Set
renamed to RbSet
or something. Then if people run in to issues they can use the gem and change the superclass to RbSet
. I think it would fix our case at work and probably the cases referenced above. I agree with @jeremyevans0's points, but I'm sure this is going to cause friction for people upgrading and it would be nice if we can make it as easy as possible to upgrade.
Updated by knu (Akinori MUSHA) 7 days ago
Considering the feedback we've received about compatibility in the new experimental Set implementation, it may be in our best interest to revert to the pure-Ruby version.
If improving performance and reducing memory footprint remain crucial, one option would be to keep only the underlying data structure and a few core methods from set.c, while leaving the rest written in pure Ruby.
I also considered introducing a separate base class (RbSet, Set::Ruby, Set::Base, or whatever) for compatibility, but that approach feels neither elegant nor the Ruby way, and it risks becoming permanent technical debt. Since 3.5 hasn't been finalized yet, I'm leaning toward making Set itself subclass-friendly again.
Updated by jeremyevans0 (Jeremy Evans) 7 days ago
knu (Akinori MUSHA) wrote in #note-8:
Considering the feedback we've received about compatibility in the new experimental Set implementation, it may be in our best interest to revert to the pure-Ruby version.
If improving performance and reducing memory footprint remain crucial, one option would be to keep only the underlying data structure and a few core methods from set.c, while leaving the rest written in pure Ruby.
I don't think reverting to the pure Ruby version is in our best interests. I do understand the backwards compatibility issues, though to be honest, all issues raised so far are cases where users were relying on implementation details (of the form method a
calls method b
). I think the benefits of making Set similar to Array and Hash exceed the costs of the backwards compatibility issues. It's not just performance and reduced memory usage, but also thread safety.
Consider add?
def add?(o)
add(o) unless include?(o)
end
Keeping backwards compatibility (add?
calls include?
and possibly add
) here requires giving up thread safety. With the pure Ruby implementation, add?
can return a value that was already added to the set, if another thread concurrently added it to the set between the include?
and add
calls. This type of thread safety issue can perhaps be accepted from a standard library, but should not be accepted for a core class. Since Ruby 3.2, Set has been closer to core class than standard library, due to the autoload.
As you mentioned, restoring backwards compatibility is possible while keeping the same optimized data structure. It's mostly a question of how far we want to go. If we want to support backwards compatibility for Set subclasses and method overrides, I think we should assess each situation on a case-by-case basis. I can implement the changes to have Set#initialize
call Set#add
(this issue) and have Set.[]
call Set#initialize
(#21375), if that is the behavior you prefer. We can officially support such method overriding by adding tests for the behavior, so users who override the methods are not relying on implementation details.
Updated by knu (Akinori MUSHA) 7 days ago
Jeremy, thanks for the reply.
Your point about thread-safety is well taken. It is an important advantage. As a possible compromise, we could keep the C backing and switch the behavior of methods when the class is subclassed in exchange for losing thread-safety. It's just an idea and not the cleanest solution, though.
The way Set was written, the accompanying tests, and the existence of OrderedSet all make it clear that this behaviour was a deliberate design choice, so after those long-standing signals, I feel like it's hard to tell users that those were implementation details they shouldn't have relied on.
Updated by jeremyevans0 (Jeremy Evans) 6 days ago
knu (Akinori MUSHA) wrote in #note-10:
Jeremy, thanks for the reply.
Your point about thread-safety is well taken. It is an important advantage. As a possible compromise, we could keep the C backing and switch the behavior of methods when the class is subclassed in exchange for losing thread-safety. It's just an idea and not the cleanest solution, though.
I think that is a reasonable compromise. It keeps the advantages of core Set when not subclassed, and keeps backwards compatibility for subclasses (at least, subclasses that don't access @hash
). Please let me know if you would like me to work on that.
The way Set was written, the accompanying tests, and the existence of OrderedSet all make it clear that this behaviour was a deliberate design choice, so after those long-standing signals, I feel like it's hard to tell users that those were implementation details they shouldn't have relied on.
Thank you for providing that context. Since there were not explicit tests that methods should call other methods, my assumption was that the behavior was an implementation detail and not by design. If we consider these to be deliberate design decisions and not implementation details, I recommend we add tests for them. I can handle that if you would like.
Updated by Eregon (Benoit Daloze) 6 days ago
Regarding thread-safety, for add?
the only thing is the return value where the old implementation might potentially return the Set even though the element was added concurrently.
It's still thread-safe in that it doesn't corrupt the Set or anything like that, and the given element is always added to the Set when the method returns.
Since there were not explicit tests that methods should call other methods
There were a few ruby/spec specs which you disabled in https://github.com/ruby/ruby/pull/13074/files#diff-00365d65577dd7b3e357e99b895bb8e5a26d699c33a1afabc1a48cd91a8c5914
Those specs or at least part of those specs were added in https://github.com/ruby/spec/pull/629 to ensure proper interop with Set-like classes, as tested with SetSpecs::SetLike and used e.g. in the persistent-dmnd gem.
https://bugs.ruby-lang.org/issues/15240 is the related issue to make this kind of interop better defined and less hacky.
Probably interop with Set-like classes not inheriting from Set is too much of a ask, but it might be good to re-enable these specs and change SetSpecs::SetLike
to inherit from Set
.