Project

General

Profile

Actions

Bug #21396

open

Set#initialize should call Set#add on items passed in

Added by tenderlovemaking (Aaron Patterson) about 1 month ago. Updated 6 days ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:122411]

Description

class Foo < Set
  def add(item) = super(item.bytesize)
end

x = Foo.new(["foo"])
p x
p x.include?(3)

On Ruby 3.4 the output is this:

> ruby -v test.rb
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin24]
#<Foo: {3}>
true

On Ruby master the output is this:

> make run
./miniruby -I./lib -I. -I.ext/common  -r./arm64-darwin24-fake  ./test.rb 
#<Set: {"foo"}>
false

The bug is that initialize is not calling add for the elements passed in, so the subclass doesn't get a chance to change them.

I've sent a PR here: https://github.com/ruby/ruby/pull/13518


Related issues 1 (1 open0 closed)

Related to Ruby - Bug #21375: Set[] does not call #initializeOpenActions
Actions #1

Updated by jeremyevans0 (Jeremy Evans) about 1 month ago

  • Related to Bug #21375: Set[] does not call #initialize added

Updated by jeremyevans0 (Jeremy Evans) about 1 month ago

This is not a bug, IMO. Using underlying functions instead of calling methods was one of the deliberate design decisions for core Set (see #21216), and how other core collection classes work. Array.new(1, true) does not call Array#[]=, it calls rb_ary_store.

If we want to do this for Set#initialize and Set.[] for backwards compatibility, we should be consistent and call methods instead of underlying functions for every case where the methods were called in stdlib set. That will make it slower, though.

FWIW, in the code path you are using in stdlib Set, Set#add is not called directly, you are relying on Set#merge calling it. So this is at least a request to have Set#initialize call Set#merge and to have Set#merge call Set#add for every element.

Updated by tenderlovemaking (Aaron Patterson) about 1 month ago

jeremyevans0 (Jeremy Evans) wrote in #note-2:

This is not a bug, IMO. Using underlying functions instead of calling methods was one of the deliberate design decisions for core Set (see #21216), and how other core collection classes work. Array.new(1, true) does not call Array#[]=, it calls rb_ary_store.

If we want to do this for Set#initialize and Set.[] for backwards compatibility, we should be consistent and call methods instead of underlying functions for every case where the methods were called in stdlib set. That will make it slower, though.

FWIW, in the code path you are using in stdlib Set, Set#add is not called directly, you are relying on Set#merge calling it. So this is at least a request to have Set#initialize call Set#merge and to have Set#merge call Set#add for every element.

I don't have an opinion, really. But this change in behavior is breaking our existing code, and I suspect we're not the only ones.

Updated by ko1 (Koichi Sasada) about 1 month ago

How about to redfine initialize on subclass of Set to call #add?

Updated by Eregon (Benoit Daloze) about 1 month ago

Is there any public code in some gem or so that relies on this? (the example is rather synthetic)

Updated by knu (Akinori MUSHA) about 1 month ago

I've always created custom variants of Set too, and I don't think it's rare to find these in in-house codebases.

Updated by tenderlovemaking (Aaron Patterson) about 1 month ago

ko1 (Koichi Sasada) wrote in #note-4:

How about to redfine initialize on subclass of Set to call #add?

I think we could, but that means people have to change their code when upgrading.

Eregon (Benoit Daloze) wrote in #note-5:

Is there any public code in some gem or so that relies on this? (the example is rather synthetic)

It's kind of hard to search for this on GitHub, but I was able to find two similar-ish examples:

This one would not raise an exception in the same way, and this one could potentially be missing entries from its @hash_lookup instance variable.

Maybe we could publish a gem that just has a copy of the current set.rb file, but with Set renamed to RbSet or something. Then if people run in to issues they can use the gem and change the superclass to RbSet. I think it would fix our case at work and probably the cases referenced above. I agree with @jeremyevans0's points, but I'm sure this is going to cause friction for people upgrading and it would be nice if we can make it as easy as possible to upgrade.

Updated by knu (Akinori MUSHA) 7 days ago

Considering the feedback we've received about compatibility in the new experimental Set implementation, it may be in our best interest to revert to the pure-Ruby version.

If improving performance and reducing memory footprint remain crucial, one option would be to keep only the underlying data structure and a few core methods from set.c, while leaving the rest written in pure Ruby.

I also considered introducing a separate base class (RbSet, Set::Ruby, Set::Base, or whatever) for compatibility, but that approach feels neither elegant nor the Ruby way, and it risks becoming permanent technical debt. Since 3.5 hasn't been finalized yet, I'm leaning toward making Set itself subclass-friendly again.

Updated by jeremyevans0 (Jeremy Evans) 7 days ago

knu (Akinori MUSHA) wrote in #note-8:

Considering the feedback we've received about compatibility in the new experimental Set implementation, it may be in our best interest to revert to the pure-Ruby version.

If improving performance and reducing memory footprint remain crucial, one option would be to keep only the underlying data structure and a few core methods from set.c, while leaving the rest written in pure Ruby.

I don't think reverting to the pure Ruby version is in our best interests. I do understand the backwards compatibility issues, though to be honest, all issues raised so far are cases where users were relying on implementation details (of the form method a calls method b). I think the benefits of making Set similar to Array and Hash exceed the costs of the backwards compatibility issues. It's not just performance and reduced memory usage, but also thread safety.

Consider add?

  def add?(o)
    add(o) unless include?(o)
  end

Keeping backwards compatibility (add? calls include? and possibly add) here requires giving up thread safety. With the pure Ruby implementation, add? can return a value that was already added to the set, if another thread concurrently added it to the set between the include? and add calls. This type of thread safety issue can perhaps be accepted from a standard library, but should not be accepted for a core class. Since Ruby 3.2, Set has been closer to core class than standard library, due to the autoload.

As you mentioned, restoring backwards compatibility is possible while keeping the same optimized data structure. It's mostly a question of how far we want to go. If we want to support backwards compatibility for Set subclasses and method overrides, I think we should assess each situation on a case-by-case basis. I can implement the changes to have Set#initialize call Set#add (this issue) and have Set.[] call Set#initialize (#21375), if that is the behavior you prefer. We can officially support such method overriding by adding tests for the behavior, so users who override the methods are not relying on implementation details.

Updated by knu (Akinori MUSHA) 7 days ago

Jeremy, thanks for the reply.

Your point about thread-safety is well taken. It is an important advantage. As a possible compromise, we could keep the C backing and switch the behavior of methods when the class is subclassed in exchange for losing thread-safety. It's just an idea and not the cleanest solution, though.

The way Set was written, the accompanying tests, and the existence of OrderedSet all make it clear that this behaviour was a deliberate design choice, so after those long-standing signals, I feel like it's hard to tell users that those were implementation details they shouldn't have relied on.

Updated by jeremyevans0 (Jeremy Evans) 6 days ago

knu (Akinori MUSHA) wrote in #note-10:

Jeremy, thanks for the reply.

Your point about thread-safety is well taken. It is an important advantage. As a possible compromise, we could keep the C backing and switch the behavior of methods when the class is subclassed in exchange for losing thread-safety. It's just an idea and not the cleanest solution, though.

I think that is a reasonable compromise. It keeps the advantages of core Set when not subclassed, and keeps backwards compatibility for subclasses (at least, subclasses that don't access @hash). Please let me know if you would like me to work on that.

The way Set was written, the accompanying tests, and the existence of OrderedSet all make it clear that this behaviour was a deliberate design choice, so after those long-standing signals, I feel like it's hard to tell users that those were implementation details they shouldn't have relied on.

Thank you for providing that context. Since there were not explicit tests that methods should call other methods, my assumption was that the behavior was an implementation detail and not by design. If we consider these to be deliberate design decisions and not implementation details, I recommend we add tests for them. I can handle that if you would like.

Updated by Eregon (Benoit Daloze) 6 days ago

Regarding thread-safety, for add? the only thing is the return value where the old implementation might potentially return the Set even though the element was added concurrently.
It's still thread-safe in that it doesn't corrupt the Set or anything like that, and the given element is always added to the Set when the method returns.

Since there were not explicit tests that methods should call other methods

There were a few ruby/spec specs which you disabled in https://github.com/ruby/ruby/pull/13074/files#diff-00365d65577dd7b3e357e99b895bb8e5a26d699c33a1afabc1a48cd91a8c5914
Those specs or at least part of those specs were added in https://github.com/ruby/spec/pull/629 to ensure proper interop with Set-like classes, as tested with SetSpecs::SetLike and used e.g. in the persistent-dmnd gem.
https://bugs.ruby-lang.org/issues/15240 is the related issue to make this kind of interop better defined and less hacky.
Probably interop with Set-like classes not inheriting from Set is too much of a ask, but it might be good to re-enable these specs and change SetSpecs::SetLike to inherit from Set.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0