Skip to content

MsgPackReader: properly increment index in ext #370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 15, 2021

Conversation

harpocrates
Copy link
Contributor

@harpocrates harpocrates commented Nov 7, 2021

The parseExt helper in MsgPackReader was not properly updating the
position in the data source.

Added two regression tests. These compare serialized and re-serialized
values (instead of just initial and serialized) since structurally identical Ext
messages won't compare identical (due to the Array[Byte] field whose
equality is determined by identity).

Fixes #369

The `parseExt` helper in `MsgPackReader` was not properly updating the
position in the data source. Also override `Ext.equals` to compare the
data arrays by contents instead of identity (which is consistent with
the other subclasses of `Msg`).

Added two regression tests. Note that these don't work without the
`equals` changes, since structurally identical `Ext` messages won't
compare equal unless they also happen to be the same object.

Fixes com-lihaoyi#369
}

override def hashCode: Int =
MurmurHash3.bytesHash(b, "Ext".hashCode)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tag should be represented in the hash somehow. We could probably just use it as the seed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about this, the less comfortable I am about updating equals and hashCode here. That could lead to some end-user changes and it isn't consistent with some of the other types (eg. Bytes). Maybe this is still desirable change, but probably not in a PR about fixing ext parsing.

I'm going to switch the round-tripping test I added to assert equivalence of the serialized and serialized-deserialized-reserialized payloads (that way we don't need to call equals on a Msg)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for smaller, well-scoped PRs.

Re: equals/hashCode, it's hard to imagine that any users are depending on the behavior being broken other than to work around it.

Good point about case class Binary(value: Array[Byte]) extends Msg being affected, too. A fix for one should probably fix both.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rabbit hole goes deeper. Given that 0.0d == 0L, should Float64(0.0) == Int64(0L)? Probably so! This is especially relevant since round-tripping Int64(0L) through messagepack will come back as Int32(0). I think you're wise to push equals/hashCode to a separate PR. Making assertions against the Array[Byte] sounds good for now.

Copy link
Contributor Author

@harpocrates harpocrates Nov 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that 0.0d == 0L, should Float64(0.0) == Int64(0L)?

Personally, I don't think this should be the case - the "int format family" and "float format family" are separate. I further wish that 0.0d == 0L would fail to typecheck instead of returning true (that's another rabbit hole though, and Scala is unlikely to ever change that).

This is especially relevant since round-tripping Int64(0L) through messagepack will come back as Int32(0).

I agree Int64(0L) should roundtrip, but on the other hand: is there really a need for the distinction? Why not just a single case class Int(value: Long) extends Msg (and UInt for unsigned)? We don't distinguish Int16 or the fixint cases, so why distinguish the 32 and 64 bit cases?

@@ -98,7 +98,9 @@ abstract class BaseMsgPackReader extends upickle.core.BufferingByteParser{
}
def parseExt[T](n: Int, visitor: Visitor[_, T]) = {
val (arr, i, j) = sliceArr(index + 1, n)
visitor.visitExt(getByteSafe(index), arr, i, j, index)
val res = visitor.visitExt(getByteSafe(index), arr, i, j, index)
index += n + 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks important!

If I'm reading correctly, looks like n is the array length, and the + 1 is the tag/ext-type byte. Seems like this fix should cover the fixext types as well. 👍

Copy link
Collaborator

@htmldoug htmldoug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After hash fix, lgtm. thanks!

This rolls back changing `equals` and `hashCode`. Those changes may be
worth it, but not in this PR. The test compares serialized and
re-serialized values
@harpocrates
Copy link
Contributor Author

I've switched back to Array[Byte] for the tests.

If you are planning on squashing and merging, maybe use that as the commit message instead of the default concatenation of intermediate messages?

@htmldoug htmldoug merged commit 106a706 into com-lihaoyi:master Nov 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

upack doesn't roundtrip Ext nested inside another Msg
2 participants