At the moment serializing an NG entity can be a very expensive operation. We need to refactor the process so that we serialize just the actual data and not the whole field object tree.

Comments

plach’s picture

Issue tags: +Entity Field API

Tagging for the rocketship.

berdir’s picture

Issue tags: +Performance

More tagging for them rocketships.

yched’s picture

Might it be that what we want is "serialized data === data that gets cached in the entity (load) cache" ?

plach’s picture

Sounds sensible :)

berdir’s picture

Status: Active » Needs review
StatusFileSize
new839 bytes

Not sure about the caching, because that's about precalculating things that we don't want in a lot of cases.

$node = node_load(2);
debug($node->body->value);
$serialized = serialize($node);
debug(strlen($serialized));
$node = unserialize($serialized);
debug($node->body->value);

Result: 5800 instead of 13000

berdir’s picture

Title: Optimize EntityNG serialization » Optimize content entity serialization

Updating title.

amateescu’s picture

Issue summary: View changes
Status: Needs review » Reviewed & tested by the community

This makes very much sense :)

amateescu’s picture

Status: Reviewed & tested by the community » Needs review
StatusFileSize
new876 bytes
new436 bytes

Looking a bit further into this, it seems that we can also get rid of $fieldDefinitions, thus further reducing the length of the serialized object to 2281.

yched’s picture

The __wakeup() method has a "@todo This should be done before serializing the entity, but we would need to provide the full list of data to be serialized. See the dedicated issue at [this very issue]". Does it mean we could get rid of the @todo here ?

amateescu’s picture

StatusFileSize
new1.17 KB
new830 bytes

Something like this perhaps?

Status: Needs review » Needs work

The last submitted patch, 8: sleep-2027795-8.patch, failed testing.

amateescu’s picture

Failures are correct, this breaks autocomplete becasue the field only contains a 'target_id' => 0 property, while the 'entity' property is stripped. The only reason for which #5 passed is because this test was added 4 days later, on Oct 10th :)

fago’s picture

+++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
@@ -332,14 +332,27 @@ protected function clearTranslationCache() {
+  function __sleep() {

Should be public function.

Also, should we use the serializable interface or sleep? Not sure, what's better here or we've guidelines?

Failures are correct, this breaks autocomplete becasue the field only contains a 'target_id' => 0 property, while the 'entity' property is stripped.

hm, could we make ER-fields include the entity in getValue() when it is about to autosave? I think in this case it makes sense as it's obviously needed to restore the value of the field, i.e. $v = getValue() + setValue($v) should be idempotent. Given that, serialization should work also.

berdir’s picture

We need to use _sleep(), \Serializable behaves differently in 5.4 and 5.3 and is apparently broken in 5.4. We already had to switch Field classes because tests blew up in weird ways.

Otherwise agreed on adding the to-be-saved entity to the ER field item values.

berdir’s picture

Status: Needs work » Needs review
berdir’s picture

StatusFileSize
new2.01 KB
new1.23 KB

Forgot to make the method public.

yched’s picture

+1 on the fix, but for future ref it would be nicer if the comment included the reasoning (serialization,, idempotency of setValue(getValue()) in the specific case when 'entity' contains a fresh entity / 'target_id' =0)

berdir’s picture

16: sleep-2027795-15.patch queued for re-testing.

amateescu’s picture

NW for #17. Also, we usually put {@inheritdoc} for implementations of magic methods.

berdir’s picture

StatusFileSize
new2.04 KB
new1.26 KB

Ok, spent quite a bit of time discussing that comment in IRC, @yched and @timplunkett liked this version :) And according to git log -S"idempotent" is @amateescu a big fan of that word too and will be ok with it I guess ;)

yched’s picture

Status: Needs review » Reviewed & tested by the community

Looks good! Thanks!

(looks like we can't finish issues derailed on comment nitpicks while you're away :-/)

amateescu’s picture

@Berdir, not sure where you get your git logs from but I never used that word :D

tim.plunkett’s picture

Ah, I actually think it's Wim who added some of those, @amateescu was just credited on the #post_render_cache issue :)

wim leers’s picture

Hah :)

xjm’s picture

20: sleep-2027795-20.patch queued for re-testing.

catch’s picture

Status: Reviewed & tested by the community » Needs review

Could we profile this? I can see that it means less storage space so makes sense anyway, but is the __sleep() definitely going to be quicker than the extra serializing?

berdir’s picture

Status: Needs review » Reviewed & tested by the community

Was wondering about that too, so I had some fun benchmarking this :)

The question is what to benchmark exactly, I went with a node object that has all field item objects initalized, to see the full gain of this.

// node 1 is an article with an image, some terms and body text.
$node = node_load(1);

// Make sure all field item classes are initialized.
$node->getProperties();

$start = microtime(TRUE);
for ($i = 0; $i < 10000; $i++) {
  // Clone the node so that the field value extraction needs to run again.
  $serialized = serialize(clone $node);
}

$serialize_time = microtime(TRUE);

print "Serialized string length: " . strlen($serialized) . "\n";
print "Serialize: " . round($serialize_time - $start, 3) . " seconds \n";

for ($i = 0; $i < 10000; $i++) {
  unserialize($serialized);
}

$end = microtime(TRUE);

print "Unserialize: " . round($end - $serialize_time, 3) . " seconds \n";
 
print "Total time: " . round($end - $start, 3) . " seconds\n";

HEAD:

Serialized string length: 26444
Serialize: 6.905 seconds 
Unserialize: 6.664 seconds 
Total time: 13.569 seconds

With patch:

Serialized string length: 3131
Serialize: 4.416 seconds 
Unserialize: 0.347 seconds 
Total time: 4.764 seconds

I wasn't sure if the serialization itself would be faster too but apparently it is. And the real gain is unserialize, because for that we just have to initialize the main object, everything else would then be lazy-instantiated again (which is then going to be slower but usually only a small subset of the object is needed, if at all).

Note that the cloning actually makes up a large part of the seralization, but without it, the comparison is unrealistic and unfair, this is how it would look like:

HEAD:

Serialized string length: 26461
Serialize: 2.257 seconds 
Serialize: 6.182 seconds 
Total time: 8.439 seconds

Patch:

Serialized string length: 3131
Serialize: 0.287 seconds 
Serialize: 0.293 seconds 
Total time: 0.58 seconds

That's because after the first run, __sleep() is pretty much a no-op and doesn't have anything to do at all.

Enough numbers? :)

catch’s picture

Status: Reviewed & tested by the community » Fixed

Thanks that's great!

Committed/pushed to 8.x

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.