blob: 537e03d3ac2acf96cbb14ee875e3a2811d37e386 [file] [log] [blame] [view]
Nigel Tao187a4792023-09-28 22:30:441# What’s Up With DCHECKs
2
3This is a transcript of [What's Up With
4That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
5Episode 2, a 2022 video discussion between [Sharon ([email protected])
6and Peter ([email protected])](https://www.youtube.com/watch?v=MpwbWSEDfjM).
7
8The transcript was automatically generated by speech-to-text software. It may
9contain minor errors.
10
11---
12
13You've seen DCHECKs around and been asked to use them in code review, but what
14are they? What's the difference between a CHECK and a DCHECK? How do you use
15them? Here to answer that is special guest is Peter, who works on UI and
16improving crash reports.
17
18Notes:
19- https://docs.google.com/document/d/146LoJ1E3N3E6fb4zDh92HPQc6yhRpNI7DSKlJjaYlLw/edit
20
21Links:
22- [What's Up With Pointers](https://www.youtube.com/watch?v=MpwbWSEDfjM)
23
24---
25
2600:00 SHARON: Hello, and welcome to What's Up With That?, the series that
27demystifies all things Chrome. I'm your host, Sharon. And today, we're talking
28about DCHECKs. You've seen them around. You've probably been told to add one in
29code review before. But what are they? What are they for, and what do they do?
30Our guest today is Peter, who works on desktop UI and Core UI. He's also
31working on improving Chrome's crash reports, which includes DCHECKs. Today
32he'll help us answer, what's up with DCHECKs? Welcome, Peter.
33
3400:30 PETER: Thanks for having me.
35
3600:32 SHARON: Yeah. Thanks for being here. So the most obvious question to
37start with, what is a DCHECK?
38
3900:39 PETER: So a CHECK and a DCHECK are both sort of things that make sure
40that what you think is true is true. Right? So this should never be called with
41an empty vector. You might add a CHECK for it, or you might add a DCHECK for
42it. And it's sort of similar to a search, which you may have hit during earlier
43programming outside of Chrome. And what it means is when this line gets hit, we
44check and see if it's true. And if it's not true, we crash. DCHECKs differ from
45CHECKs in that they are traditionally only in debug builds, or local
46development builds, or on our try-bots. So they have zero overhead when Chrome
47hits stable, because the CHECK just won't be there.
48
4901:24 SHARON: OK. So like if the D stands for Debug. That make sense.
50
5101:28 PETER: Yeah. I want debug to turn into developer, because now we have
52them by default if you're no longer - if you're doing a release build, and
53you're not turning them off, and you're not doing an official build, you get
54them.
55
5601:42 SHARON: OK. Well, you heard it here first, or maybe you heard it before.
57I heard it here first. So you mentioned asserts. So something that I've seen a
58couple times in Chrome, and also is part of the standard library, is
59`static_assert`. So how is that similar or different to DCHECKs? And why do we
60use or not use them?
61
6202:00 PETER: Right. So `static_assert`s are - and you're going to have to ask
63C++ experts, who can probably take some of the sharp edges off of this - but
64it's basically, if you can assert something in compile time, then you can use a
65`static_assert`, which means that you don't have to hit a code path where it's
66wrong. It sort of has to always hold true. And whenever you can use a
67`static_assert`, use a `static_assert`, because it's free. And basically, you
68can't compile the program if it's not true.
69
7002:31 SHARON: OK. That's good to know, because I definitely thought that was
71one of the C++ standard library things we should avoid, because we have a
72similar thing in Chromium. But I guess that's not the case.
73
7402:41 PETER: Yeah. Assert is the one that is - OK, so this is a little
75complicated, right? `static_assert` is a language feature, not a library
76feature. And someone will tell me that I'm wrong about something about this.
77Asserts are just sort of a poorer version of DCHECKs. So they won't go through
78our crash handling. It won't print the pretty stacks, et cetera.
79`static_assert`s, on the other hand, are a compile time feature. And we don't,
80as far as I know, have our own wrapper around it. We just use `static_assert`.
81So what you would maybe use this for is like if you have a constant - like, say
82you have an array, and the code makes an assumption that some constant is the
83size of this array, you can assert that in compile time, and that would be a
84good use of a `static_assert`.
85
8603:26 SHARON: OK. Cool. So you mentioned that some things have changed with how
87DCHECKs work. So can you give us a brief overview of the history of DCHECKs -
88what they used to be, people who have been using them for a while, how might
89they have changed from the idea of what they have as a DCHECK in their mind?
90
9103:43 PETER: Sure. So this is as best I know. I'm just sort of extrapolating
92from what I've seen. And what I think originally was true is that a CHECK used
93to be this logging statement, where you essentially compile the file name and
94the line number. And if this ever hits, then we'll log some stuff and then
95crash. Right? Which comes with a little bit of overhead, especially on size,
96that you basically take the file name and line number for every instance, and
97that generates a bunch of strings and numbers that essentially add to Chrome's
98binary size. I don't know how many steps between that and where we currently
99are. But right now, our CHECKs are just, if condition is false, crash, which
100means that you won't, out of the CHECK, get file name and line number. We'll
101get those out of debugging symbols. And you also won't get any of the logging
102messages that you can add to the end of a CHECK, which means that your debug
103info will be poorer, but it will be cheaper to use. So they've gotten from
104being pretty heavy CHECKs to being really cheap.
105
10605:01 SHARON: OK. So that kind of leads us into the question that I think most
107people want to have answered, which is, when should I use a DCHECK? When should
108I use a CHECK? When should I use neither?
109
11005:13 PETER: I would say that historically, we've said CHECKs are expensive.
111Don't use them unless you sort of have to. And I don't think that holds true
112anymore. So basically, unless you are in really performance-critical code, then
113use a CHECK. If there's anything that you care about where the program state
114will be unpredictable from this point on if it's not true, CHECK it. It's not
115that expensive. Right? We have a lot of code where we push a string onto a
116vector, and that never gets flagged in code review. And it's probably like 10
117times more expensive, if not 100 times more expensive, than adding a CHECK. The
118exception to that is if you're in a really hot loop where you don't want to
119dereference a pointer, then a CHECK might add some cost. And the other is if
120the condition that you're trying to validate is really expensive. It's not the
121CHECK itself that's expensive. It's the thing you're evaluating. And if that's
122expensive, then you might not afford doing a CHECK. If you don't know that it's
123expensive, it's probably not expensive.
124
12506:20 SHARON: Can you give us an example of something expensive to evaluate for
126a CHECK?
127
12806:24 PETER: Right. So say that you have something in video code that for every
129video frame, for every pixel validates the alpha value as opaque, or something.
130That would probably make video conferencing a little bit worse performance.
131Another thing would just be if you have to traverse a graph on every frame, and
132it will sort of jump all over memory to see if some reachability problem in
133your graph is true, that's going to be a lot more expensive. But CHECKing that
134index is less than some vector bounds, I think that should fall under cheap.
135And -
136
13707:02 SHARON: OK.
138
13907:02 PETER: culturally, we've tried to avoid doing a lot of these. And I think
140it's just hurting us.
141
14207:09 SHARON: OK. So since most places we should use CHECKs, are there any
143places where a DCHECK would be better then? Or any time you would have normally
144previously used a DCHECK, you should just make that a check?
145
14607:23 PETER: So we have a new construct that's called `EXPENSIVE_DCHECK`s, or
147if `EXPENSIVE_DCHECK`s are on, I think we should add a corresponding macro for
148`EXPENSIVE_DCHECK`. And then you should be able to just say, either it's
149expensive and has to be a DCHECK, so use `EXPENSIVE_DCHECK`; otherwise, use
150CHECK. And my hunch would be like 95% of what we have as DCHECKs would probably
151serve us better as CHECKs. But your code owner and reviewer might disagree with
152that. And it's not yet documented policy that we say CHECKs are cheap; just add
153a billion of them. But I would like to get there eventually.
154
15508:04 SHARON: OK. So if you put in a CHECK, and your reviewer tells them this
156should be a DCHECK, the person writing the CL can point them to this video, and
157then they can discuss from there.
158
15908:13 PETER: I mean, yeah, you can either say Peter disagrees with you, or I
160can get further along this and say we make policy that CHECKs are cheap, so
161they are preferable. So a lot of foot-shooters with DCHECKs is that you expect
162this property to hold true, but you never effectively CHECK it. And that can
163lead to all sorts of bad stuff, right? Like if you're trying to DCHECK that
164some origin for some frame makes some assumptions of site iso - I don't know
165site isolation well enough to say this. But basically, if you're DCHECKing that
166the code that you're running runs under some sort of permissions, then that is
167effectively unchecked in stable, right? And we do care about those properties,
168and it would be really good if we crashed rather than leaked information
169between sites.
170
17109:12 SHARON: Right.
172
17309:14 PETER: Yeah.
174
17509:16 SHARON: So that seems like a good tie-in for the fact that within some
176security people, they don't have the most positive impression of DCHECKs, shall
177we say? So a couple examples of this, for listeners who maybe aren't familiar
178with this, is one person previously on security saying DCHECKs are pronounced
179as "code that's not tested". Someone else I told about this episode - I said,
180we're going to talk about DCHECKs - they immediately said, is it going to be
181about why DCHECKs are bad? So amongst the Chrome security folks, they are not a
182huge fan of DCHECKs. Can you tell us maybe why that is?
183
18409:51 PETER: So if we go back a little bit in time, it used to be that DCHECKs
185were only built for developers if they do a debug build. And Chrome has gotten
186so big that you don't want to do a debug build or the UI is incredibly slow.
187Unfortunately, it's sort of not that great an experience to work in a debug
188build. So people work in a release build. That doesn't mean that they don't
189care about the things they put under DCHECK. It just means they want to go on
190with their lives and not wait x minutes for the browser to launch, or however
191bad it is nowadays. And that means that they, unfortunately, lose coverage for
192the DCHECKs. So this means that if your code is not exercised well under tests,
193then this is completely not enforced. But it's slightly better than a comment,
194in that you're really expecting this thing to hold true, and that's clearly an
195expectation. But how good is the expectation if you don't look at it? So last
196year, I believe, we made it so that DCHECKs are on by default if you're not
197doing an official build. And this included release builds. So now, it's like at
198least if you're doing development and you hit this condition, it's going to
199explode, which is really good, because then you can find a lot of issues, and
200we can prevent a lot of issues from ever happening in the first place. It is
201really hard for you, as a developer, to make the assumption that if this
202invariant is ever false, I will find it during development, and it will never
203happen in the wild. And DCHECKs are essentially either, I will find this
204locally before I submit it, or all bets are off; or it is I don't care that
205much if this thing doesn't hold true, which is sort of a weird assertion to
206make. So I think we're in this little awkward in-between state. And this
207in-between state, remember, mostly exists as a performance optimization from
208when CHECKs used to be a lot more expensive, in terms of code size. So did I
209cover most of this?
210
21112:06 SHARON: Yeah. I think, based on that, I think it's pretty easy to see why
212people who are more concerned about security are not a fan of this.
213
21412:13 PETER: I mean, if you care about it, especially if it causes privacy or
215security or user-harm sort of things, just CHECK. Just CHECK, right? If it
216makes your code animate a thing slightly weirder, like it will just jump to the
217end position instead of going through your fence load, whatever. Maybe you can
218make that a DCHECK. Maybe it doesn't matter. Like it's wrong, but it's not that
219bad. But most of the cases, you DCHECK something, where it's like the program
220is going to be in some indeterminate state, and we actually care about if it's
221ever false. So maybe we can afford to make it a CHECK. Maybe we should look
222more about our sort of vector pushbacks than we should look at our CHECKs, and
223then just have more CHECKs. More CHECKs. Because it's also like when things
224break, it's a lot cheaper to debug a DCHECK than your program is in some
225indeterminate state, because it was allowed to pass through a DCHECK that you
226thought was - and when you read the code, unless you're used to reading it as
227DCHECKs - oh, that just didn't get enforced - it's sort of hard to try to
228figure out why the thing was doing the wrong thing in the first place.
229
23013:22 SHARON: OK. How is this as a summary? When in doubt, CHECK it out.
231
23213:27 PETER: I like that. I like that. And you might get pushback by reviewers,
233who aren't on my side of the fence yet. And then you can decide on which hill
234you want to die on, at least until we've made policy to just not complain about
235DCHECKs, or not complain about CHECKs.
236
23713:45 SHARON: All right. That sounds good. So you mentioned stuff failing in
238the wild. And for people who might not know, do you want to just briefly
239explain what failing in the wild means?
240
24113:54 PETER: OK. So there's two things. Just failing in the wild just means
242that when this thing rolls out to Canary, Dev, Beta, Stable, if you have a
243CHECK that will crash and generate a crash report as if you had a memory bug,
244but it crashes in a deterministic way, at a deterministic spot - so you can
245find out exactly what assumption was violated. Say that this should never be
246called with a null pointer. Then you can say, look at this line where it
247crashed. It clearly got hit with a null pointer. And then you can try to figure
248out, from the stack, why that happened, rather than after you post this pointer
249to a task, it crashes somewhere completely irrelevant from the actual call
250site. Well, so in the wild specifically means it generates a crash report so
251you can look at it, or in the wild means it crashes at a user computer rather
252than - in the wildness outside of development. And as for the other part of in
253the wild, it's that we have started running non-crashy DCHECKs for a percentage
254of Windows Canary. And we're looking to expand that. And we're gathering
255information, basically, about which assertions or invariants that we have are
256violated in practice in the wild, even though we don't think that they should
257be. And that will sort of also culturally move the needle so that we do care
258about DCHECKs. And when we care about DCHECKs, sort of similarly to how we care
259about CHECKs, is it really that important to make the big distinction between
260the two? Except for the case where you have really expensive DCHECKs, they
261might still be worth keeping separate. And those will be things like, if you do
262things for - say that you zero out memory or something for every memory block
263that you allocate and free, or you do things for every audio sample, or for
264every video frame pixel, those sort of things. And then we can sort of keep
265expensive stuff gated out from CHECKs. And then maybe we don't need this
266in-between where people don't know whether they can trust a DCHECK or not.
267
26816:04 SHARON: So you mentioned that certain release builds now have DCHECKs
269enabled. So for those in the wild versus regular CHECKs in the wild, if those
270happen to fail, do the reports for those look the same? Are they in the same
271place? Can they be treated the same?
272
27316:20 PETER: Yeah. Well, they are uploaded to the same crash-reporting thing.
274They show up under a special branch. And you likely will get bugs filed to you
275if they hit very frequently, just like you would with crashes. There's a sort
276of slight difference, in that they say dump without crashing. And that's just
277sort of a rollout strategy for us. Because if we made DCHECK builds incredibly
278crashy, because they hit more than CHECKs, then we can never roll this thing
279out. Or it gets a lot scarier for us to put this on 5% of a new platform that
280we haven't tested. But as it is right now, the first DCHECK that gets hit for
281every process gets a crash dump uploaded.
282
28317:07 SHARON: OK. So I've been definitely told to use dump without crashing at
284certain points in CLs, where it's like, OK, we think that this shouldn't
285happen. But if it does, we don't necessarily want to crash the browser because
286of it. With the changes you've mentioned to DCHECKs happening, should those
287just be CHECKs instead now or should those still be dump without crashing?
288
28917:29 PETER: So if you want dump without crashing, and you made those a DCHECK,
290then you would only have coverage in the Canary channels that we are testing.
291Right? So if you want to get dump reports from the platforms that we're not
292currently testing, including all the way up to Stable, you probably still want
293to keep that a dump without crashing. You want to make sure that you're not
294using the sort of - you want to make sure that you triage these, because you
295don't want to keep these generating crash dumps n forever. You should still
296treat them as if they were crashes. And I think the same thing should hold true
297for DCHECKs. You should only add them for an invariant that you care about
298being violated, right? So as it is violated, you should either figure out why
299your invariant was wrong, or you should try to fix the breakage. And you can
300probably add more information to logging to figure out why that happened.
301
30218:41 SHARON: So when you have a CHECK, and it crashes in the wild, you get a
303stack trace. And that's what you have to work on to figure out what went wrong
304for debugging. Right? So what are some things that you can do, as a developer,
305to make these CHECKs a bit more useful for you - ways to incorporate other
306information that you can use to help yourself debug?
307
30819:01 PETER: So some of the stuff that we have is we have something called
309crash keys, which are essentially, you can write a piece of string data,
310essentially - there's probably some other data types - and if you write those
311before you're running dump without crashing, or before you hit a CHECK, or
312before you hit a DCHECK, then those will be uploaded along the crash dump. And
313if you talk to someone who knows where to find them, you can basically go in
314under a crash report, and then under field product data, or something like
315that, you should be able to find your key-value pair. And if you have
316information in there, you'll be able to look at it. The other thing that I like
317to do, which is probably the more obvious thing, is if you have somewhat of a
318hypothesis that this thing should only fail if a or b or c is not true, then
319you can add CHECKs for those. Like, if a CHECK is failing, you can add more
320CHECKs to see why the CHECK was failing. In general, you're not going to get as
321much out of a mini-dump that you want. You're not going to have the full heap
322available to you, because that would be a mega-dump. You can usually find
323whatever is on the stack if you go in with a debugger. And I know that you
324wanted to lead me into talking about CHECK\_GT and CHECK\_EQ, which are
325essentially, if you want to check that x is greater than y, then you should use
326CHECK\_GT(x,y). The problem with those, in this sort of context, is that,
327similarly to CHECKs - so CHECK\_GT gets compiled into, basically, if not x is
328greater than y, crash. So unfortunately, the values of x and y are optimized
329out when you're doing an official build.
330
33121:02 SHARON: So this makes me think of some stuff we mentioned in the last
332episode, which was with Dana. Check it out if you haven't. But one of the types
333we mentioned there was SafeRef, which enforces a certain condition. And if that
334fails - so in the case of a SafeRef, it ensures that the value you have there
335is not null. And if that's ever not true, then you do get a crash similar to if
336a CHECK fails. So in general, would you say it's better practice to enforce and
337make sure your assumptions are held in these other, more structural ways than
338relying on CHECKs instead?
339
34021:41 PETER: So let me see if I can get at what you actually want out of that
341one. So if we look at - there's a RawRef type, right? So what's good with the
342RawRef is that you have a type that annotates that this thing cannot possibly
343be null. So if you assign to it, and you're assigning a null pointer, your
344program is going to crash, and you don't need to think about whether you throw
345a null pointer in or not. If you keep passing a RawRef around, then that's
346essentially you passing around a non-null pointer. And therefore, you don't
347have to check that it's not null pointer in every step of the way. You only
348need to do it when you're - I mean, the type will do it for you, but it only
349needs to happen when you're converting from a pointer to a ref, essentially, or
350a RawRef. And what's so good about that is now you have the - previously, you
351might just CHECK that this isn't called with null pointer or whatever. But then
352you would do that for four or five arguments. And you'd be like, null pointer
353CHECKs are this part of the function body. And then it just gets super-noisy.
354But if you're using the RawRef types, then the semantics of the type will
355enforce that for you. And you don't have to think about that when reading the
356code, because usually when you read the code, you're going to be like, it's a
357pointer. Can it be null or not? What does it point to? And this thing will at
358least tell you, it can't be null. And you still have the question of, what does
359it point to? And that's fine. So I like enforcing this through types more than
360checking those assumptions, and then checking inside of what happens. If you
361were assigned to this RawRef, then it's going to crash in the constructor if
362you have a null pointer. And then based on that stack trace, if we have good
363stack data, you're going to know at what line you created the RawRef. And
364therefore, it's equivalent to checking for not null pointer, because you can
365trust the type to do the checking. And since I know Dana made this, I can
366probably with 200% certainty say that it's a CHECK and not a DCHECK. But we do
367have a couple of other places where you have a WeakPtr that shouldn't be
368dereferenced on the wrong sequence. And those are complicated words. And that,
369unfortunately, is a DCHECK. So we're hitting some sort of - I don't know if
370that CHECK is actually expensive, or if it should be a CHECK, or if it could be
371a CHECK. I think, especially, if you're in core types, the size overhead of
372adding a CHECK is negligible, because all of the users of it benefit from that
373CHECK. So unless it's incredibly -
374
37524:28 SHARON: What do you mean by core types?
376
37724:30 PETER: Say that you make a `scoped_refptr` something, that ref pointer is
378used everywhere. So if you CHECKed in the destructor, then you're validating
379all of the clients of your scope ref pointer. So for one CHECK, you get the
380price of a lot of CHECKing. Whereas if in your client code you're validating
381some parameters of an API call that only gets called once, then that's one
382CHECK you add for one case. But if you're re-use, then your CHECK gets a lot
383more value. And it's also easier to get parameters wrong sometimes if you have
384500 clients that are calling your API. You can't trust all of them to get it
385right. Whereas if you're just developing your feature, and it's only used by
386your feature, then you can be a little bit more certain with how it's being
387called. I would say, still add CHECKs, because code evolves over time. It's
388sort of like how you can add unit tests to make sure that no one breaks your
389code in the future. If you add CHECKs, then no one can break your code in the
390future.
391
39225:37 SHARON: Mm-hmm. OK. So you mentioned a few things about how CHECKs and
393DCHECKs are changing. [AUDIO OUT] what is currently in the works, and what is
394the long-term goal and plan for CHECKs and DCHECKs.
395
39625:53 PETER: So currently what's in the work is we've made sure that some
397libraries that we use, like Abseil and WebRTC, which is a first-party
398third-party library, that they both use Chrome's crashing report system, which
399means that you get more predictable crash stacks because it's using the
400immediate crash macro. But also, you get the fatal logging field that I talked
401about. That gets logged as part of crash dumps. So you hopefully have more
402glanceable, actionable crash reports whenever a CHECK is violated inside of
403Abseil, or in WebRTC, as it were. And then upcoming is we want to make sure
404that we keep an eye out for our DCHECKs on other platforms, such as Mac. I know
405that there's some issues with getting that fatal log field in the GPU process,
406and I'm working on fixing that as well. So hopefully, it just means more
407reports for the things you care about and easier to action on reports. That's
408what we're hoping.
409
41027:03 SHARON: If people think that this sounds really cool, want to have some
411more involvement, or want to ask more questions, what's a good place for them
412to do that?
413
41427:11 PETER: I like Slack as a thing for this. So the #cxx channel on Slack,
415the #base channel on Slack, the #halp channel on Slack is really good. #halp is
416really, I think, unintimidating. You can just throw whatever question you have
417in there, and I happen to be around there. If you can find out what my last
418name is through sheer force of will, you can send me an email to my Chromium
419username. What else would we have? I think if they want to get involved, just
420add CHECKs to your code. That's a really good way to do it. Just make sure that
421your code does what you expect it to in more cases.
422
42327:48 SHARON: Maybe if you have a CL, and you're just doing some drive-by
424cleanup, you can turn some DCHECKs into CHECKs also?
425
42627:56 PETER: If your reviewer is cool with that, I'm cool with that. Otherwise,
427you can just try to hope for us making that policy that we use CHECKs - if it's
428something we care about, we use a CHECK instead of a DCHECK, unless we have a
429really good reason to use a DCHECK. And that would be performance.
430
43128:15 SHARON: That sounds good. And one last question is, what do you want
432people to take away as their main takeaway from this discussion?
433
43428:26 PETER: I think validating code assumptions is really valuable. So you
435think that you're pretty smart when you're writing something, or you remember -
436I mean, you're sometimes kind of smart when you're writing something. And
437you're like, this can't possibly be wrong. And in practice, looking at crash
438reports, these things are wrong all the time. So please validate any
439assumptions that you make. It's also, I would say, better than a comment,
440because it's a comment that doesn't get outdated without you noticing it. So, I
441think, validate your assumptions to make sure that your code is more robust.
442And validate properties you care about. And don't be afraid to use CHECKs.
443
44429:13 SHARON: All right. That sounds like a good summary. Thank you very much
445for being here, Peter. It was great to learn about DCHECKs.
446
44729:18 PETER: Yeah. Thanks for having me.
448
44929:24 SHARON: Action. Hello.
450
45129:26 PETER: Oh. Take four.
452
45329:29 SHARON: [LAUGHS] Take four. And action.