Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 1 | # What’s Up With DCHECKs |
| 2 | |
| 3 | This is a transcript of [What's Up With |
| 4 | That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq) |
| 5 | Episode 2, a 2022 video discussion between [Sharon ([email protected]) |
| 6 | and Peter ([email protected])](https://www.youtube.com/watch?v=MpwbWSEDfjM). |
| 7 | |
| 8 | The transcript was automatically generated by speech-to-text software. It may |
| 9 | contain minor errors. |
| 10 | |
| 11 | --- |
| 12 | |
| 13 | You've seen DCHECKs around and been asked to use them in code review, but what |
| 14 | are they? What's the difference between a CHECK and a DCHECK? How do you use |
| 15 | them? Here to answer that is special guest is Peter, who works on UI and |
| 16 | improving crash reports. |
| 17 | |
| 18 | Notes: |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 19 | |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 20 | - https://docs.google.com/document/d/146LoJ1E3N3E6fb4zDh92HPQc6yhRpNI7DSKlJjaYlLw/edit |
| 21 | |
| 22 | Links: |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 23 | |
| 24 | - [What's Up With Pointers] |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 25 | |
| 26 | --- |
| 27 | |
| 28 | 00:00 SHARON: Hello, and welcome to What's Up With That?, the series that |
| 29 | demystifies all things Chrome. I'm your host, Sharon. And today, we're talking |
| 30 | about DCHECKs. You've seen them around. You've probably been told to add one in |
| 31 | code review before. But what are they? What are they for, and what do they do? |
| 32 | Our guest today is Peter, who works on desktop UI and Core UI. He's also |
| 33 | working on improving Chrome's crash reports, which includes DCHECKs. Today |
| 34 | he'll help us answer, what's up with DCHECKs? Welcome, Peter. |
| 35 | |
| 36 | 00:30 PETER: Thanks for having me. |
| 37 | |
| 38 | 00:32 SHARON: Yeah. Thanks for being here. So the most obvious question to |
| 39 | start with, what is a DCHECK? |
| 40 | |
| 41 | 00:39 PETER: So a CHECK and a DCHECK are both sort of things that make sure |
| 42 | that what you think is true is true. Right? So this should never be called with |
| 43 | an empty vector. You might add a CHECK for it, or you might add a DCHECK for |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 44 | it. And it's sort of similar to asserts, which you may have hit during earlier |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 45 | programming outside of Chrome. And what it means is when this line gets hit, we |
| 46 | check and see if it's true. And if it's not true, we crash. DCHECKs differ from |
| 47 | CHECKs in that they are traditionally only in debug builds, or local |
| 48 | development builds, or on our try-bots. So they have zero overhead when Chrome |
| 49 | hits stable, because the CHECK just won't be there. |
| 50 | |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 51 | 01:24 SHARON: OK. So I guess the D stands for Debug. That make sense. |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 52 | |
| 53 | 01:28 PETER: Yeah. I want debug to turn into developer, because now we have |
| 54 | them by default if you're no longer - if you're doing a release build, and |
| 55 | you're not turning them off, and you're not doing an official build, you get |
| 56 | them. |
| 57 | |
| 58 | 01:42 SHARON: OK. Well, you heard it here first, or maybe you heard it before. |
| 59 | I heard it here first. So you mentioned asserts. So something that I've seen a |
| 60 | couple times in Chrome, and also is part of the standard library, is |
| 61 | `static_assert`. So how is that similar or different to DCHECKs? And why do we |
| 62 | use or not use them? |
| 63 | |
| 64 | 02:00 PETER: Right. So `static_assert`s are - and you're going to have to ask |
| 65 | C++ experts, who can probably take some of the sharp edges off of this - but |
| 66 | it's basically, if you can assert something in compile time, then you can use a |
| 67 | `static_assert`, which means that you don't have to hit a code path where it's |
| 68 | wrong. It sort of has to always hold true. And whenever you can use a |
| 69 | `static_assert`, use a `static_assert`, because it's free. And basically, you |
| 70 | can't compile the program if it's not true. |
| 71 | |
| 72 | 02:31 SHARON: OK. That's good to know, because I definitely thought that was |
| 73 | one of the C++ standard library things we should avoid, because we have a |
| 74 | similar thing in Chromium. But I guess that's not the case. |
| 75 | |
| 76 | 02:41 PETER: Yeah. Assert is the one that is - OK, so this is a little |
| 77 | complicated, right? `static_assert` is a language feature, not a library |
| 78 | feature. And someone will tell me that I'm wrong about something about this. |
| 79 | Asserts are just sort of a poorer version of DCHECKs. So they won't go through |
| 80 | our crash handling. It won't print the pretty stacks, et cetera. |
| 81 | `static_assert`s, on the other hand, are a compile time feature. And we don't, |
| 82 | as far as I know, have our own wrapper around it. We just use `static_assert`. |
| 83 | So what you would maybe use this for is like if you have a constant - like, say |
| 84 | you have an array, and the code makes an assumption that some constant is the |
| 85 | size of this array, you can assert that in compile time, and that would be a |
| 86 | good use of a `static_assert`. |
| 87 | |
| 88 | 03:26 SHARON: OK. Cool. So you mentioned that some things have changed with how |
| 89 | DCHECKs work. So can you give us a brief overview of the history of DCHECKs - |
| 90 | what they used to be, people who have been using them for a while, how might |
| 91 | they have changed from the idea of what they have as a DCHECK in their mind? |
| 92 | |
| 93 | 03:43 PETER: Sure. So this is as best I know. I'm just sort of extrapolating |
| 94 | from what I've seen. And what I think originally was true is that a CHECK used |
| 95 | to be this logging statement, where you essentially compile the file name and |
| 96 | the line number. And if this ever hits, then we'll log some stuff and then |
| 97 | crash. Right? Which comes with a little bit of overhead, especially on size, |
| 98 | that you basically take the file name and line number for every instance, and |
| 99 | that generates a bunch of strings and numbers that essentially add to Chrome's |
| 100 | binary size. I don't know how many steps between that and where we currently |
| 101 | are. But right now, our CHECKs are just, if condition is false, crash, which |
| 102 | means that you won't, out of the CHECK, get file name and line number. We'll |
| 103 | get those out of debugging symbols. And you also won't get any of the logging |
| 104 | messages that you can add to the end of a CHECK, which means that your debug |
| 105 | info will be poorer, but it will be cheaper to use. So they've gotten from |
| 106 | being pretty heavy CHECKs to being really cheap. |
| 107 | |
| 108 | 05:01 SHARON: OK. So that kind of leads us into the question that I think most |
| 109 | people want to have answered, which is, when should I use a DCHECK? When should |
| 110 | I use a CHECK? When should I use neither? |
| 111 | |
| 112 | 05:13 PETER: I would say that historically, we've said CHECKs are expensive. |
| 113 | Don't use them unless you sort of have to. And I don't think that holds true |
| 114 | anymore. So basically, unless you are in really performance-critical code, then |
| 115 | use a CHECK. If there's anything that you care about where the program state |
| 116 | will be unpredictable from this point on if it's not true, CHECK it. It's not |
| 117 | that expensive. Right? We have a lot of code where we push a string onto a |
| 118 | vector, and that never gets flagged in code review. And it's probably like 10 |
| 119 | times more expensive, if not 100 times more expensive, than adding a CHECK. The |
| 120 | exception to that is if you're in a really hot loop where you don't want to |
| 121 | dereference a pointer, then a CHECK might add some cost. And the other is if |
| 122 | the condition that you're trying to validate is really expensive. It's not the |
| 123 | CHECK itself that's expensive. It's the thing you're evaluating. And if that's |
| 124 | expensive, then you might not afford doing a CHECK. If you don't know that it's |
| 125 | expensive, it's probably not expensive. |
| 126 | |
| 127 | 06:20 SHARON: Can you give us an example of something expensive to evaluate for |
| 128 | a CHECK? |
| 129 | |
| 130 | 06:24 PETER: Right. So say that you have something in video code that for every |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 131 | video frame, for every pixel validates the alpha value is opaque, or something. |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 132 | That would probably make video conferencing a little bit worse performance. |
| 133 | Another thing would just be if you have to traverse a graph on every frame, and |
| 134 | it will sort of jump all over memory to see if some reachability problem in |
| 135 | your graph is true, that's going to be a lot more expensive. But CHECKing that |
| 136 | index is less than some vector bounds, I think that should fall under cheap. |
| 137 | And - |
| 138 | |
| 139 | 07:02 SHARON: OK. |
| 140 | |
| 141 | 07:02 PETER: culturally, we've tried to avoid doing a lot of these. And I think |
| 142 | it's just hurting us. |
| 143 | |
| 144 | 07:09 SHARON: OK. So since most places we should use CHECKs, are there any |
| 145 | places where a DCHECK would be better then? Or any time you would have normally |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 146 | previously used a DCHECK, you should just make that a CHECK? |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 147 | |
| 148 | 07:23 PETER: So we have a new construct that's called `EXPENSIVE_DCHECK`s, or |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 149 | if `EXPENSIVE_DCHECKS_ARE_ON`, I think we should add a corresponding macro for |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 150 | `EXPENSIVE_DCHECK`. And then you should be able to just say, either it's |
| 151 | expensive and has to be a DCHECK, so use `EXPENSIVE_DCHECK`; otherwise, use |
| 152 | CHECK. And my hunch would be like 95% of what we have as DCHECKs would probably |
| 153 | serve us better as CHECKs. But your code owner and reviewer might disagree with |
| 154 | that. And it's not yet documented policy that we say CHECKs are cheap; just add |
| 155 | a billion of them. But I would like to get there eventually. |
| 156 | |
| 157 | 08:04 SHARON: OK. So if you put in a CHECK, and your reviewer tells them this |
| 158 | should be a DCHECK, the person writing the CL can point them to this video, and |
| 159 | then they can discuss from there. |
| 160 | |
| 161 | 08:13 PETER: I mean, yeah, you can either say Peter disagrees with you, or I |
| 162 | can get further along this and say we make policy that CHECKs are cheap, so |
| 163 | they are preferable. So a lot of foot-shooters with DCHECKs is that you expect |
| 164 | this property to hold true, but you never effectively CHECK it. And that can |
| 165 | lead to all sorts of bad stuff, right? Like if you're trying to DCHECK that |
| 166 | some origin for some frame makes some assumptions of site iso - I don't know |
| 167 | site isolation well enough to say this. But basically, if you're DCHECKing that |
| 168 | the code that you're running runs under some sort of permissions, then that is |
| 169 | effectively unchecked in stable, right? And we do care about those properties, |
| 170 | and it would be really good if we crashed rather than leaked information |
| 171 | between sites. |
| 172 | |
| 173 | 09:12 SHARON: Right. |
| 174 | |
| 175 | 09:14 PETER: Yeah. |
| 176 | |
| 177 | 09:16 SHARON: So that seems like a good tie-in for the fact that within some |
| 178 | security people, they don't have the most positive impression of DCHECKs, shall |
| 179 | we say? So a couple examples of this, for listeners who maybe aren't familiar |
| 180 | with this, is one person previously on security saying DCHECKs are pronounced |
| 181 | as "code that's not tested". Someone else I told about this episode - I said, |
| 182 | we're going to talk about DCHECKs - they immediately said, is it going to be |
| 183 | about why DCHECKs are bad? So amongst the Chrome security folks, they are not a |
| 184 | huge fan of DCHECKs. Can you tell us maybe why that is? |
| 185 | |
| 186 | 09:51 PETER: So if we go back a little bit in time, it used to be that DCHECKs |
| 187 | were only built for developers if they do a debug build. And Chrome has gotten |
| 188 | so big that you don't want to do a debug build or the UI is incredibly slow. |
| 189 | Unfortunately, it's sort of not that great an experience to work in a debug |
| 190 | build. So people work in a release build. That doesn't mean that they don't |
| 191 | care about the things they put under DCHECK. It just means they want to go on |
| 192 | with their lives and not wait x minutes for the browser to launch, or however |
| 193 | bad it is nowadays. And that means that they, unfortunately, lose coverage for |
| 194 | the DCHECKs. So this means that if your code is not exercised well under tests, |
| 195 | then this is completely not enforced. But it's slightly better than a comment, |
| 196 | in that you're really expecting this thing to hold true, and that's clearly an |
| 197 | expectation. But how good is the expectation if you don't look at it? So last |
| 198 | year, I believe, we made it so that DCHECKs are on by default if you're not |
| 199 | doing an official build. And this included release builds. So now, it's like at |
| 200 | least if you're doing development and you hit this condition, it's going to |
| 201 | explode, which is really good, because then you can find a lot of issues, and |
| 202 | we can prevent a lot of issues from ever happening in the first place. It is |
| 203 | really hard for you, as a developer, to make the assumption that if this |
| 204 | invariant is ever false, I will find it during development, and it will never |
| 205 | happen in the wild. And DCHECKs are essentially either, I will find this |
| 206 | locally before I submit it, or all bets are off; or it is I don't care that |
| 207 | much if this thing doesn't hold true, which is sort of a weird assertion to |
| 208 | make. So I think we're in this little awkward in-between state. And this |
| 209 | in-between state, remember, mostly exists as a performance optimization from |
| 210 | when CHECKs used to be a lot more expensive, in terms of code size. So did I |
| 211 | cover most of this? |
| 212 | |
| 213 | 12:06 SHARON: Yeah. I think, based on that, I think it's pretty easy to see why |
| 214 | people who are more concerned about security are not a fan of this. |
| 215 | |
| 216 | 12:13 PETER: I mean, if you care about it, especially if it causes privacy or |
| 217 | security or user-harm sort of things, just CHECK. Just CHECK, right? If it |
| 218 | makes your code animate a thing slightly weirder, like it will just jump to the |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 219 | end position instead of going through your fancy little... whatever. Maybe you can |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 220 | make that a DCHECK. Maybe it doesn't matter. Like it's wrong, but it's not that |
| 221 | bad. But most of the cases, you DCHECK something, where it's like the program |
| 222 | is going to be in some indeterminate state, and we actually care about if it's |
| 223 | ever false. So maybe we can afford to make it a CHECK. Maybe we should look |
| 224 | more about our sort of vector pushbacks than we should look at our CHECKs, and |
| 225 | then just have more CHECKs. More CHECKs. Because it's also like when things |
| 226 | break, it's a lot cheaper to debug a DCHECK than your program is in some |
| 227 | indeterminate state, because it was allowed to pass through a DCHECK that you |
| 228 | thought was - and when you read the code, unless you're used to reading it as |
| 229 | DCHECKs - oh, that just didn't get enforced - it's sort of hard to try to |
| 230 | figure out why the thing was doing the wrong thing in the first place. |
| 231 | |
| 232 | 13:22 SHARON: OK. How is this as a summary? When in doubt, CHECK it out. |
| 233 | |
| 234 | 13:27 PETER: I like that. I like that. And you might get pushback by reviewers, |
| 235 | who aren't on my side of the fence yet. And then you can decide on which hill |
| 236 | you want to die on, at least until we've made policy to just not complain about |
| 237 | DCHECKs, or not complain about CHECKs. |
| 238 | |
| 239 | 13:45 SHARON: All right. That sounds good. So you mentioned stuff failing in |
| 240 | the wild. And for people who might not know, do you want to just briefly |
| 241 | explain what failing in the wild means? |
| 242 | |
| 243 | 13:54 PETER: OK. So there's two things. Just failing in the wild just means |
| 244 | that when this thing rolls out to Canary, Dev, Beta, Stable, if you have a |
| 245 | CHECK that will crash and generate a crash report as if you had a memory bug, |
| 246 | but it crashes in a deterministic way, at a deterministic spot - so you can |
| 247 | find out exactly what assumption was violated. Say that this should never be |
| 248 | called with a null pointer. Then you can say, look at this line where it |
| 249 | crashed. It clearly got hit with a null pointer. And then you can try to figure |
| 250 | out, from the stack, why that happened, rather than after you post this pointer |
| 251 | to a task, it crashes somewhere completely irrelevant from the actual call |
| 252 | site. Well, so in the wild specifically means it generates a crash report so |
| 253 | you can look at it, or in the wild means it crashes at a user computer rather |
| 254 | than - in the wildness outside of development. And as for the other part of in |
| 255 | the wild, it's that we have started running non-crashy DCHECKs for a percentage |
| 256 | of Windows Canary. And we're looking to expand that. And we're gathering |
| 257 | information, basically, about which assertions or invariants that we have are |
| 258 | violated in practice in the wild, even though we don't think that they should |
| 259 | be. And that will sort of also culturally move the needle so that we do care |
| 260 | about DCHECKs. And when we care about DCHECKs, sort of similarly to how we care |
| 261 | about CHECKs, is it really that important to make the big distinction between |
| 262 | the two? Except for the case where you have really expensive DCHECKs, they |
| 263 | might still be worth keeping separate. And those will be things like, if you do |
| 264 | things for - say that you zero out memory or something for every memory block |
| 265 | that you allocate and free, or you do things for every audio sample, or for |
| 266 | every video frame pixel, those sort of things. And then we can sort of keep |
| 267 | expensive stuff gated out from CHECKs. And then maybe we don't need this |
| 268 | in-between where people don't know whether they can trust a DCHECK or not. |
| 269 | |
| 270 | 16:04 SHARON: So you mentioned that certain release builds now have DCHECKs |
| 271 | enabled. So for those in the wild versus regular CHECKs in the wild, if those |
| 272 | happen to fail, do the reports for those look the same? Are they in the same |
| 273 | place? Can they be treated the same? |
| 274 | |
| 275 | 16:20 PETER: Yeah. Well, they are uploaded to the same crash-reporting thing. |
| 276 | They show up under a special branch. And you likely will get bugs filed to you |
| 277 | if they hit very frequently, just like you would with crashes. There's a sort |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 278 | of slight difference, in that they say DumpWithoutCrashing. And that's just |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 279 | sort of a rollout strategy for us. Because if we made DCHECK builds incredibly |
| 280 | crashy, because they hit more than CHECKs, then we can never roll this thing |
| 281 | out. Or it gets a lot scarier for us to put this on 5% of a new platform that |
| 282 | we haven't tested. But as it is right now, the first DCHECK that gets hit for |
| 283 | every process gets a crash dump uploaded. |
| 284 | |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 285 | 17:07 SHARON: OK. So I've been definitely told to use DumpWithoutCrashing at |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 286 | certain points in CLs, where it's like, OK, we think that this shouldn't |
| 287 | happen. But if it does, we don't necessarily want to crash the browser because |
| 288 | of it. With the changes you've mentioned to DCHECKs happening, should those |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 289 | just be CHECKs instead now or should those still be DumpWithoutCrashing? |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 290 | |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 291 | 17:29 PETER: So if you want DumpWithoutCrashing, and you made those a DCHECK, |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 292 | then you would only have coverage in the Canary channels that we are testing. |
| 293 | Right? So if you want to get dump reports from the platforms that we're not |
| 294 | currently testing, including all the way up to Stable, you probably still want |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 295 | to keep that a DumpWithoutCrashing. You want to make sure that you're not |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 296 | using the sort of - you want to make sure that you triage these, because you |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 297 | don't want to keep these generating crash dumps forever. You should still |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 298 | treat them as if they were crashes. And I think the same thing should hold true |
| 299 | for DCHECKs. You should only add them for an invariant that you care about |
| 300 | being violated, right? So as it is violated, you should either figure out why |
| 301 | your invariant was wrong, or you should try to fix the breakage. And you can |
| 302 | probably add more information to logging to figure out why that happened. |
| 303 | |
| 304 | 18:41 SHARON: So when you have a CHECK, and it crashes in the wild, you get a |
| 305 | stack trace. And that's what you have to work on to figure out what went wrong |
| 306 | for debugging. Right? So what are some things that you can do, as a developer, |
| 307 | to make these CHECKs a bit more useful for you - ways to incorporate other |
| 308 | information that you can use to help yourself debug? |
| 309 | |
| 310 | 19:01 PETER: So some of the stuff that we have is we have something called |
| 311 | crash keys, which are essentially, you can write a piece of string data, |
| 312 | essentially - there's probably some other data types - and if you write those |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 313 | before you're running DumpWithoutCrashing, or before you hit a CHECK, or |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 314 | before you hit a DCHECK, then those will be uploaded along the crash dump. And |
| 315 | if you talk to someone who knows where to find them, you can basically go in |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 316 | under a crash report, and then under Fields, Product data, or something like |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 317 | that, you should be able to find your key-value pair. And if you have |
| 318 | information in there, you'll be able to look at it. The other thing that I like |
| 319 | to do, which is probably the more obvious thing, is if you have somewhat of a |
| 320 | hypothesis that this thing should only fail if a or b or c is not true, then |
| 321 | you can add CHECKs for those. Like, if a CHECK is failing, you can add more |
| 322 | CHECKs to see why the CHECK was failing. In general, you're not going to get as |
| 323 | much out of a mini-dump that you want. You're not going to have the full heap |
| 324 | available to you, because that would be a mega-dump. You can usually find |
| 325 | whatever is on the stack if you go in with a debugger. And I know that you |
| 326 | wanted to lead me into talking about CHECK\_GT and CHECK\_EQ, which are |
| 327 | essentially, if you want to check that x is greater than y, then you should use |
| 328 | CHECK\_GT(x,y). The problem with those, in this sort of context, is that, |
| 329 | similarly to CHECKs - so CHECK\_GT gets compiled into, basically, if not x is |
| 330 | greater than y, crash. So unfortunately, the values of x and y are optimized |
| 331 | out when you're doing an official build. |
| 332 | |
| 333 | 21:02 SHARON: So this makes me think of some stuff we mentioned in the last |
| 334 | episode, which was with Dana. Check it out if you haven't. But one of the types |
| 335 | we mentioned there was SafeRef, which enforces a certain condition. And if that |
| 336 | fails - so in the case of a SafeRef, it ensures that the value you have there |
| 337 | is not null. And if that's ever not true, then you do get a crash similar to if |
| 338 | a CHECK fails. So in general, would you say it's better practice to enforce and |
| 339 | make sure your assumptions are held in these other, more structural ways than |
| 340 | relying on CHECKs instead? |
| 341 | |
| 342 | 21:41 PETER: So let me see if I can get at what you actually want out of that |
| 343 | one. So if we look at - there's a RawRef type, right? So what's good with the |
| 344 | RawRef is that you have a type that annotates that this thing cannot possibly |
| 345 | be null. So if you assign to it, and you're assigning a null pointer, your |
| 346 | program is going to crash, and you don't need to think about whether you throw |
| 347 | a null pointer in or not. If you keep passing a RawRef around, then that's |
| 348 | essentially you passing around a non-null pointer. And therefore, you don't |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 349 | have to check that it's not `nullptr` in every step of the way. You only |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 350 | need to do it when you're - I mean, the type will do it for you, but it only |
| 351 | needs to happen when you're converting from a pointer to a ref, essentially, or |
| 352 | a RawRef. And what's so good about that is now you have the - previously, you |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 353 | might just CHECK that this isn't called with `nullptr` or whatever. But then |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 354 | you would do that for four or five arguments. And you'd be like, null pointer |
| 355 | CHECKs are this part of the function body. And then it just gets super-noisy. |
| 356 | But if you're using the RawRef types, then the semantics of the type will |
| 357 | enforce that for you. And you don't have to think about that when reading the |
| 358 | code, because usually when you read the code, you're going to be like, it's a |
| 359 | pointer. Can it be null or not? What does it point to? And this thing will at |
| 360 | least tell you, it can't be null. And you still have the question of, what does |
| 361 | it point to? And that's fine. So I like enforcing this through types more than |
| 362 | checking those assumptions, and then checking inside of what happens. If you |
| 363 | were assigned to this RawRef, then it's going to crash in the constructor if |
| 364 | you have a null pointer. And then based on that stack trace, if we have good |
| 365 | stack data, you're going to know at what line you created the RawRef. And |
| 366 | therefore, it's equivalent to checking for not null pointer, because you can |
| 367 | trust the type to do the checking. And since I know Dana made this, I can |
| 368 | probably with 200% certainty say that it's a CHECK and not a DCHECK. But we do |
| 369 | have a couple of other places where you have a WeakPtr that shouldn't be |
| 370 | dereferenced on the wrong sequence. And those are complicated words. And that, |
| 371 | unfortunately, is a DCHECK. So we're hitting some sort of - I don't know if |
| 372 | that CHECK is actually expensive, or if it should be a CHECK, or if it could be |
| 373 | a CHECK. I think, especially, if you're in core types, the size overhead of |
| 374 | adding a CHECK is negligible, because all of the users of it benefit from that |
| 375 | CHECK. So unless it's incredibly - |
| 376 | |
| 377 | 24:28 SHARON: What do you mean by core types? |
| 378 | |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 379 | 24:30 PETER: Say that you make a `scoped_refptr` or something, that ref pointer is |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 380 | used everywhere. So if you CHECKed in the destructor, then you're validating |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 381 | all of the clients of your `scoped_refptr`. So for one CHECK, you get the |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 382 | price of a lot of CHECKing. Whereas if in your client code you're validating |
| 383 | some parameters of an API call that only gets called once, then that's one |
| 384 | CHECK you add for one case. But if you're re-use, then your CHECK gets a lot |
| 385 | more value. And it's also easier to get parameters wrong sometimes if you have |
| 386 | 500 clients that are calling your API. You can't trust all of them to get it |
| 387 | right. Whereas if you're just developing your feature, and it's only used by |
| 388 | your feature, then you can be a little bit more certain with how it's being |
| 389 | called. I would say, still add CHECKs, because code evolves over time. It's |
| 390 | sort of like how you can add unit tests to make sure that no one breaks your |
| 391 | code in the future. If you add CHECKs, then no one can break your code in the |
| 392 | future. |
| 393 | |
| 394 | 25:37 SHARON: Mm-hmm. OK. So you mentioned a few things about how CHECKs and |
| 395 | DCHECKs are changing. [AUDIO OUT] what is currently in the works, and what is |
| 396 | the long-term goal and plan for CHECKs and DCHECKs. |
| 397 | |
| 398 | 25:53 PETER: So currently what's in the work is we've made sure that some |
| 399 | libraries that we use, like Abseil and WebRTC, which is a first-party |
| 400 | third-party library, that they both use Chrome's crashing report system, which |
| 401 | means that you get more predictable crash stacks because it's using the |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 402 | IMMEDIATE\_CRASH macro. But also, you get the fatal logging field that I talked |
Nigel Tao | 187a479 | 2023-09-28 22:30:44 | [diff] [blame] | 403 | about. That gets logged as part of crash dumps. So you hopefully have more |
| 404 | glanceable, actionable crash reports whenever a CHECK is violated inside of |
| 405 | Abseil, or in WebRTC, as it were. And then upcoming is we want to make sure |
| 406 | that we keep an eye out for our DCHECKs on other platforms, such as Mac. I know |
| 407 | that there's some issues with getting that fatal log field in the GPU process, |
| 408 | and I'm working on fixing that as well. So hopefully, it just means more |
| 409 | reports for the things you care about and easier to action on reports. That's |
| 410 | what we're hoping. |
| 411 | |
| 412 | 27:03 SHARON: If people think that this sounds really cool, want to have some |
| 413 | more involvement, or want to ask more questions, what's a good place for them |
| 414 | to do that? |
| 415 | |
| 416 | 27:11 PETER: I like Slack as a thing for this. So the #cxx channel on Slack, |
| 417 | the #base channel on Slack, the #halp channel on Slack is really good. #halp is |
| 418 | really, I think, unintimidating. You can just throw whatever question you have |
| 419 | in there, and I happen to be around there. If you can find out what my last |
| 420 | name is through sheer force of will, you can send me an email to my Chromium |
| 421 | username. What else would we have? I think if they want to get involved, just |
| 422 | add CHECKs to your code. That's a really good way to do it. Just make sure that |
| 423 | your code does what you expect it to in more cases. |
| 424 | |
| 425 | 27:48 SHARON: Maybe if you have a CL, and you're just doing some drive-by |
| 426 | cleanup, you can turn some DCHECKs into CHECKs also? |
| 427 | |
| 428 | 27:56 PETER: If your reviewer is cool with that, I'm cool with that. Otherwise, |
| 429 | you can just try to hope for us making that policy that we use CHECKs - if it's |
| 430 | something we care about, we use a CHECK instead of a DCHECK, unless we have a |
| 431 | really good reason to use a DCHECK. And that would be performance. |
| 432 | |
| 433 | 28:15 SHARON: That sounds good. And one last question is, what do you want |
| 434 | people to take away as their main takeaway from this discussion? |
| 435 | |
| 436 | 28:26 PETER: I think validating code assumptions is really valuable. So you |
| 437 | think that you're pretty smart when you're writing something, or you remember - |
| 438 | I mean, you're sometimes kind of smart when you're writing something. And |
| 439 | you're like, this can't possibly be wrong. And in practice, looking at crash |
| 440 | reports, these things are wrong all the time. So please validate any |
| 441 | assumptions that you make. It's also, I would say, better than a comment, |
| 442 | because it's a comment that doesn't get outdated without you noticing it. So, I |
| 443 | think, validate your assumptions to make sure that your code is more robust. |
| 444 | And validate properties you care about. And don't be afraid to use CHECKs. |
| 445 | |
| 446 | 29:13 SHARON: All right. That sounds like a good summary. Thank you very much |
| 447 | for being here, Peter. It was great to learn about DCHECKs. |
| 448 | |
| 449 | 29:18 PETER: Yeah. Thanks for having me. |
| 450 | |
| 451 | 29:24 SHARON: Action. Hello. |
| 452 | |
| 453 | 29:26 PETER: Oh. Take four. |
| 454 | |
| 455 | 29:29 SHARON: [LAUGHS] Take four. And action. |
Daniel Verkamp | 1de516f | 2023-10-10 22:17:09 | [diff] [blame] | 456 | |
| 457 | [What's Up With Pointers]: https://www.youtube.com/watch?v=MpwbWSEDfjM |