How data flow analysis operates in a static code analyzer
Pavel Belikov
C++ Developer, PVS-Studio
belikov@viva64.com
PVS-Studio
• Static analyzer for C, C++, C# code
• It works on Windows, Linux, macOS
• Plugin for Visual Studio
• Integrates into SonarQube and Jenkins
• Quick start (Standalone, pvs-studio-analyzer)
Contents:
• Types and objectives of Data Flow Analysis
• Analysis of conditions
• Analysis of loops
• Symbolic execution
• Examples of errors found in real projects
What is data flow analysis
• Calculate the set of possible values of an expression, or its properties
• Numbers
• Null/non-null pointer
• Strings
• The size and contents of containers/optional
• Determine the state of variables
The main objectives
• The computed set of values must be a superset of the real values
• Analysis time is limited
• The number of false positives must be minimized
Why do we need it?
static const int kDaysInMonth[13] = {
0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};
bool ValidateDateTime(const DateTime& time) {
if (time.year < 1 || time.year > 9999 ||
time.month < 1 || time.month > 12 ||
time.day < 1 || time.day > 31 ||
time.hour < 0 || time.hour > 23 ||
time.minute < 0 || time.minute > 59 ||
time.second < 0 || time.second > 59) {
return false;
}
if (time.month == 2 && IsLeapYear(time.year)) {
return time.month <= kDaysInMonth[time.month] + 1;
} else {
return time.month <= kDaysInMonth[time.month];
}
}
Protobuf
• V547 / CWE-571 Expression 'time.month <= kDaysInMonth[time.month] + 1' is always true. time.cc 83
• V547 / CWE-571 Expression 'time.month <= kDaysInMonth[time.month]' is always true. time.cc 85
The basic equation
• b – a code block
• in/out - a state of variables when entering and exiting the block
• trans - a function that transforms the state of variables in the block
• join - a function that merges the state of variables in different paths of execution
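For a forward analysis, the standard pair of equations that these symbols describe can be written as follows. This is a reconstruction from the definitions above; pred_b, the set of blocks that precede b, is notation assumed here.

```latex
\begin{aligned}
out_b &= trans_b(in_b) \\
in_b  &= join_{p \in pred_b}(out_p)
\end{aligned}
```

The analysis repeats these two steps over all blocks until the in and out states stop changing.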
Example
int a = 3;
if (something)
{
a = 4;
}
std::cout << a;
Example
int a = 3; in = {}, out = {a=3}
if (something)
a = 4; in = {a=3}, out = {a=4}
std::cout << a; in = {a=3}∪{a=4}={a=[3;4]}
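The join in this example can be sketched with a simple interval domain. The `Interval` type and the `join` function are hypothetical names chosen for illustration; a real analyzer may well keep a more precise representation than a single range.

```cpp
#include <algorithm>
#include <cassert>

// Closed integer interval [lo; hi], a hypothetical abstract value.
struct Interval {
    int lo;
    int hi;
};

// Join of two paths: the smallest interval that covers both inputs.
// It over-approximates the union, so the result stays a superset
// of the values the variable can really take.
Interval join(Interval a, Interval b) {
    assert(a.lo <= a.hi && b.lo <= b.hi);
    return {std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}
```

Joining `{3, 3}` (the path that skips the `if`) with `{4, 4}` (the path that enters it) gives `{3, 4}`, which matches the state shown for `std::cout << a`.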
Flow sensitivity
• Flow-sensitive analysis depends on the order of expressions in the code
• An example of flow-insensitive analysis: searching for the variables modified in a block
• A way to traverse the code is needed
Flow sensitivity
• Data flow analysis works on a control flow graph (CFG)
• In practice you can use an AST (abstract syntax tree)
• An AST is simpler and more understandable to most developers
• There are more tools for ASTs, and parsers can generate one directly
• A CFG can be simulated on top of the AST
Flow sensitivity
• Forward analysis
• Passes information into block B from the preceding blocks
• Well suited to calculating the values of variables and determining reaching definitions
• Backward analysis
• Passes information from block B to the preceding blocks
• Well suited to live variable analysis
Example of backward analysis
__private_extern__ void
YSHA1Transform(u_int32_t state[5],
const unsigned char buffer[64])
{
u_int32_t a, b, c, d, e;
....
state[0] += a;
state[1] += b;
state[2] += c;
state[3] += d;
state[4] += e;
/* Wipe variables */
a = b = c = d = e = 0;
}
XNU kernel
V1001 CWE-563 The 'a' variable is assigned but is not used until the end of the function. sha1mod.c 120
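A backward pass of this kind can be sketched on straight-line code. The `Stmt` model and `deadStores` are hypothetical and heavily simplified: they only track local variables and ignore branches, pointers, and writes that are visible outside the function.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Simplified straight-line statement: one optional written variable
// and the variables it reads. A hypothetical model, not a real IR.
struct Stmt {
    std::string writes;              // empty if nothing is assigned
    std::vector<std::string> reads;
};

// Walk the statements backward, tracking variables that are read later.
// An assignment to a variable that is not live is a dead store.
std::vector<std::size_t> deadStores(const std::vector<Stmt>& body) {
    std::set<std::string> live;      // no local is live after the function ends
    std::vector<std::size_t> dead;
    for (std::size_t i = body.size(); i-- > 0;) {
        const Stmt& s = body[i];
        if (!s.writes.empty()) {
            if (live.count(s.writes) == 0) dead.push_back(i);
            live.erase(s.writes);    // earlier values are overwritten here
        }
        live.insert(s.reads.begin(), s.reads.end());
    }
    return dead;
}
```

Modeling the function as "compute `a`", "read `a`", "`a = 0`" reports only the last statement, which corresponds to the wipe that the compiler is free to remove.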
Example of forward analysis
• Reaching definitions
• REACH - a set of variable definitions that can be read in the expression S
• GEN - new definitions
• KILL - "killed" definitions
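The usual transfer function for reaching definitions is out = GEN ∪ (in \ KILL), with union as the join. A minimal sketch, assuming definitions are identified by plain strings such as `"res0"` (the names `Defs`, `transfer`, and `joinPaths` are illustrative):

```cpp
#include <cassert>
#include <set>
#include <string>

using Defs = std::set<std::string>;

// Transfer function for one statement: out = GEN ∪ (in \ KILL).
Defs transfer(const Defs& in, const Defs& gen, const Defs& kill) {
    Defs out;
    for (const auto& d : in)
        if (kill.count(d) == 0) out.insert(d);
    out.insert(gen.begin(), gen.end());
    return out;
}

// Join for a "may" analysis: a definition reaches the merge point
// if it reaches it along at least one incoming path.
Defs joinPaths(const Defs& a, const Defs& b) {
    Defs out = a;
    out.insert(b.begin(), b.end());
    return out;
}
```

If two branches each kill an initial definition and generate their own, the joined set contains only the two new definitions, so a later comparison against the initial value can be reported as constant.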
Example of forward analysis
ParseResult ParseOption (string option, ref string[] args , CompilerSettings settings) {
AssemblyResource res = null; GEN={res0}
switch (s.Length) {
case 1:
res = new AssemblyResource (s[0], Path.GetFileName (s[0])); GEN={res1}, KILL={res0}
break;
case 2:
res = new AssemblyResource (s[0], s[1]); GEN={res2}, KILL={res0}
break;
default:
report.Error (-2005, "Wrong number of arguments for option '{0}'", option);
return ParseResult.Error;
}
if (res != null) { ... } REACH={res1, res2}
}
ILSpy
V3022 Expression 'res != null' is always true. settings.cs 827
Must vs may
• Must
• The data flow fact must hold on all paths
• It is expressed through the intersection of sets
• May
• The fact must hold on at least one path
• It is expressed through the union of sets
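The difference can be sketched for a single fact, "the pointer is null", collected at the end of each path that reaches a merge point. The `Nullness` enum and both function names are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Nullness of a pointer at the end of one execution path.
enum class Nullness { Null, NonNull };

// "May" fact: the pointer is null on at least one path.
bool mayBeNull(const std::vector<Nullness>& paths) {
    return std::any_of(paths.begin(), paths.end(),
                       [](Nullness n) { return n == Nullness::Null; });
}

// "Must" fact: the pointer is null on every path.
bool mustBeNull(const std::vector<Nullness>& paths) {
    assert(!paths.empty());  // a merge point has at least one incoming path
    return std::all_of(paths.begin(), paths.end(),
                       [](Nullness n) { return n == Nullness::Null; });
}
```

With one non-null path and one null path, `mayBeNull` is true while `mustBeNull` is false.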
Must vs may
• Static analysis often works with may facts
• No one writes code like this, where the fact holds on every path:
int *p = nullptr;
if (something) p = nullptr;
else if (something_else) p = nullptr;
else p = nullptr;
*p = 42;
Must vs may
STDMETHODIMP sdnAccessible::get_computedStyle(
BSTR __RPC_FAR* aStyleProperties,
BSTR __RPC_FAR* aStyleValues,
unsigned short __RPC_FAR* aNumStyleProperties)
{
if (!aStyleProperties || aStyleValues || !aNumStyleProperties)
return E_INVALIDARG;
....
aStyleValues[realIndex] = ::SysAllocString(value.get());
....
}
Mozilla Thunderbird
V522 Dereferencing of the null pointer 'aStyleValues' might take place. sdnaccessible.cpp 252
Path-sensitive analysis
• A may fact on one of the paths is not enough
• What if that path is impossible?
• We need to analyze the conditions!
Path-sensitive analysis
enum {
Runesync = 0x80,
Runeself = 0x80,
};
char* utfrune(const char *s, int c) {
....
if (c < Runesync) return strchr(s, c); // c: then [INT_MIN; 0x7F] else [0x80; INT_MAX]
for(;;) {
c1 = *(unsigned char*)s;
if (c1 < Runeself) { // c1: then [0; 0x7F]
if (c1 == 0) return 0; // c1: then 0 else [1; 0x7F]
if (c1 == c) return (char*)s; // if ([1; 0x7F] == [0x80; INT_MAX])
....
}
....
}
return 0;
}
RE2 V547 CWE-570 Expression 'c1 == c' is always false. rune.cc 247
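The reasoning behind this warning can be sketched with interval refinement on `<` conditions followed by an overlap test. `Interval`, `whenLess`, `whenNotLess`, and `canBeEqual` are hypothetical helpers for this example, not the analyzer's real interface.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

struct Interval {
    int lo;
    int hi;
};

// Values of x for which `x < bound` holds.
Interval whenLess(Interval x, int bound) {
    assert(bound > INT_MIN);  // keeps bound - 1 from overflowing
    return {x.lo, std::min(x.hi, bound - 1)};
}

// Values of x for which `x < bound` does not hold.
Interval whenNotLess(Interval x, int bound) {
    return {std::max(x.lo, bound), x.hi};
}

// `a == b` can only be true if the two ranges share a value.
bool canBeEqual(Interval a, Interval b) {
    return std::max(a.lo, b.lo) <= std::min(a.hi, b.hi);
}
```

After the early return, `c` is at least 0x80, while inside the inner `if` the value `c1` is at most 0x7F, so the two ranges never overlap and the comparison is always false.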
Short circuit
if ( x >= 0 && x <= 10 ) {
} else {
}
Short circuit
if ( x >= 0 && x <= 10 ) {
} else {
}
x >= 0: true gives x = [0; INT_MAX], false gives x = [INT_MIN; -1]
x <= 10 (evaluated only when x >= 0 is true): true gives x = [0; 10], false gives x = [11; INT_MAX]
then: x = [0; 10]
else: x = [INT_MIN; -1] ∪ [11; INT_MAX]
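The same evaluation order can be sketched in code: the right operand of `&&` is analyzed only on the edge where the left operand is true, and every false edge feeds the else branch. The `Branches` struct and `analyzeRangeCheck` are illustrative names, and the sketch assumes `x` is otherwise unconstrained.

```cpp
#include <cassert>
#include <climits>
#include <vector>

struct Interval {
    int lo;
    int hi;
};

// Result of analyzing `x >= lo && x <= hi` for an unconstrained int x.
struct Branches {
    Interval thenRange;                // both operands were true
    std::vector<Interval> elseRanges;  // one of the operands was false
};

Branches analyzeRangeCheck(int lo, int hi) {
    assert(lo > INT_MIN && hi < INT_MAX && lo <= hi);
    Branches result;
    // Left operand `x >= lo`: its false edge goes straight to else.
    Interval leftTrue{lo, INT_MAX};
    result.elseRanges.push_back({INT_MIN, lo - 1});
    // Right operand `x <= hi` is evaluated only on the left-true edge.
    result.thenRange = {leftTrue.lo, hi};
    result.elseRanges.push_back({hi + 1, leftTrue.hi});
    return result;
}
```

For `analyzeRangeCheck(0, 10)` the then branch gets [0; 10] and the else branch gets the two pieces [INT_MIN; -1] and [11; INT_MAX].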
Short circuit
internal bool SafeForExport()
{
return DisplayEntry.SafeForExport() &&
ItemSelectionCondition == null
|| ItemSelectionCondition.SafeForExport();
}
PowerShell
V3080 Possible null dereference. Consider inspecting 'ItemSelectionCondition'. System.Management.Automation displayDescriptionData_List.cs 352
Join problem
int *p;
if (condition) {
p = new int;
} else {
p = nullptr;
}
// p - nullable
if (condition) {
*p = 42; // null dereference?
}
Join problem
• We lose information when we merge the paths
• It is better to postpone merging the states for as long as possible
• But this leads to the path explosion problem
Join problem
int *p;
if (condition) {
p = new int; // p = non null if condition
} else {
p = nullptr; // p = null if !condition
}
// p = non null if condition
// ∪ null if !condition
if (condition) {
// p = non null
*p = 42;
}
Join problem
int arr[4];
int a, b;
if (condition) {
a = 1;
b = 2;
} else {
a = 2;
b = 1;
}
return arr[a + b]; // a = 1 if condition ∪ 2 if !condition
// b = 2 if condition ∪ 1 if !condition
// a + b = 3 if condition ∪ 3 if !condition
// a + b = 3
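One way to keep this information is to store each value together with the branch it came from and to combine only values that share a branch. The sketch below assumes a single boolean condition; `Guarded`, `GuardedInt`, and `add` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// One possible value of a variable together with the branch
// (condition true or false) on which it was assigned.
struct Guarded {
    bool condition;
    int value;
};

using GuardedInt = std::vector<Guarded>;

// Add two guarded values: only combinations made under the same
// condition are feasible, so mismatched pairs are skipped.
GuardedInt add(const GuardedInt& a, const GuardedInt& b) {
    GuardedInt sum;
    for (const Guarded& x : a)
        for (const Guarded& y : b)
            if (x.condition == y.condition)
                sum.push_back({x.condition, x.value + y.value});
    return sum;
}
```

With plain intervals, a = [1; 2] and b = [1; 2] would give a + b = [2; 4] and a false out-of-bounds warning for `arr[4]`; with guarded values both feasible sums are 3.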
Try-catch
try {
SomeClass c(someFunction(), 42);
c.foo();
return c + "abc";
} catch (...) {
}
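Each of these steps inside the try block may throw, so each one adds a possible jump to the catch block:
• call of someFunction()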
• constructor of c variable
• call of foo() method
• constructors of temporary objects
• operator +
• constructor for returned object
• destructors for temporary objects
• destructor of c variable
Loop analysis
• In the general case, loops are difficult and slow to analyze
• Analyze the first iteration separately
• After the loop, "kill" all new definitions of variables made inside it
Loop invariants
public final R getSomeBuildWithWorkspace() {
int cnt=0; // <= variable definition outside of the loop
for (R b = getLastBuild(); cnt<5 && b!=null; b=b.getPreviousBuild()) {
FilePath ws = b.getWorkspace();
if (ws!=null) return b;
}
return null;
}
Jenkins
V6022 Expression 'cnt < 5' is always true AbstractProject.java 557
The first iteration
void Measure::read(XmlReader& e, int staffIdx) {
Segment* segment = 0;
....
while (e.readNextStartElement()) {
const QStringRef& tag(e.name());
if (tag == "move")
e.initTick(e.readFraction().ticks() + tick());
....
else if (tag == "sysInitBarLineType") {
....
segment = getSegmentR(SegmentType::BeginBarLine, 0); // !!!
segment->add(barLine); // <= OK
}
....
else if (tag == "Segment")
segment->read(e); // <= ERROR
....
}
}
MuseScore V522 Dereferencing of the null pointer 'segment' might take place. measure.cpp 2220
Loop control flow
SkOpSpan* SkOpContour::undoneSpan() {
SkOpSegment* testSegment = &fHead;
bool allDone = true;
do {
if (testSegment->done()) {
continue;
}
allDone = false; // <= we don't take this path into account
return testSegment->undoneSpan();
} while ((testSegment = testSegment->next()));
if (allDone) {
fDone = true;
}
return nullptr;
}
Skia Graphics Engine
V547 CWE-571 Expression 'allDone' is always true. skopcontour.cpp 43
Loop counter analysis
for (int i = 0; i < 10; ++i)
{
// i = [INT_MIN; 9] ?
// i = [0; 9] !!!
}
Loop counter analysis
#define AE_IDLE_TIMEOUT 100
static void
ae_stop_rxmac(ae_softc_t *sc)
{
int i;
....
/*
* Wait for IDLE state.
*/
for (i = 0; i < AE_IDLE_TIMEOUT; i--) { // <=
val = AE_READ_4(sc, AE_IDLE_REG);
if ((val & (AE_IDLE_RXMAC | AE_IDLE_DMAWRITE)) == 0)
break;
DELAY(100);
}
....
}
FreeBSD Kernel
V621 Consider inspecting the 'for' operator. It's possible that the loop will be executed incorrectly or won't be executed at all. if_ae.c 1663
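A rough sketch of counter analysis for loops of the form `for (i = start; i < limit; i += step)`. It assumes that all three values are known constants and that the body does not modify `i`; `LoopKind`, `CounterInfo`, and `analyzeCounter` are illustrative names.

```cpp
#include <cassert>
#include <climits>

struct Interval {
    int lo;
    int hi;
};

enum class LoopKind { NeverRuns, WrongDirection, Bounded };

struct CounterInfo {
    LoopKind kind;
    Interval range;  // values of the counter inside the loop body
};

// Analyze `for (int i = start; i < limit; i += step)`.
CounterInfo analyzeCounter(int start, int limit, int step) {
    assert(limit > INT_MIN);  // keeps limit - 1 from overflowing
    if (start >= limit)
        return {LoopKind::NeverRuns, {start, start}};
    if (step <= 0)  // the counter never approaches the limit
        return {LoopKind::WrongDirection, {INT_MIN, start}};
    return {LoopKind::Bounded, {start, limit - 1}};
}
```

For `start = 0, limit = 10, step = 1` the body sees [0; 9]. For the FreeBSD loop, `step = -1` with an `i < 100` condition is classified as `WrongDirection`, which is the kind of pattern V621 points at.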
There is a problem
for (int i = 0; i < n; ++i) {
for (int j = i + 1; j < n; ++j) {
// j - i
}
}
There is a problem
int i = /* [0; 42] */;
int j = i + 1; // [1; 43]
int r = j - i; // [-41; 43]???
Symbolic execution
int i = /* [0; 42] */;
int j = i + 1; // [1; 43]
int r = j - i; // i + 1 - i = 1
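The cancellation can be sketched with linear expressions over a single symbol. `Linear` is a hypothetical type that represents `coeff * i + offset`; real symbolic execution handles far richer expressions.

```cpp
#include <cassert>

// Linear symbolic expression: coeff * i + offset, where `i` is a
// symbol standing for the unknown value of the variable i.
struct Linear {
    int coeff;
    int offset;
};

Linear operator+(Linear a, int k) { return {a.coeff, a.offset + k}; }

Linear operator-(Linear a, Linear b) {
    return {a.coeff - b.coeff, a.offset - b.offset};
}

// The expression is a known constant once the symbol cancels out.
bool isConstant(Linear e) { return e.coeff == 0; }
```

Starting from `Linear i{1, 0}`, the expression `(i + 1) - i` has coefficient 0 and offset 1, so `r` is exactly 1 regardless of the range of `i`.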
Symbolic execution
• Calculate everything as symbolic expressions
• Build a system of equations
• Feed it to an SMT solver
• ???
• PROFIT
Symbolic execution
public static MMMethodKind valueOf(....) {
MMMethodKind result = OTHER;
for (MMMethodKind k : values()) {
if (k.detector.test(method) && result.level < k.level) {
if (result.level == k.level) {
throw new SpoonException(....);
}
result = k;
}
}
return result;
}
Spoon
MMMethodKind.java:129: V6007 Expression 'result.level == k.level' is always false.
Context sensitive
• foo();
• We can reset all of the accumulated information
• Annotate popular libraries
• Enjoy the 10 ways to pass a variable to a function
Context sensitive
• Analysis of a function that takes the caller's context into account
• Scales poorly
• Useful for analyzing small functions (getters/setters, for example)
Context sensitive
void foo(int *p) { // analyze two times
*p = 42;
}
void bar() {
int *p = something ? new int : nullptr;
foo(p); // repeatedly analyze foo and find a bug
}
Context insensitive
void foo(int *p) { // p != nullptr
*p = 42;
}
void bar() {
int *p = something ? new int : nullptr;
foo(p); // p != nullptr contract is violated, found a bug
}
Context insensitive
• Analyze the body of a function, compose an annotation for it
• Contract for arguments
• Presence of a global state
• Returned value
• And much more
• The C++ contracts proposal expresses such a contract in the code itself:
void foo(const std::vector<int> &indices)
[[expects: !indices.empty()]];
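A minimal sketch of the annotation approach: the function body is analyzed once to produce a summary, and each call site is then checked against that summary only. `FunctionAnnotation` and `checkCall` are hypothetical, and the summary here covers just one kind of contract (non-null parameters).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Nullness { Null, NonNull, MaybeNull };

// Hypothetical annotation derived once from a function body.
struct FunctionAnnotation {
    std::string name;
    std::vector<bool> paramMustBeNonNull;  // one flag per parameter
};

// Check one call site against the annotation without re-analyzing the body.
// Returns the indices of arguments that may violate the contract.
std::vector<std::size_t> checkCall(const FunctionAnnotation& callee,
                                   const std::vector<Nullness>& args) {
    assert(args.size() == callee.paramMustBeNonNull.size());
    std::vector<std::size_t> violations;
    for (std::size_t i = 0; i < args.size(); ++i)
        if (callee.paramMustBeNonNull[i] && args[i] != Nullness::NonNull)
            violations.push_back(i);
    return violations;
}
```

For `foo(int *p)` with a non-null requirement on `p`, a call with a `MaybeNull` argument reports index 0, while a call with a `NonNull` argument reports nothing.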
Conclusions
• Data flow analysis is a useful technique for finding errors
• To find bugs, one has to work with a large and sometimes strange set of properties
• Combining different techniques increases the reliability of the analysis results
• Various heuristics and assumptions allow finding more bugs
• Every significant static analyzer must use data flow analysis
Answering your questions
PVS-Studio: http://www.viva64.com/ru/pvs-studio/
50

More Related Content

What's hot (20)

PDF
02 teknik penyerangan
Setia Juli Irzal Ismail
Ā 
PPT
An atm with an iris recognition
mahesh123slideshre
Ā 
PPTX
Giį»›i thiệu về Arduino - Arduino360
Hį»c Tį»±
Ā 
PPTX
Face recognition technology
ShubhamLamichane
Ā 
PPTX
Packet Sniffer
vilss
Ā 
PDF
Intrusion Detection System Project Report
Raghav Bisht
Ā 
DOC
Luận văn ThẔc sĩ Nghiên cứu cÔc kỹ thuật của IoT và cÔc ứng dỄng của nó cho n...
Dịch vỄ viįŗæt thuĆŖ Luįŗ­n Văn - ZALO 0932091562
Ā 
PPTX
Introduction to raspberry pi
ė™ķ˜ø 손
Ā 
PPT
Ipv6
khacthang
Ā 
DOCX
Đề tĆ i: Nhįŗ­n dįŗ”ng mįŗ·t ngĘ°į»i trĆŖn matlab, HOT, 9đ
Dịch VỄ Viįŗæt BĆ i Trį»n Gói ZALO 0917193864
Ā 
PDF
Ł…Ų±Ų§Ų­Ł„ ؄نتاج Ų§Ł„Ł…Ų­ŲŖŁˆŁ‰ الرقمى.pdf
NadiaMohamedSherif
Ā 
PDF
04 sniffing
Setia Juli Irzal Ismail
Ā 
PDF
Voice Assistant (1).pdf
Thakurpushpendersing
Ā 
PPTX
SMART ATTENDANCE SYSTEM USING FACE RECOGNITION (233.pptx
BikashUpadhaya1
Ā 
PPT
Intrusion Detection System
Mohit Belwal
Ā 
PDF
Luận văn: Nhận dẔng và phân loẔi hoa quả trong ảnh màu, HAY
Dịch vỄ viįŗæt bĆ i trį»n gói ZALO 0917193864
Ā 
DOCX
capaian TJKT VERSI 2.docx
JokoSR1
Ā 
DOCX
Laporan Praktikum Modul 2 (Instalasi Windows)
Faisal Amir
Ā 
PPTX
Phʰʔng phĆ”p tham lam giįŗ£i bĆ i toĆ”n lįŗ­p lịch cĆ“ng việc
Nguyį»…n Danh Thanh
Ā 
PPTX
Metasploit
Lalith Sai
Ā 
02 teknik penyerangan
Setia Juli Irzal Ismail
Ā 
An atm with an iris recognition
mahesh123slideshre
Ā 
Giį»›i thiệu về Arduino - Arduino360
Hį»c Tį»±
Ā 
Face recognition technology
ShubhamLamichane
Ā 
Packet Sniffer
vilss
Ā 
Intrusion Detection System Project Report
Raghav Bisht
Ā 
Luận văn ThẔc sĩ Nghiên cứu cÔc kỹ thuật của IoT và cÔc ứng dỄng của nó cho n...
Dịch vỄ viįŗæt thuĆŖ Luįŗ­n Văn - ZALO 0932091562
Ā 
Introduction to raspberry pi
ė™ķ˜ø 손
Ā 
Ipv6
khacthang
Ā 
Đề tĆ i: Nhįŗ­n dįŗ”ng mįŗ·t ngĘ°į»i trĆŖn matlab, HOT, 9đ
Dịch VỄ Viįŗæt BĆ i Trį»n Gói ZALO 0917193864
Ā 
Ł…Ų±Ų§Ų­Ł„ ؄نتاج Ų§Ł„Ł…Ų­ŲŖŁˆŁ‰ الرقمى.pdf
NadiaMohamedSherif
Ā 
Voice Assistant (1).pdf
Thakurpushpendersing
Ā 
SMART ATTENDANCE SYSTEM USING FACE RECOGNITION (233.pptx
BikashUpadhaya1
Ā 
Intrusion Detection System
Mohit Belwal
Ā 
Luận văn: Nhận dẔng và phân loẔi hoa quả trong ảnh màu, HAY
Dịch vỄ viįŗæt bĆ i trį»n gói ZALO 0917193864
Ā 
capaian TJKT VERSI 2.docx
JokoSR1
Ā 
Laporan Praktikum Modul 2 (Instalasi Windows)
Faisal Amir
Ā 
Phʰʔng phĆ”p tham lam giįŗ£i bĆ i toĆ”n lįŗ­p lịch cĆ“ng việc
Nguyį»…n Danh Thanh
Ā 
Metasploit
Lalith Sai
Ā 

Similar to How Data Flow analysis works in a static code analyzer (20)

PDF
Technologies used in the PVS-Studio code analyzer for finding bugs and potent...
Andrey Karpov
Ā 
PPTX
Story of static code analyzer development
Andrey Karpov
Ā 
PDF
Analysis of Microsoft Code Contracts
PVS-Studio
Ā 
PPTX
PVS-Studio features overview (2020)
Andrey Karpov
Ā 
PPTX
Detection of errors and potential vulnerabilities in C and C++ code using the...
Andrey Karpov
Ā 
PPTX
Static code analysis: what? how? why?
Andrey Karpov
Ā 
PPTX
Update on C++ Core Guidelines Lifetime Analysis. GƔbor HorvƔth. CoreHard Spri...
corehard_by
Ā 
PPTX
What static analyzers can do that programmers and testers cannot
Andrey Karpov
Ā 
PDF
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 1
PVS-Studio
Ā 
PDF
Stale pointers are the new black - white paper
Vincenzo Iozzo
Ā 
PDF
Dataflow Analysis
Eelco Visser
Ā 
PPTX
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Silvio Cesare
Ā 
PPTX
Static analysis works for mission-critical systems, why not yours?
Rogue Wave Software
Ā 
PPT
Code Analysis-run time error prediction
NIKHIL NAWATHE
Ā 
PPTX
The operation principles of PVS-Studio static code analyzer
Andrey Karpov
Ā 
PDF
A Novel Analysis Space For Pointer Analysis And Its Application For Bug Finding
Scott Donald
Ā 
PDF
Porting is a Delicate Matter: Checking Far Manager under Linux
PVS-Studio
Ā 
PDF
Checking the code of Valgrind dynamic analyzer by a static analyzer
PVS-Studio
Ā 
PDF
Static and Dynamic Code Analysis
Andrey Karpov
Ā 
PDF
Asterisk: PVS-Studio Takes Up Telephony
Andrey Karpov
Ā 
Technologies used in the PVS-Studio code analyzer for finding bugs and potent...
Andrey Karpov
Ā 
Story of static code analyzer development
Andrey Karpov
Ā 
Analysis of Microsoft Code Contracts
PVS-Studio
Ā 
PVS-Studio features overview (2020)
Andrey Karpov
Ā 
Detection of errors and potential vulnerabilities in C and C++ code using the...
Andrey Karpov
Ā 
Static code analysis: what? how? why?
Andrey Karpov
Ā 
Update on C++ Core Guidelines Lifetime Analysis. GƔbor HorvƔth. CoreHard Spri...
corehard_by
Ā 
What static analyzers can do that programmers and testers cannot
Andrey Karpov
Ā 
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 1
PVS-Studio
Ā 
Stale pointers are the new black - white paper
Vincenzo Iozzo
Ā 
Dataflow Analysis
Eelco Visser
Ā 
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Silvio Cesare
Ā 
Static analysis works for mission-critical systems, why not yours?
Rogue Wave Software
Ā 
Code Analysis-run time error prediction
NIKHIL NAWATHE
Ā 
The operation principles of PVS-Studio static code analyzer
Andrey Karpov
Ā 
A Novel Analysis Space For Pointer Analysis And Its Application For Bug Finding
Scott Donald
Ā 
Porting is a Delicate Matter: Checking Far Manager under Linux
PVS-Studio
Ā 
Checking the code of Valgrind dynamic analyzer by a static analyzer
PVS-Studio
Ā 
Static and Dynamic Code Analysis
Andrey Karpov
Ā 
Asterisk: PVS-Studio Takes Up Telephony
Andrey Karpov
Ā 
Ad

More from Andrey Karpov (20)

PDF
60 антипаттернов Š“Š»Ń Š”++ программиста
Andrey Karpov
Ā 
PDF
60 terrible tips for a C++ developer
Andrey Karpov
Ā 
PPTX
ŠžŃˆŠøŠ±ŠŗŠø, которые сложно Š·Š°Š¼ŠµŃ‚ŠøŃ‚ŃŒ на code review, но которые Š½Š°Ń…Š¾Š“ŃŃ‚ŃŃ статичес...
Andrey Karpov
Ā 
PDF
PVS-Studio in 2021 - Error Examples
Andrey Karpov
Ā 
PDF
PVS-Studio in 2021 - Feature Overview
Andrey Karpov
Ā 
PDF
PVS-Studio в 2021 - ŠŸŃ€ŠøŠ¼ŠµŃ€Ń‹ ошибок
Andrey Karpov
Ā 
PDF
PVS-Studio в 2021
Andrey Karpov
Ā 
PPTX
Make Your and Other Programmer’s Life Easier with Static Analysis (Unreal Eng...
Andrey Karpov
Ā 
PPTX
Best Bugs from Games: Fellow Programmers' Mistakes
Andrey Karpov
Ā 
PPTX
Does static analysis need machine learning?
Andrey Karpov
Ā 
PPTX
Typical errors in code on the example of C++, C#, and Java
Andrey Karpov
Ā 
PPTX
How to Fix Hundreds of Bugs in Legacy Code and Not Die (Unreal Engine 4)
Andrey Karpov
Ā 
PPTX
Game Engine Code Quality: Is Everything Really That Bad?
Andrey Karpov
Ā 
PPTX
C++ Code as Seen by a Hypercritical Reviewer
Andrey Karpov
Ā 
PPTX
The Use of Static Code Analysis When Teaching or Developing Open-Source Software
Andrey Karpov
Ā 
PPTX
Static Code Analysis for Projects, Built on Unreal Engine
Andrey Karpov
Ā 
PPTX
Safety on the Max: How to Write Reliable C/C++ Code for Embedded Systems
Andrey Karpov
Ā 
PPTX
The Great and Mighty C++
Andrey Karpov
Ā 
PDF
Zero, one, two, Freddy's coming for you
Andrey Karpov
Ā 
PDF
PVS-Studio Is Now in Chocolatey: Checking Chocolatey under Azure DevOps
Andrey Karpov
Ā 
60 антипаттернов Š“Š»Ń Š”++ программиста
Andrey Karpov
Ā 
60 terrible tips for a C++ developer
Andrey Karpov
Ā 
ŠžŃˆŠøŠ±ŠŗŠø, которые сложно Š·Š°Š¼ŠµŃ‚ŠøŃ‚ŃŒ на code review, но которые Š½Š°Ń…Š¾Š“ŃŃ‚ŃŃ статичес...
Andrey Karpov
Ā 
PVS-Studio in 2021 - Error Examples
Andrey Karpov
Ā 
PVS-Studio in 2021 - Feature Overview
Andrey Karpov
Ā 
PVS-Studio в 2021 - ŠŸŃ€ŠøŠ¼ŠµŃ€Ń‹ ошибок
Andrey Karpov
Ā 
PVS-Studio в 2021
Andrey Karpov
Ā 
Make Your and Other Programmer’s Life Easier with Static Analysis (Unreal Eng...
Andrey Karpov
Ā 
Best Bugs from Games: Fellow Programmers' Mistakes
Andrey Karpov
Ā 
Does static analysis need machine learning?
Andrey Karpov
Ā 
Typical errors in code on the example of C++, C#, and Java
Andrey Karpov
Ā 
How to Fix Hundreds of Bugs in Legacy Code and Not Die (Unreal Engine 4)
Andrey Karpov
Ā 
Game Engine Code Quality: Is Everything Really That Bad?
Andrey Karpov
Ā 
C++ Code as Seen by a Hypercritical Reviewer
Andrey Karpov
Ā 
The Use of Static Code Analysis When Teaching or Developing Open-Source Software
Andrey Karpov
Ā 
Static Code Analysis for Projects, Built on Unreal Engine
Andrey Karpov
Ā 
Safety on the Max: How to Write Reliable C/C++ Code for Embedded Systems
Andrey Karpov
Ā 
The Great and Mighty C++
Andrey Karpov
Ā 
Zero, one, two, Freddy's coming for you
Andrey Karpov
Ā 
PVS-Studio Is Now in Chocolatey: Checking Chocolatey under Azure DevOps
Andrey Karpov
Ā 
Ad

Recently uploaded (20)

PDF
MiniTool Partition Wizard 12.8 Crack License Key LATEST
hashhshs786
Ā 
PDF
Mobile CMMS Solutions Empowering the Frontline Workforce
CryotosCMMSSoftware
Ā 
PDF
Alarm in Android-Scheduling Timed Tasks Using AlarmManager in Android.pdf
Nabin Dhakal
Ā 
PDF
Streamline Contractor Lifecycle- TECH EHS Solution
TECH EHS Solution
Ā 
PPTX
How Apagen Empowered an EPC Company with Engineering ERP Software
SatishKumar2651
Ā 
PPT
MergeSortfbsjbjsfk sdfik k
RafishaikIT02044
Ā 
PPTX
Why Businesses Are Switching to Open Source Alternatives to Crystal Reports.pptx
Varsha Nayak
Ā 
PPTX
Fundamentals_of_Microservices_Architecture.pptx
MuhammadUzair504018
Ā 
PDF
Automate Cybersecurity Tasks with Python
VICTOR MAESTRE RAMIREZ
Ā 
PPTX
Human Resources Information System (HRIS)
Amity University, Patna
Ā 
PDF
Alexander Marshalov - How to use AI Assistants with your Monitoring system Q2...
VictoriaMetrics
Ā 
PPTX
Equipment Management Software BIS Safety UK.pptx
BIS Safety Software
Ā 
PDF
vMix Pro 28.0.0.42 Download vMix Registration key Bundle
kulindacore
Ā 
PDF
Beyond Binaries: Understanding Diversity and Allyship in a Global Workplace -...
Imma Valls Bernaus
Ā 
PPTX
Tally software_Introduction_Presentation
AditiBansal54083
Ā 
PPTX
Migrating Millions of Users with Debezium, Apache Kafka, and an Acyclic Synch...
MD Sayem Ahmed
Ā 
PPTX
A Complete Guide to Salesforce SMS Integrations Build Scalable Messaging With...
360 SMS APP
Ā 
PDF
iTop VPN With Crack Lifetime Activation Key-CODE
utfefguu
Ā 
PDF
HiHelloHR – Simplify HR Operations for Modern Workplaces
HiHelloHR
Ā 
PPTX
Engineering the Java Web Application (MVC)
abhishekoza1981
Ā 
MiniTool Partition Wizard 12.8 Crack License Key LATEST
hashhshs786
Ā 
Mobile CMMS Solutions Empowering the Frontline Workforce
CryotosCMMSSoftware
Ā 
Alarm in Android-Scheduling Timed Tasks Using AlarmManager in Android.pdf
Nabin Dhakal
Ā 
Streamline Contractor Lifecycle- TECH EHS Solution
TECH EHS Solution
Ā 
How Apagen Empowered an EPC Company with Engineering ERP Software
SatishKumar2651
Ā 
MergeSortfbsjbjsfk sdfik k
RafishaikIT02044
Ā 
Why Businesses Are Switching to Open Source Alternatives to Crystal Reports.pptx
Varsha Nayak
Ā 
Fundamentals_of_Microservices_Architecture.pptx
MuhammadUzair504018
Ā 
Automate Cybersecurity Tasks with Python
VICTOR MAESTRE RAMIREZ
Ā 
Human Resources Information System (HRIS)
Amity University, Patna
Ā 
Alexander Marshalov - How to use AI Assistants with your Monitoring system Q2...
VictoriaMetrics
Ā 
Equipment Management Software BIS Safety UK.pptx
BIS Safety Software
Ā 
vMix Pro 28.0.0.42 Download vMix Registration key Bundle
kulindacore
Ā 
Beyond Binaries: Understanding Diversity and Allyship in a Global Workplace -...
Imma Valls Bernaus
Ā 
Tally software_Introduction_Presentation
AditiBansal54083
Ā 
Migrating Millions of Users with Debezium, Apache Kafka, and an Acyclic Synch...
MD Sayem Ahmed
Ā 
A Complete Guide to Salesforce SMS Integrations Build Scalable Messaging With...
360 SMS APP
Ā 
iTop VPN With Crack Lifetime Activation Key-CODE
utfefguu
Ā 
HiHelloHR – Simplify HR Operations for Modern Workplaces
HiHelloHR
Ā 
Engineering the Java Web Application (MVC)
abhishekoza1981
Ā 

How Data Flow analysis works in a static code analyzer

  • 1. How data flow analysis operates in a static code analyzer Pavel Belikov C++ Developer, PVS-Studio [email protected]
  • 2. / 50 PVS-Studio • Static analyzer for C, C++, C# code • It works on Windows, Linux, macOS • Plugin for Visual Studio • Integrates into SonarQube and Jenkins • Quick start (Standalone, pvs-studio-analyzer) 2
  • 3. / 50 Contents: • Types and objectives of Data Flow Analysis • Analysis of conditions • Analysis of loops • Symbolic execution • Examples of errors found in real projects 3
  • 4. / 50 What is data flow analysis •Calculate a set of values for expression or its properties •Numbers •Null/non-null pointer •Strings •The size and contents of containers/optional • Determine state of variables 4
  • 5. / 50 The main objectives • Set of values must be a superset of real values • Time is limited • Number of false positives must be minimized 5
  • 6. / 50 Why do we need it? static const int kDaysInMonth[13] = { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }; bool ValidateDateTime(const DateTime& time) { if (time.year < 1 || time.year > 9999 || time.month < 1 || time.month > 12 || time.day < 1 || time.day > 31 || time.hour < 0 || time.hour > 23 || time.minute < 0 || time.minute > 59 || time.second < 0 || time.second > 59) { return false; } if (time.month == 2 && IsLeapYear(time.year)) { return time.month <= kDaysInMonth[time.month] + 1; } else { return time.month <= kDaysInMonth[time.month]; } } 6
  • 7. / 50 Why do we need it? static const int kDaysInMonth[13] = { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }; bool ValidateDateTime(const DateTime& time) { if (time.year < 1 || time.year > 9999 || time.month < 1 || time.month > 12 || time.day < 1 || time.day > 31 || time.hour < 0 || time.hour > 23 || time.minute < 0 || time.minute > 59 || time.second < 0 || time.second > 59) { return false; } if (time.month == 2 && IsLeapYear(time.year)) { return time.month <= kDaysInMonth[time.month] + 1; } else { return time.month <= kDaysInMonth[time.month]; } } 7 Protobuf • V547 / CWE-571 Expression 'time.month <= kDaysInMonth[time.month] + 1' is always true. time.cc 83 • V547 / CWE-571 Expression 'time.month <= kDaysInMonth[time.month]' is always true. time.cc 85
  • 8. / 50 The basic equation • b – a code block • in/out - a state of variables when entering and exiting the block • trans - a function that transforms the state of variables in the block • join - a function that merges the state of variables in different paths of execution 8
  • 9. / 50 Example int a = 3; if (something) { a = 4; } std::cout << a; 9
  • 10. / 50 Example int a = 3; in = {}, out = {a=3} if (something) a = 4; in = {a=3}, out = {a=4} std::cout << a; in = {a=3}∪{a=4}={a=[3;4]} 10
  • 11. / 50 Flow sensitivity • Flow-sensitive analysis depends on the order of expressions in code • An example of a flow-insensitive analysis: searching for modified variables in a block • A way for code traversal is needed 11
  • 12. / 50 Flow sensitivity • Data Flow works with Control Flow Graph • In practice you can use AST (abstract syntax tree) • AST is simpler and more understandable to most developers • There are more tools for AST, parsers can generate AST • CFG can be simulated on top of the AST 12
  • 13. / 50 Flow sensitivity • Forward analysis • Pass the information to the block B from the preceding blocks • It suits well for calculating the values of variables and determining reaching definitions • Backward analysis • Pass the information from the block B to the preceding blocks • It suits well for live variable analysis 13
  • 14. / 50 Example of backward analysis __private_extern__ void YSHA1Transform(u_int32_t state[5], const unsigned char buffer[64]) { u_int32_t a, b, c, d, e; .... state[0] += a; state[1] += b; state[2] += c; state[3] += d; state[4] += e; /* Wipe variables */ a = b = c = d = e = 0; } XNU kernel V1001 CWE-563 The 'a' variable is assigned but is not used until the end of the function. sha1mod.c 120 14
  • 15. / 50 Example of forward analysis • Reaching definitions • REACH - a set of variable definitions that can be read in the expression S • GEN - new definitions • KILL - "killed" definitions 15
  • 16. / 50 Example of forward analysis ParseResult ParseOption (string option, ref string[] args , CompilerSettings settings) { AssemblyResource res = null; GEN={res0} switch (s.Length) { case 1: res = new AssemblyResource (s[0], Path.GetFileName (s[0])); GEN={res1}, KILL={res0} break; case 2: res = new AssemblyResource (s[0], s[1]); GEN={res2}, KILL={res0} break; default: report.Error (-2005, "Wrong number of arguments for option '{0}'", option); return ParseResult.Error; } if (res != null) { ... } REACH={res1, res2} } ILSpy V3022 Expression 'res != null' is always true. settings.cs 827 16
  • 17. / 50 Must vs may • Must • Data flow fact must be true for all paths • It’s expressed through the intersection of sets • May • Fact should be correct at least for one path • It is expressed through the union of sets 17
  • 18. / 50 Must vs may • Static analysis often works with may • No one writes int *p = nullptr; if (something) p = nullptr; else if (something_else) p = nullptr; else p = nullptr; *p = 42; 18
  • 19. / 50 Must vs may STDMETHODIMP sdnAccessible::get_computedStyle( BSTR __RPC_FAR* aStyleProperties, BSTR __RPC_FAR* aStyleValues, unsigned short __RPC_FAR* aNumStyleProperties) { if (!aStyleProperties || aStyleValues || !aNumStyleProperties) return E_INVALIDARG; .... aStyleValues[realIndex] = ::SysAllocString(value.get()); .... } Mozilla Thunderbird V522 Dereferencing of the null pointer ā€˜aStyleValues’ might take place. sdnaccessible.cpp 252 19
  • 20. / 50 Path-sensitive analysis • May in one of the paths is not enough • What if the path is impossible? • We need to analyze the conditions! 20
  • 21. / 50 Path-sensitive analysis enum { Runesync = 0x80, Runeself = 0x80, }; char* utfrune(const char *s, int c) { .... if (c < Runesync) return strchr(s, c); // c: then [INT_MIN; 0x79] else [0x80; INT_MAX] for(;;) { c1 = *(unsigned char*)s; if (c1 < Runeself) { // c1: then [0; 0x79] if (c1 == 0) return 0; // c1: then 0 else [1; 0x79] if (c1 == c) return (char*)s; // if ([1; 0x79] == [0x80; INT_MAX]) .... } .... } return 0; } RE2 V547 CWE-570 Expression 'c1 == c' is always false. rune.cc 247 21
  • 22. / 50 Short circuit if ( x >= 0 && x <= 10 ) { } else { } 22
  • 23. / 50 Short circuit 23 x = [0; INT_MAX] x = [INT_MIN; -1] if ( x >= 0 && x <= 10 ) { } else { }
  • 24. / 50 Short circuit 24 x = [0; INT_MAX] x = [INT_MIN; -1] x = [0; 10] x = [11; INT_MAX] then: x = [0; 10] else: x = [INT_MIN; -1] ∪ [11; INT_MAX] x = [0; INT_MAX] x = [INT_MIN; -1] if ( x >= 0 && x <= 10 ) { } else { }
  • 25. / 50 Short circuit internal bool SafeForExport() { return DisplayEntry.SafeForExport() && ItemSelectionCondition == null || ItemSelectionCondition.SafeForExport(); } PowerShell V3080 Possible null dereference. Consider inspecting ā€˜ItemSelectionCondition’. System.Management.Automation displayDescriptionData_List.cs 352 25
  • 26. / 50 Join problem int *p; if (condition) { p = new int; } else { p = nullptr; } // p - nullable if (condition) { *p = 42; // null dereference? } 26
  • 27. / 50 Join problem • We lose the information when we unite the paths • It is better to postpone the merging of states for as long as possible • But there is a problem with path explosion 27
  • 28. / 50 Join problem int *p; if (condition) { p = new int; // p = non null if condition } else { p = nullptr; // p = null if !condition } // p = non null if condition // ∪ null if !condition if (condition) { // p = non null *p = 42; } 28
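One way to avoid losing information at the join is to keep every alternative together with the condition under which it holds. The sketch below assumes that `condition` is not modified between the two `if` statements; `Guarded`, `join`, `assume` and `mayBeNull` are hypothetical names used only for illustration.

```cpp
#include <string>
#include <vector>

enum class Null { No, Yes };

// One possible state of a pointer plus the assumption under which it holds:
// the variable named `cond` evaluated to `condValue`.
struct Guarded {
    std::string cond;
    bool condValue;
    Null state;
};

using PtrState = std::vector<Guarded>;

// Join at the end of if/else: keep both alternatives instead of
// collapsing them into a single "nullable" fact.
PtrState join(const PtrState& a, const PtrState& b) {
    PtrState out = a;
    out.insert(out.end(), b.begin(), b.end());
    return out;
}

// Entering a branch where `cond == value`: drop contradicting alternatives.
PtrState assume(const PtrState& s, const std::string& cond, bool value) {
    PtrState out;
    for (const Guarded& g : s)
        if (g.cond != cond || g.condValue == value) out.push_back(g);
    return out;
}

bool mayBeNull(const PtrState& s) {
    for (const Guarded& g : s)
        if (g.state == Null::Yes) return true;
    return false;
}
```

After the join, `p` may be null, but inside the second `if (condition)` the call `assume(p, "condition", true)` leaves only the non-null alternative, so no warning is issued for `*p = 42`.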
  • 29. / 50 Join problem int arr[4]; int a, b; if (condition) { a = 1; b = 2; } else { a = 2; b = 1; } return arr[a + b]; // a = 1 if condition ∪ 2 if !condition // b = 2 if condition ∪ 1 if !condition // a + b = 3 if condition ∪ 3 if !condition // a + b = 3 29
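The difference between merging value sets and keeping them per path can be checked on the `a + b` example. This is a sketch under the assumption that each value carries a hypothetical path id (0 for `condition`, 1 for `!condition`); `PathValue`, `addMerged` and `addPerPath` are illustrative names.

```cpp
#include <set>
#include <vector>

struct PathValue {
    int path;   // hypothetical path id: 0 = condition true, 1 = condition false
    int value;
};

using Values = std::vector<PathValue>;

// Naive join: forget the paths and combine every value of a with every value of b.
std::set<int> addMerged(const Values& a, const Values& b) {
    std::set<int> out;
    for (const PathValue& x : a)
        for (const PathValue& y : b)
            out.insert(x.value + y.value);
    return out;
}

// Path-aware addition: combine only values that belong to the same path.
std::set<int> addPerPath(const Values& a, const Values& b) {
    std::set<int> out;
    for (const PathValue& x : a)
        for (const PathValue& y : b)
            if (x.path == y.path) out.insert(x.value + y.value);
    return out;
}
```

For `a = {1, 2}` and `b = {2, 1}` the merged sum is `{2, 3, 4}`, which would produce a false out-of-bounds warning for `arr[4]`, while the per-path sum is exactly `{3}`.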
  • 30. / 50 Try-catch 30 try { SomeClass c(someFunction(), 42); c.foo(); return c + "abc"; } catch (...) { }
  • 31. / 50 Try-catch try { SomeClass c(someFunction(), 42); c.foo(); return c + "abc"; } catch (...) { } • call of someFunction() • constructor of c variable • call of foo() method • constructors of temporary objects • operator + • constructor for returned object • destructors for temporary objects • destructor of c variable 31
  • 32. / 50 Loop analysis • In the general case, loops are difficult and slow to analyze • Analyze the first iteration separately • After a loop, "kill" all definitions of variables made inside it 32 Are you stuck in an infinite loop? Yes / No
  • 33. / 50 Loop invariants public final R getSomeBuildWithWorkspace() { int cnt=0; // <= variable definition outside of the loop for (R b = getLastBuild(); cnt<5 && b!=null; b=b.getPreviousBuild()) { FilePath ws = b.getWorkspace(); if (ws!=null) return b; } return null; } Jenkins V6022 Expression 'cnt < 5' is always true AbstractProject.java 557 33
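The rule of "killing" definitions made inside a loop explains why `cnt` keeps its value here. A minimal sketch, assuming the analyzer tracks known constants in a map and has already collected the names assigned in the loop; `State` and `killLoopDefinitions` are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>

// Variables whose value is known to be a constant at this program point.
using State = std::map<std::string, int>;

// Facts about variables assigned inside the loop are dropped;
// facts about everything else survive the loop unchanged.
State killLoopDefinitions(State before, const std::set<std::string>& assignedInLoop) {
    for (const std::string& name : assignedInLoop)
        before.erase(name);
    return before;
}
```

In the Jenkins example only `b` and `ws` are assigned in the loop, so the fact `cnt == 0` survives and `cnt < 5` is reported as always true.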
  • 34. / 50 The first iteration void Measure::read(XmlReader& e, int staffIdx) { Segment* segment = 0; .... while (e.readNextStartElement()) { const QStringRef& tag(e.name()); if (tag == "move") e.initTick(e.readFraction().ticks() + tick()); .... else if (tag == "sysInitBarLineType") { .... segment = getSegmentR(SegmentType::BeginBarLine, 0); // !!! segment->add(barLine); // <= OK } .... else if (tag == "Segment") segment->read(e); // <= ERROR .... } } MuseScore V522 Dereferencing of the null pointer 'segment' might take place. measure.cpp 2220 34
  • 35. / 50 Loop control flow SkOpSpan* SkOpContour::undoneSpan() { SkOpSegment* testSegment = &fHead; bool allDone = true; do { if (testSegment->done()) { continue; } allDone = false; return testSegment->undoneSpan(); } while ((testSegment = testSegment->next())); if (allDone) { fDone = true; } return nullptr; } Skia Graphics Engine V547 CWE-571 Expression 'allDone' is always true. skopcontour.cpp 43 35
  • 36. / 50 Loop control flow SkOpSpan* SkOpContour::undoneSpan() { SkOpSegment* testSegment = &fHead; bool allDone = true; do { if (testSegment->done()) { continue; } allDone = false; // <= we don’t take into account this path return testSegment->undoneSpan(); } while ((testSegment = testSegment->next())); if (allDone) { fDone = true; } return nullptr; } Skia Graphics Engine V547 CWE-571 Expression 'allDone' is always true. skopcontour.cpp 43 36
  • 37. / 50 Loop counter analysis for (int i = 0; i < 10; ++i) { // i = [INT_MIN; 9] ? // i = [0; 9] !!! } 37
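For the simplest counting loop the range of the counter can be derived directly from the header. The sketch below covers only `for (int i = start; i < limit; ++i)` and assumes the body neither modifies `i` nor leaves the loop early; `CounterInfo` and `analyzeCountingLoop` are illustrative names.

```cpp
struct Range { int lo; int hi; };  // closed interval; lo > hi means empty

struct CounterInfo {
    bool bodyRuns;   // does the body execute at least once?
    Range inBody;    // values of i inside the body
    int afterLoop;   // value of i once the loop has finished
};

// Models `for (int i = start; i < limit; ++i) { ... }`.
CounterInfo analyzeCountingLoop(int start, int limit) {
    CounterInfo info;
    info.bodyRuns = start < limit;
    info.inBody = info.bodyRuns ? Range{start, limit - 1} : Range{1, 0};
    info.afterLoop = info.bodyRuns ? limit : start;
    return info;
}
```

Using the initial value as the lower bound is what gives `[0; 9]` instead of `[INT_MIN; 9]`, which is all the condition `i < 10` alone would justify.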
  • 38. / 50 Loop counter analysis #define AE_IDLE_TIMEOUT 100 static void ae_stop_rxmac(ae_softc_t *sc) { int i; .... /* * Wait for IDLE state. */ for (i = 0; i < AE_IDLE_TIMEOUT; i--) { // <= val = AE_READ_4(sc, AE_IDLE_REG); if ((val & (AE_IDLE_RXMAC | AE_IDLE_DMAWRITE)) == 0) break; DELAY(100); } .... } FreeBSD Kernel V621 Consider inspecting the 'for' operator. It's possible that the loop will be executed incorrectly or won't be executed at all. if_ae.c 1663 38
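A check in the spirit of this warning compares the direction of the counter with the loop condition. This is a rough heuristic written for illustration, not the actual V621 implementation; `LoopVerdict` and `checkLessThanLoop` are hypothetical names, and only loops of the form `for (i = init; i < limit; i += step)` with constant operands are covered.

```cpp
enum class LoopVerdict { Ok, NeverRuns, NoProgressTowardBound };

// Models `for (i = init; i < limit; i += step)`.
LoopVerdict checkLessThanLoop(long long init, long long limit, long long step) {
    if (init >= limit) return LoopVerdict::NeverRuns;          // condition is false at once
    if (step <= 0) return LoopVerdict::NoProgressTowardBound;  // i never approaches limit
    return LoopVerdict::Ok;
}
```

For the FreeBSD loop, `checkLessThanLoop(0, 100, -1)` yields `NoProgressTowardBound`: the counter runs away from `AE_IDLE_TIMEOUT`, so the timeout only fires after signed overflow, if at all.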
  • 39. / 50 There is a problem for (int i = 0; i < n; ++i) { for (int j = i + 1; j < n; ++j) { // j - i } } 39
  • 40. / 50 There is a problem int i = /* [0; 42] */; int j = i + 1; // [1; 43] int r = j - i; // [-41; 43]??? 40
  • 41. / 50 Symbolic execution int i = /* [0; 42] */; int j = i + 1; // [1; 43] int r = j - i; // i + 1 - i = 1 41
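The gap between interval arithmetic and symbolic evaluation can be demonstrated with linear expressions over a single symbol. This is a deliberately tiny sketch: `Linear` stands for `coeff * i + offset`, and all names are assumptions made for the example rather than parts of a real analyzer.

```cpp
#include <algorithm>

struct Range { int lo; int hi; };  // closed interval [lo; hi]

// Interval subtraction treats both operands as independent values.
Range subtract(Range a, Range b) {
    return Range{a.lo - b.hi, a.hi - b.lo};
}

// Symbolic value of the form coeff * i + offset for one symbol i.
struct Linear { int coeff; int offset; };

Linear operator+(Linear a, Linear b) {
    return Linear{a.coeff + b.coeff, a.offset + b.offset};
}

Linear operator-(Linear a, Linear b) {
    return Linear{a.coeff - b.coeff, a.offset - b.offset};
}

// Turn a symbolic value back into a range once the range of i is known.
Range evaluate(Linear e, Range i) {
    int x = e.coeff * i.lo + e.offset;
    int y = e.coeff * i.hi + e.offset;
    return Range{std::min(x, y), std::max(x, y)};
}
```

With `i` in `[0; 42]`, `subtract` applied to the ranges of `j` and `i` gives `[-41; 43]`, whereas `(i + 1) - i` simplifies to `Linear{0, 1}`, which evaluates to exactly `[1; 1]`.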
  • 42. / 50 Symbolic execution • Calculate everything in symbolic expressions • Create a system of equations • Upload it into SMT solver • ??? • PROFIT 42
  • 43. / 50 Symbolic execution public static MMMethodKind valueOf(....) { MMMethodKind result = OTHER; for (MMMethodKind k : values()) { if (k.detector.test(method) && result.level < k.level) { if (result.level == k.level) { throw new SpoonException(....); } result = k; } } return result; } Spoon MMMethodKind.java:129: V6007 Expression 'result.level == k.level' is always false. 43
  • 44. / 50 Context sensitive •foo(); •We can reset all of the accumulated information •Annotate popular libraries •Enjoy the 10 ways to pass a variable to a function 44
  • 45. / 50 Context sensitive •analysis of a function considering the context of the caller •scales poorly •useful for analyzing small functions (getters/setters, for example) 45
  • 46. / 50 Context sensitive void foo(int *p) { // analyze two times *p = 42; } void bar() { int *p = something ? new int : nullptr; foo(p); // repeatedly analyze foo and find a bug } 46
  • 47. / 50 Context insensitive void foo(int *p) { // p != nullptr *p = 42; } void bar() { int *p = something ? new int : nullptr; foo(p); // p != nullptr contract is violated, found a bug } 47
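The contract-based approach can be sketched as a summary that is computed once per function and then checked at every call site. The code below is an assumption-laden illustration: `Summary`, `Nullness` and `violatedParams` are invented names, and the summary is filled in by hand instead of being derived from a function body.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Nullness { NonNull, MaybeNull };

// Summary of a function: which pointer parameters it dereferences
// unconditionally and therefore requires to be non-null.
struct Summary {
    std::vector<bool> requiresNonNull;
};

// Check one call site against the callee's summary and return
// the indices of arguments that violate the contract.
std::vector<std::size_t> violatedParams(const Summary& callee,
                                        const std::vector<Nullness>& args) {
    assert(args.size() == callee.requiresNonNull.size());
    std::vector<std::size_t> violations;
    for (std::size_t i = 0; i < args.size(); ++i)
        if (callee.requiresNonNull[i] && args[i] == Nullness::MaybeNull)
            violations.push_back(i);
    return violations;
}
```

The body of `foo` is analyzed once to produce the summary; each call such as `foo(p)` in `bar` then costs only a comparison of argument states against it.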
  • 48. / 50 Context insensitive • Analyze the body of a function, compose an annotation for it • Contract for arguments • Presence of a global state • Returned value • And much more • contracts proposal void foo(const std::vector<int> &indices) [[expects: !indices.empty()]]; 48
  • 49. / 50 Conclusions • Data flow analysis is a useful technique for finding errors • To find bugs, one has to operate on a large and sometimes strange set of properties • Combining different techniques makes analysis results more reliable • Various heuristics and assumptions allow finding more bugs • Every significant static analyzer must use data flow analysis 49
  • 50. / 50 Answering your questions PVS-Studio: http://www.viva64.com/ru/pvs-studio/ 50