Locking in Linux kernel
Galois @ USTC Linux Users Group
zyf11@mail.ustc.edu.cn
Slides are powered by
OpenOffice.org+Linux+GNU+X31-150$
Copyright © 2005 Galois Y.F. Zheng
Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation
License, Version 1.2 or any later version published by the
Free Software Foundation; with no Invariant Sections, no
Front-Cover Texts, and no Back-Cover Texts. A copy of the
license can be downloaded from GNU's home:
http://www.gnu.org/licenses/fdl.txt
Locking in Linux kernel
• OS review: Kernel Control Paths
• Locking in Linux
• Locking and Coding
• Conclusions
OS review: Kernel Control Paths
• The CPU is stupid: it just runs endlessly!
• But the code in RAM is intelligent.
(Pieces of code compete for CPU resources.)
• We people control the code.
cpu, operating system, and human
[Figure: the “CPU” as an endlessly rotating elevator between the 1st floor (entrance) and the 2nd floor (exit). A human supplies Codes 1 and Codes 2; their instructions 1 2 3 4... ride the elevator as Kernel Control Path 1 and Kernel Control Path 2.]
OS review: Kernel Control Paths
• Interrupt handlers
• Exception handlers
• User-space threads in the kernel (system calls)
• Kernel threads (idle, work queue, pdflush…)
• Bottom halves (softirq, tasklet, BH...)
Pre-KCP: What is a Kernel Control Path?
Post-KCP: Is the mm subsystem a KCP?
No, but mm code is called by KCPs.
OS review: Kernel Control Paths
• Kernel Control Paths(KCP).
• Kernel Data (global or local).
• Kernel Codes called by the KCPs.
• Bootstrap Codes, Initialization Codes, ...
What is the composition of a kernel?
Now we need Locking (between KCPs), Let's GO!
Locking in Linux kernel
• OS review: Kernel Control Paths
• Locking in Linux
• Locking and Coding
• Conclusions
Locking in Linux
What is Locking? A Simple example.
KCP 1:
load i; // i = 0
...
inc i;
store i; // i = 1
KCP 2:
load i; // i = 0
inc i;
...
store i; // i = 1
Initially: int i = 0
Finally: i = 1, wrong (two increments should give i = 2).
The result is wrong because both paths access “i” at the same time.
Locking in Linux
What is Locking? A Simple example. (cont.)
KCP 1:
<locking starts>
load i; // i = 0
inc i;
store i; // i = 1
<locking ends>
KCP 2:
<locking starts>... failed.
waiting..
waiting..
<locking succeeds>
load i; //now i = 1
inc i;
store i; // i = 2
<locking ends>
int i = 0
i = 2, right!
Compare: the x86 lock prefix. :)
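The locked version of the example can be sketched in user space. This is only an analogy, not kernel code: it assumes POSIX threads stand in for the two KCPs and a pthread mutex stands in for the lock; the names `i_shared`, `control_path` and `run_two_paths` are made up for illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

static int i_shared;                                        /* the shared "int i" */
static pthread_mutex_t i_lock = PTHREAD_MUTEX_INITIALIZER;  /* protects i_shared */

/* One control path: does load/inc/store on i_shared inside the lock. */
static void *control_path(void *unused)
{
    (void)unused;
    for (int n = 0; n < ITERATIONS; n++) {
        pthread_mutex_lock(&i_lock);     /* locking starts */
        i_shared = i_shared + 1;         /* load i; inc i; store i */
        pthread_mutex_unlock(&i_lock);   /* locking ends */
    }
    return NULL;
}

/* Runs two control paths concurrently; returns the final i, or -1 on error. */
int run_two_paths(void)
{
    pthread_t t1, t2;

    i_shared = 0;
    if (pthread_create(&t1, NULL, control_path, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, control_path, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return i_shared;
}
```

With the lock held around every update, no increment is lost, so the result is always 2 * ITERATIONS. Removing the lock/unlock pair reproduces the "wrong" case, where some increments may be lost.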
Locking in Linux
Another example...
Heart has no locking: a disordered world.
[Cartoon: two suitors each say “Give me your heart.” The answer: “My heart will panic.” (! !! !)]
Locking: the world is well-ordered.
[Cartoon: the first suitor says “Give me your heart, and lock it.” The answer: “My heart will go on with you.” (xixi ...) Locking succeeds. The second suitor says “Give me your heart.” The answer: “Sorry. I have been locked.” (Sigh...) Locking failed. Waiting...]
Locking: the world is well-ordered. (cont.)
[Cartoon: “Game over!” The lock is released. The waiting suitor says “Give me your heart. Now I can lock you.” The answer: “Now I'm OK.”]
Now we know that
locking
makes the world romantic and beautiful.
Locking in Linux
• What is locking? (cont.)
– Shared data
– Does code need locking? - Yes, surprisingly!
• Critical regions
– Concurrency caused by Kernel Control Paths
• Race conditions? - Must be avoided.
– Now we need synchronization: locking.
Locking in Linux
KCP 1
• Locking the queue
• Succeeded: acquired lock
• Access queue
• Unlock the queue
KCP 2
• Locking the queue
• Failed: waiting…
• Waiting…
• …
• Succeeded: acquired lock
• Access queue
• Unlock the queue.
Cited from LKD by R. Love
Concurrency and Locking : another example.
Locking in Linux
• Interrupts and Exceptions.
• Sleeping and synchronization.
• Kernel Preemption.
• SMP ! - a hot topic, even in embedded apps.
What Causes Concurrency?
Locking in Linux
Locking(Sync.) is important.
Let's get into the Locking details.
Locking in Linux
• 1 Atomics operations
• 2 Memory barriers
• 3 Spin locks
• 4 Reader-writer spin locks
• 5 Semaphores
• 6 Reader-writer semaphores
Various Locking mechanisms.
Locking in Linux
– 7 Condition(Completion) Variables
– 8 Sequence locks
– 9 Mask Interrupts(local and global)
– 10 Mask Bottom Halves
– 11 Disable Kernel Preemption
– 12 Read-Copy Update
– Big Kernel Lock - historical, will be removed
– FUTEX? - No (it is not a kernel locking mechanism)
Various Locking mechanisms. (cont.)
1 Atomic Operations
• Atomic ops address the concurrency caused by multiprocessing (MP). By
themselves they do not cover other concurrency caused by preemption, sleeping...
• Atomic operation mechanisms (SMP environment):
– CPU-guaranteed atomic ops: reading/writing a byte, an aligned word...
– lock prefix: add, adc, and, cmpxchg, cmpxchg8b, dec, inc, neg, not,
or, sbb, sub, xor, xadd, btc, bts, btr
– xchg with a memory operand is locked automatically, even without the prefix.
– cache coherency protocols.
1 Atomic Operations
<asm/atomic.h> <asm/bitops.h>
• Atomic integer ops (on v of type atomic_t *):
– atomic_read(v)             returns v->counter           lock prefix not necessary
– atomic_set(v,i)            v->counter = i               lock prefix not necessary
– atomic_add(i,v)            v->counter += i              lock; addl %1,%0
– ...
• Atomic bitwise ops:
– set_bit(i,addr)            set the i-th bit             lock; btsl %1,%0
– clear_bit(i,addr)          clear the i-th bit           lock; btrl %1,%0
– test_and_set_bit(i,addr)   set bit, return old value    lock; btsl %2,%1; sbbl %0,%0
– ...
• Non-atomic bitwise ops: use with care!
– __set_bit(), __xxx() ... have no lock prefix.
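A user-space sketch of the same operations, assuming C11 <stdatomic.h> as a stand-in for <asm/atomic.h> and <asm/bitops.h>. The `my_` names are hypothetical wrappers, not the kernel API; on x86 a compiler will typically turn the read-modify-write calls into lock-prefixed instructions, but that is up to the compiler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's atomic_t. */
typedef struct { atomic_int counter; } my_atomic_t;

/* Plain atomic load: no read-modify-write needed. */
static inline int my_atomic_read(my_atomic_t *v)
{
    return atomic_load(&v->counter);
}

/* Plain atomic store: no read-modify-write needed. */
static inline void my_atomic_set(my_atomic_t *v, int i)
{
    atomic_store(&v->counter, i);
}

/* Atomic read-modify-write: v->counter += i as one indivisible step. */
static inline void my_atomic_add(int i, my_atomic_t *v)
{
    atomic_fetch_add(&v->counter, i);
}

/* Atomically sets bit nr of *addr and returns the bit's previous value.
 * nr must be smaller than the number of bits in unsigned long. */
static inline bool my_test_and_set_bit(unsigned nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_or(addr, mask) & mask) != 0;
}
```

Typical use: `my_atomic_add(1, &v)` from several threads never loses an update, and `my_test_and_set_bit()` returning false tells the caller that it was the one that set the bit.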
2 Memory Barriers: basics
• gcc optimizes (and may reorder) instruction streams.
• The 386 is strongly ordered: reads and writes are issued on the system
bus in the order they occur. The Pentium 4 uses processor ordering, which
lets the CPU reorder some accesses to improve performance.
• Hardware memory barrier mechanisms (x86):
– serializing instructions
• privileged: mov (to control register/debug register), wrmsr, invd, invlpg,
wbinvd, lgdt, lldt, lidt, ltr
• non-privileged: cpuid, iret, rsm
• sfence (stores), mfence (all), lfence (loads) (non-privileged)
– I/O instructions, reads/writes to uncached memory, interrupt occurrence,
lock prefix
– MTRRs and the PAT can control memory ordering.
2 Memory Barriers: methods
<asm/system.h>
• rmb() prevents loads from being reordered across it.
• read_barrier_depends() prevents data-dependent loads from being reordered.
• wmb() prevents stores from being reordered across it.
• mb() prevents both loads and stores from being reordered across it.
• barrier() prevents GCC from reordering loads and stores across it (compiler only).
• smp_xxx(): on SMP, provides xxx(); on UP, provides only barrier().
Note: “xxx” refers to rmb, wmb...
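How a write barrier and a read barrier pair up can be sketched with C11 fences. This is a user-space analogy under stated assumptions: the release fence roughly plays the role of smp_wmb() and the acquire fence roughly plays the role of smp_rmb() (the C11 fences are somewhat stronger than those). The names `payload`, `ready`, `publish` and `try_consume` are made up.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain data, written before the flag */
static atomic_bool ready;  /* flag that publishes the payload */

/* Writer side: store the data, then a barrier, then the flag. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);  /* roughly smp_wmb() */
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader side: returns true and fills *out once the flag is visible. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);  /* roughly smp_rmb() */
    *out = payload;
    return true;
}
```

Without the two fences, the compiler or CPU could make the flag visible before the payload, and a reader on another CPU could see `ready` set but read a stale payload.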
3 Spin locks
<linux/spinlock.h><asm/spinlock.h>
• Spinning happens on SMP only; on UP the spinning part is a no-op.
• Don't hold a spin lock for long: hold it for less than the time a
context switch takes.
• Taking a spin lock automatically disables kernel preemption, which
avoids deadlock caused by preempting the lock holder.
• When data is shared with an interrupt handler, we must disable local
interrupts before taking the spin lock.
• When data is shared with bottom halves, we must disable bottom halves
before taking the spin lock.
3 Spin locks
(cont.)
• spin_lock()               acquire lock
• spin_unlock()             release lock
• spin_lock_irq()           disable local interrupts and acquire lock
• spin_unlock_irq()         release lock and enable local interrupts
• spin_lock_irqsave()       save current interrupt state, disable local interrupts, acquire lock
• spin_unlock_irqrestore()  release lock and restore the saved interrupt state
• ...
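The spinning itself can be sketched in user space with a C11 atomic_flag. This is a minimal illustration, not the kernel's implementation: the `my_spin_*` names are hypothetical, and unlike spin_lock() this sketch cannot disable preemption or interrupts.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space spin lock built on a single atomic flag. */
typedef struct { atomic_flag locked; } my_spinlock_t;
#define MY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Tries once; returns true if the lock was free and is now ours. */
static inline bool my_spin_trylock(my_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy-waits (spins) until the lock is acquired. */
static inline void my_spin_lock(my_spinlock_t *l)
{
    while (!my_spin_trylock(l))
        ;  /* spin: burn CPU until the holder releases the lock */
}

/* Releases the lock so one spinning waiter can take it. */
static inline void my_spin_unlock(my_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Because a waiter burns CPU while spinning, the critical section between `my_spin_lock()` and `my_spin_unlock()` should stay short, which is the same advice the slides give for kernel spin locks.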
4 Reader-writer spin locks
<asm/spinlock.h><linux/spinlock.h>
• Writing demands mutual exclusion.
• Multiple concurrent readers are OK.
• While anyone is reading, writing must be excluded.
• Read locks and write locks are taken separately:
• read_lock_xxx() read_unlock_xxx()
• write_lock_xxx() write_unlock_xxx()
• ...
• Problem: these locks favor readers over writers, which may
starve pending writers.
5 Semaphores
<asm/semaphore.h><arch/xxx/kernel/semaphore.c>
• (struct semaphore *)->count is decremented/incremented atomically; the
wait queue is protected by a spin lock.
• When the initial count is > 1, it allows up to that many lock
holders at once. When the initial count is 1, it is a binary semaphore, also
called a mutex, which is used in many places.
• It is a sleeping lock.
• Threads may sleep while holding semaphores.
• Threads must not acquire semaphores while holding a spin lock (down() may sleep).
• down()                waiting threads sleep in the uninterruptible state
• down_interruptible()  waiting threads sleep in the interruptible state
• up()                  increments count; if count <= 0, wakes up a waiting thread
• ...
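The counting behaviour can be tried out in user space with POSIX semaphores. This is only an analogy to the kernel's struct semaphore; the `my_down`, `my_down_trylock` and `my_up` wrappers are hypothetical names, and `my_down_trylock()` returns true on success, which is an assumption of this sketch rather than the kernel convention.

```c
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>

/* Non-blocking acquire: true if a slot was free and has been taken. */
bool my_down_trylock(sem_t *sem)
{
    return sem_trywait(sem) == 0;
}

/* Blocking acquire: sleeps until a slot is free.
 * Retries only when the wait was interrupted by a signal. */
void my_down(sem_t *sem)
{
    while (sem_wait(sem) != 0 && errno == EINTR)
        ;  /* interrupted: try again */
}

/* Release: frees one slot and wakes a sleeping waiter, if any. */
void my_up(sem_t *sem)
{
    sem_post(sem);
}
```

Initialized with `sem_init(&s, 0, 2)`, the semaphore admits two holders at once and makes a third wait; initialized with a count of 1 it behaves like the binary semaphore (mutex) described above.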
6 Reader-writer semaphores
<linux/rwsem.h>
• The idea is the same as for reader-writer spin locks, but waiters sleep.
• down_read(), down_read_trylock()
• up_read()
• down_write(), down_write_trylock()
• up_write()
• NOTE: unlike rw-spinlocks, a held write lock can be downgraded to a
read lock.
Spin locks VS. semaphores
Requirement                         Recommended lock
• low overhead locking              spin lock
• short lock hold time              spin lock
• long lock hold time               semaphore
• use from interrupt context        spin lock
• sleep while holding the lock      semaphore
7 Condition (Completion) Variables
<linux/completion.h><kernel/sched.c>
• A very simple solution to a problem that semaphores could otherwise
solve; presumably it was judged unwise to change semaphores for it.
• It just checks a condition to decide what to do: sleep (and be woken
up later) or continue. sleeping + spinning ==> cv
• It is mainly for SMP.
• Only 2 main functions:
• wait_for_completion()  if already completed, continue; otherwise wait.
• complete()             signals a waiting thread, if any.
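The wait/complete pair can be sketched in user space. This assumes a pthread mutex and condition variable in place of the kernel's spin-locked wait queue, so it illustrates the protocol only; `my_completion_t` and the `my_` functions are hypothetical names.

```c
#include <pthread.h>

/* Hypothetical user-space completion: a counter plus a wait queue. */
typedef struct {
    unsigned int    done;  /* completions signalled but not yet consumed */
    pthread_mutex_t lock;  /* protects done */
    pthread_cond_t  wait;  /* sleepers wait here */
} my_completion_t;

#define MY_COMPLETION_INIT \
    { 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* If already completed, continue at once; otherwise sleep until signalled. */
void my_wait_for_completion(my_completion_t *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)
        pthread_cond_wait(&c->wait, &c->lock);  /* drops the lock while asleep */
    c->done--;                                  /* consume one completion */
    pthread_mutex_unlock(&c->lock);
}

/* Record one completion and wake one waiter, if there is one. */
void my_complete(my_completion_t *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;
    pthread_cond_signal(&c->wait);
    pthread_mutex_unlock(&c->lock);
}
```

Because the counter is checked and changed entirely under one lock, a completion signalled before the waiter arrives is not lost: the waiter sees `done > 0` and continues immediately.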
Semaphore VS. Con.Variable
Semaphore: complex and separated locking
down()
    lock; dec %0
    ...
    spin_lock(sem->wait.lock)
    //..., wait queue ops;
    spin_unlock(sem->wait.lock)
up()
    lock; inc %0
    ...
    spin_lock(sem->wait.lock)
    //..., wait queue ops;
    spin_unlock(sem->wait.lock)
Con.Variable: simple and totally spinlocked
wait_for_completion()
    spin_lock(cv->wait.lock)
    //wait queue ops;
    //may unlock spin and sleep
    //dec cv->done
    spin_unlock(cv->wait.lock)
complete()
    spin_lock(cv->wait.lock)
    //inc cv->done
    //wait queue ops;
    spin_unlock(cv->wait.lock)
8 Sequence Locks
<linux/seqlock.h>
• For this situation: data has many readers and few writers, as with
the RCU mechanism.
• Unlike reader-writer locks, seqlocks favor writers over readers.
• Readers never block, but must retry an arbitrary number of times if a
writer is in progress.
• Writers exclude one another while changing data, as with spin
locks. But writers do not wait for readers.
Readers:
do {
    seq = read_seqbegin_xxx(&lock);   // lock is a seqlock_t
    // read data ...
} while (read_seqretry_xxx(&lock, seq));
Writers:
write_seqlock_xxx(&lock);
// change data...
write_sequnlock_xxx(&lock);
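The retry logic behind a sequence lock can be sketched with C11 atomics. This is a simplified user-space illustration, not the <linux/seqlock.h> implementation: it assumes a single writer (the kernel version also serializes writers with a spin lock), and `my_seq_pair_t`, `my_write_pair` and `my_read_pair` are made-up names.

```c
#include <stdatomic.h>

/* Hypothetical seqlock-protected pair of values. */
typedef struct {
    atomic_uint seq;   /* even: data stable, odd: write in progress */
    atomic_int  x, y;  /* the protected data */
} my_seq_pair_t;

/* Writer (assumed to be the only writer): bump seq to odd, write, bump to even. */
void my_write_pair(my_seq_pair_t *p, int x, int y)
{
    unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);

    atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);  /* now odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->x, x, memory_order_relaxed);
    atomic_store_explicit(&p->y, y, memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 2, memory_order_release);  /* even again */
}

/* Reader: never blocks the writer; retries if a write overlapped the read. */
void my_read_pair(my_seq_pair_t *p, int *x, int *y)
{
    unsigned s1, s2;

    do {
        s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
        *x = atomic_load_explicit(&p->x, memory_order_relaxed);
        *y = atomic_load_explicit(&p->y, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);  /* odd or changed: a writer interfered */
}
```

The reader pays only two extra loads of `seq` when no writer is active, which is why the scheme suits data that is read often and written rarely.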
9 Mask interrupts (local and global)
<linux/interrupt.h><asm/system.h><kernel/irq/manage.c><asm/processor.h>
• Local masking works on the CPU's IF flag, which disables all maskable
interrupts on the local CPU (cli and sti instructions).
• Masking an IRQ line at the PIC is another story. It serializes
execution of that one interrupt, but it cannot prevent
preemption by other interrupts.
• local_irq_disable(), local_irq_enable()
• Do you remember spin_lock_irq()? Disabling interrupts
is combined with spin_lock().
• Global disabling: cruel! It has been removed in 2.6. Instead, we can
use synchronize_irq() to wait until handlers of a given IRQ running on
other CPUs have finished.
10 Mask Bottom Halves
<linux/interrupt.h>
• When data is shared with bottom halves,
we may need to disable bottom halves.
• local_bh_disable(), local_bh_enable():
– implemented by calling add_preempt_count()
• spin_lock_bh()
11 Disable Kernel Preemption
<linux/preempt.h>
• Preemption points:
– the interrupt return path,
– arbitrary preemption points in kernel code.
• preempt_disable() and preempt_enable()
Thread 1, running on CPU 0:
preempt_disable();
int cpu = smp_processor_id();
// manipulate per_cpu(xxx, cpu);
// xxx is per-CPU data, such as runqueues.
preempt_enable();
// Note: get_cpu() and put_cpu() combine these steps.
12 Read-Copy Update
<linux/rcupdate.h>
• Best for read-mostly linked lists (struct list_head).
• Another kind of reader-writer lock, but more complex and with
more advantages.
• Readers never block.
[Figure: RCU timeline. Labels: read, write, create a copy, change new copy, (old copy), update (spec point), transient state, grace period (transition).]
Big Kernel Lock: history
• Linux 2.0 - BKL about 1996 - SMP
• BSD/OS 4.x:
• FreeBSD 4.x: XXX – Giant (2000 -)
• goal : fine-grained locking
•
• Dragonfly BSD: forked from FreeBSD 4.x
• goal: lockless mem allocator and scheduling system
FUTEX
• Fast Userspace Mutex
• It is for synchronizing user-space threads.
• It is not a locking mechanism for the kernel itself.
• It is implemented in the kernel.
Relations between the different lock implementations
[Figure: diagram relating the mechanisms along an axis from simple to complex. Nodes: atomic ops, mem barriers, spin lock (rw), semaphore (rw), con. variable, seq lock, mask interrupts, preempt disable, disable_bh, RCU.]
Locking in Linux kernel
• Kernel Control Paths
• Locking in Linux
• Locking and Coding
• Conclusions
Locking and Coding
• Is the data shared? Can other threads (contexts) access it?
• Is the data per-CPU? Can other CPUs access it? *
• Is the data shared between thread context and interrupt context? Is it
shared between two different interrupt handlers? …
• If a context is preempted while accessing this data, can the newly
scheduled context access the same data?
• Can the current context sleep on anything while accessing the data? If
it does, what state does that leave the shared data in?
• Does the data have a special use? Keep it in mind. *
• Now LET’S continue CODING!
Locking and Coding
• Interrupt safe
• Preempt safe
• SMP safe
– (preempt safe ≦SMP safe)
Locking between various KCPs
• Exceptions..
• Interrupts..
• Bottom Halves..
• Kernel threads..
• System calls by user space threads..
1 between exception contexts
(UP: sleeping locks, SMP: +0)
• 1. Kernel code should not cause exceptions. If any kernel
code triggers an exception, it is a bug.
• 2. BUT page faults and floating-point register exceptions are
the exceptions to this rule.
• 3. Exceptions can be caused by user-space code.
• 4. According to item 1, an exception context does not
trigger another exception, including page faults and
floating-point register exceptions. But an exception context can be
preempted by interrupts; after the interrupt returns, the
preempted exception handler continues on the same CPU.
• 5. So we can conclude that sleeping locks are enough.
2 between interrupt contexts
(UP: mask local interrupts, SMP: +spinlock)
• Interrupt context is not backed by a process with its own kernel
stack, so it cannot sleep. Do not use sleeping locks.
• The same interrupt runs serially on a given
CPU because irq_desc->handler->ack() in do_IRQ() masks
the IRQ line. On UP, this situation is simple.
• The same or different interrupts can be
triggered on different CPUs, so SMP
requires a spin lock to prevent race conditions.
3 between Bottom Halves
(UP: null, SMP: +spinlock)
• Do not use the old BH mechanism; it has poor performance and
has been removed in 2.6.
• Softirqs cannot be preempted, except by interrupts, so
on UP there are no race conditions between them.
• Like interrupt handlers, and for the same reasons, bottom halves
cannot sleep.
• The same or different softirqs can run on different CPUs.
• Tasklets are built on softirqs. Only different tasklets can
run concurrently on different CPUs.
• From the above, we can conclude that on SMP
softirqs and different tasklets must be protected with
spin locks, while a single tasklet can use its own data locklessly.
4 between exceptions and interrupts/bh
(UP: mask interrupts, SMP: +spinlocks)
• Interrupts cannot be preempted by exceptions; if
this happens, it is a bug!
• So an exception handler can disable interrupts to avoid
being preempted by them.
• A bh is like an interrupt: it is executed in interrupt
context.
• However, exception handlers can use local_bh_disable()
to disable only bottom halves.
5 between Bottom Halves and interrupts
(UP: mask interrupts, SMP: +spinlock)
• Bottom halves can disable interrupts
to avoid concurrency with interrupt handlers.
• For SMP, adding a spin lock is necessary and sufficient.
6 between kernel threads and interrupts/bh
(UP: mask interrupts, SMP: +spinlock)
• Interrupts can preempt threads, so disable
interrupts to protect data that threads share with them.
• Because interrupt handlers cannot be preempted or sleep,
the lock we add on SMP is a spin lock.
7 between threads
(spin lock or sleeping lock)
• NOTE: in 2.6, taking a spin lock automatically disables
preemption.
• What to use: spin lock or sleeping lock?
– low overhead locking: spin lock
– short lock hold time: spin lock
– long lock hold time: semaphore
– sleep while holding the lock: semaphore
8 between system calls
(spin lock or sleeping lock)
• This is the same as between kernel threads.
Locking used between various KCPs
Between                             UP                 SMP adds (+)
exceptions                          sleeping lock      null
interrupts                          mask interrupts    spinlock
bottom halves                       null               spinlock or null
exceptions and interrupts/bh        mask interrupts    spinlock
bottom halves and interrupts        mask interrupts    spinlock
kernel threads and interrupts/bh    mask interrupts    spinlock
kernel threads                      sleeping or spin lock (UP and SMP)
system calls                        sleeping or spin lock (UP and SMP)
Kernel Configuration Tree and Debug
<make menuconfig>
• arch/xxx/Kconfig (mainmenu, <menu,endmenu>*)
– arch/xxx/Kconfig.debug
• lib/Kconfig.debug
– init/Kconfig
– fs/Kconfig.binfmt
– fs/Kconfig
– drivers/Kconfig.binfmt
– lib/Kconfig
– ...
• CONFIG_DEBUG_KERNEL
– CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP
– CONFIG_DEBUG_STACKOVERFLOW, CONFIG_4KSTACKS
– CONFIG_KDB(patches)
– ...
Locking in Linux kernel
• Kernel Control Paths
• Locking in Linux
• Locking and Coding
• Conclusions
Conclusions
• Locking, or synchronization, is a complex
problem, especially for large and/or complex
systems.
• The problems caused by locking in the kernel are
not entirely predictable.
Locking: What is the problem?
• Implementing the actual locking in the code
to protect shared data is not hard.
• The tricky part is identifying the actual
shared data and corresponding critical
sections.
Cited from LKD, by R. Love
Locking: What is the problem?
• Deadlocks
• Priority Inversion
• Locking latency
• Locking: Coarse or fine-grained.
– Scalability VS. Overheads(performance).
– Not only Linux has the dilemma.
– Let’s keep a close eye on DragonFly BSD's progress.
References
• Linux kernel source tree, by Linus Torvalds, and various
patches by hackers.
• Linux Kernel Development, by Robert Love.
• Understanding the Linux Kernel, by Daniel P. Bovet and Marco Cesati.
• www.freebsd.org/smp
• www.dragonflybsd.org
• .../kernel/Documents/*, Google, GCC documentation...
• Pentium 4 software developer's manual (3 volumes).
Thanks
• USTC BBS embedded board master: dj
• All the organizers and/or friends of the USTC 2005
developer workshop of embedded system.
• USTC Linux Users Group.
Happy Life, Happy Hacking.
THANKS

More Related Content

PPTX
Dead Lock Analysis of spin_lock() in Linux Kernel (english)
PPT
Concurrency bug identification through kernel panic log (english)
PDF
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
PDF
Let's Talk Locks!
PDF
Prerequisite knowledge for shared memory concurrency
PDF
Kernel Recipes 2015: Introduction to Kernel Power Management
PDF
PHP at Density and Scale (Lone Star PHP 2014)
PPTX
The Silence of the Canaries
Dead Lock Analysis of spin_lock() in Linux Kernel (english)
Concurrency bug identification through kernel panic log (english)
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Let's Talk Locks!
Prerequisite knowledge for shared memory concurrency
Kernel Recipes 2015: Introduction to Kernel Power Management
PHP at Density and Scale (Lone Star PHP 2014)
The Silence of the Canaries

What's hot (20)

PDF
Kernel Recipes 2015 - Porting Linux to a new processor architecture
PPSX
LMAX Disruptor as real-life example
PDF
Xilkernel
PDF
MyShell - English
PDF
BlueHat v18 || Hardening hyper-v through offensive security research
PDF
Kernel crashdump
PPTX
Barcamp presentation
ODP
SystemV vs systemd
PDF
Kernel Recipes 2014 - Writing Code: Keep It Short, Stupid!
ODP
Linux kernel debugging(ODP format)
PDF
Kqueue : Generic Event notification
PDF
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMU
KEY
Lock? We don't need no stinkin' locks!
DOC
Linux synchronization tools
PDF
DOXLON November 2016: Facebook Engineering on cgroupv2
PDF
Kernel Recipes 2016 - entry_*.S: A carefree stroll through kernel entry code
PDF
Kernel Recipes 2013 - Kernel for your device
PDF
Linux Kernel Debugging Essentials workshop
PPTX
Threads and multi threading
PDF
BlueHat v18 || A mitigation for kernel toctou vulnerabilities
Kernel Recipes 2015 - Porting Linux to a new processor architecture
LMAX Disruptor as real-life example
Xilkernel
MyShell - English
BlueHat v18 || Hardening hyper-v through offensive security research
Kernel crashdump
Barcamp presentation
SystemV vs systemd
Kernel Recipes 2014 - Writing Code: Keep It Short, Stupid!
Linux kernel debugging(ODP format)
Kqueue : Generic Event notification
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMU
Lock? We don't need no stinkin' locks!
Linux synchronization tools
DOXLON November 2016: Facebook Engineering on cgroupv2
Kernel Recipes 2016 - entry_*.S: A carefree stroll through kernel entry code
Kernel Recipes 2013 - Kernel for your device
Linux Kernel Debugging Essentials workshop
Threads and multi threading
BlueHat v18 || A mitigation for kernel toctou vulnerabilities
Ad

Similar to Kernel (20)

PDF
Linux kernel development_ch9-10_20120410
PDF
Linux kernel development chapter 10
PDF
Kernel locking
PPT
Synchronization linux
PDF
Linux Locking Mechanisms
PDF
Describe synchronization techniques used by programmers who develop .pdf
PDF
AOS Lab 4: If you liked it, then you should have put a “lock” on it
PDF
spinlock.pdf
PPTX
Memory model
PPTX
interprocess communation and security in linux.pptx
PPSX
linux kernel overview 2013
PPTX
Io sy.stemppt
PPTX
28.Locks in operating systems engineering
PPTX
ubantu ppt.pptx
PPT
Threads Advance in System Administration with Linux
PDF
Memory Barriers in the Linux Kernel
PPT
Os4
PPTX
Operating Systems
PDF
Much Ado About Blocking: Wait/Wakke in the Linux Kernel
PPTX
9-Operating Systems -Synchronization, interprocess communication, deadlock.pptx
Linux kernel development_ch9-10_20120410
Linux kernel development chapter 10
Kernel locking
Synchronization linux
Linux Locking Mechanisms
Describe synchronization techniques used by programmers who develop .pdf
AOS Lab 4: If you liked it, then you should have put a “lock” on it
spinlock.pdf
Memory model
interprocess communation and security in linux.pptx
linux kernel overview 2013
Io sy.stemppt
28.Locks in operating systems engineering
ubantu ppt.pptx
Threads Advance in System Administration with Linux
Memory Barriers in the Linux Kernel
Os4
Operating Systems
Much Ado About Blocking: Wait/Wakke in the Linux Kernel
9-Operating Systems -Synchronization, interprocess communication, deadlock.pptx
Ad

Recently uploaded (20)

PPTX
MTVED - Trends in Food and Innovation.pptx
PPTX
The Schools Division Office of Davao del Sur humbly requests for the approval...
PPTX
Artificial intelligence introduction basic
PPTX
7. ANTI-FUNGAL DRUGS-PMY430123456789123.
PPTX
cctv.pptx paper presentation for school and college students
PPTX
Creative-Nonfiction-Demystified.pptxhhhh
PPTX
Nature and Scope of Political Science and its evolution
PPTX
MATERIALS IN ORTHODONTICS PART 1.pptxxxx
PDF
Surgical instruments for final year mbbs students
PDF
Understanding Indicators TA Means technical indicators
PPTX
QC & QA.pptx........,...................
PPTX
KAMAL HASSAN A VERY FAMOUS SOUTH INDIAN STAR.pptx
PPTX
obstetric instruments for final year mbbs students
PPTX
Session 4 of vibale oldin sink about vola
PPTX
Trafficking In Persons of Bangladesh.pptx
PDF
Lesson 1-IOM-Introduction to Management and Organizations.pdf
PPT
3. Aggregate.ppt he is the main things of
PPTX
CDI 2.pptx special crime investigation with legal medicine
PPTX
Reinforcement Learning All Modules and Chapters
PPTX
OJT-Narrative-Presentation-Entrep-group.pptx_20250808_102837_0000.pptx
MTVED - Trends in Food and Innovation.pptx
The Schools Division Office of Davao del Sur humbly requests for the approval...
Artificial intelligence introduction basic
7. ANTI-FUNGAL DRUGS-PMY430123456789123.
cctv.pptx paper presentation for school and college students
Creative-Nonfiction-Demystified.pptxhhhh
Nature and Scope of Political Science and its evolution
MATERIALS IN ORTHODONTICS PART 1.pptxxxx
Surgical instruments for final year mbbs students
Understanding Indicators TA Means technical indicators
QC & QA.pptx........,...................
KAMAL HASSAN A VERY FAMOUS SOUTH INDIAN STAR.pptx
obstetric instruments for final year mbbs students
Session 4 of vibale oldin sink about vola
Trafficking In Persons of Bangladesh.pptx
Lesson 1-IOM-Introduction to Management and Organizations.pdf
3. Aggregate.ppt he is the main things of
CDI 2.pptx special crime investigation with legal medicine
Reinforcement Learning All Modules and Chapters
OJT-Narrative-Presentation-Entrep-group.pptx_20250808_102837_0000.pptx

Kernel

  • 1. Locking in Linux kernel Galois @ USTC Linux Users Group [email protected] Slides are powered by OpenOffice.org+Linux+GNU+X31-150$
  • 2. Copyright © 2005 Galois Y.F. Zheng Permissions is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license can be downloaded from GNU's home: http://www.gnu.org/licenses/fdl.txt
  • 3. Locking in Linux kernel • OS review: Kernel Control Paths • Locking in Linux • Locking and Coding • Conclusions
  • 4. OS review: Kernel Control Paths • CPU is stupid, running endlessly! • But the codes in RAM is intelligent. (They compete for CPU resources.) • We people control the codes. cpu, operating system, and human
  • 5. 2 1 3 4 2nd floor 1 3 4 2 “CPU” running endlessly endless rotating elevator: CPU X : Codes 1 X : Codes 2 entrance exit 1st floor Human 2 31 4... : Kernel Control Path 1 1 2 3 4... : Kernel Control Path 2
  • 6. OS review: Kernel Control Paths • Interrupt handlers • Exception handlers • User-space threads in kernel(system calls) • Kernel threads(idle, work queue, pdflush…) • Bottom halves(soft irq, tasklet,BH...) Pre-KCP: What is Kernel Control Path ? Post-KCP: Is mm subsystem a KCP? NO, But mm codes are called by KCP.
  • 7. OS review: Kernel Control Paths • Kernel Control Paths(KCP). • Kernel Data (global or local). • Kernel Codes called by the KCPs. • Bootstrap Codes, Initialization Codes, ... What is the composition of a kernel? Now we need Locking (between KCPs), Let's GO!
  • 8. Locking in Linux kernel • OS review: Kernel Control Paths • Locking in Linux • Locking and Coding • Conclusions
  • 9. Locking in Linux What is Locking? A Simple example. KCP 1: load i; // i = 0 ... inc i; store i; // i = 1 KCP 2: load i; // i = 0 inc i; ... store i; // i = 1 The result is wrong because of accessing “i” at the same time. int i = 0 i = 1, wrong
  • 10. Locking in Linux What is Locking? A Simple example. (cont.) KCP 1: <locking starts> load i; // i = 0 inc i; store i; // i = 1 <locking ends> KCP 2: <locking starts>... failed. waiting.. waiting.. <locking succeeds> load i; //now i = 1 inc i; store i; // i = 2 <locking ends> int i = 0 i = 2, right! x86 lock directive.:)
  • 12. Heart has no locking: a disordered world. Give me your heart. Give me your heart. My heart will panic. ! !! !
  • 13. Locking: the world is well-ordered. Give me your heart. and lock it.. Give me your heart. My heart will go on with you Sigh... Locking failed Waiting... Sorry. I has been locked. xixi ... Locking succeeds.
  • 14. Locking: the world is well-ordered. Game over! Give me your heart. Now I can Lock you. now I'm ok. Locking releases.
  • 15. Now we know that locking makes the world romantic and beautiful.
  • 16. Locking in Linux • What is Locking. (cont.) – Shared Data – Does Code need locking ? - Yes, surprising! • Critical Regions – Concurrency caused by Kernel Control Paths • Race Conditions ? - Must be avoided. – Now we needs Synchronization . - Locking.
  • 17. Locking in Linux KCP 1 • Locking the queue • Succeeded: acquired lock • Access queue • Unlock the queue KCP 2 • Locking the queue • Failed: waiting… • Waiting… • … • Succeeded: acquired lock • Access queue • Unlock the queue. Cited from LKD by R. Love Concurrency and Locking : another example.
  • 18. Locking in Linux • Interrupts and Exceptions. • Sleeping and synchronization. • Kernel Preemption. • SMP ! - a hot topic, even in embedded apps. What Causes Concurrency?
  • 19. Locking in Linux Locking(Sync.) is important. Let's get into the Locking details.
  • 20. Locking in Linux • 1 Atomics operations • 2 Memory barriers • 3 Spin locks • 4 Reader-writer spin locks • 5 Semaphores • 6 Reader-writer semaphores Various Locking mechanisms.
  • 21. Locking in Linux – 7 Condition(Completion) Variables – 8 Sequence locks – 9 Mask Interrupts(local and global) – 10 Mask Bottom Halves – 11 Disable Kernel Preemption – 12 Read-Copy Update – – Big Kernel Lock - Historical, will be removed – FUTEX ? - NO Various Locking mechanisms. (cont.)
  • 22. 1 Atomics Operations • atomic ops is for the concurrency caused by MP. not for other concurrencies caused by preemption, sleep... • atomic operations mechanisms(SMP env): – cpu guaranteed atomic ops: read/write a byte, alined word... – lock prefix: add, adc, and, cmpxchg, cmpxch8b, dec, inc, neg, not, or, sbb, sub ,xor, xadd, btc,bts, btr – xchg is automatically added lock prefix. – cache coherency protocols.
  • 23. 1 Atomics Operations <asm/atomic.h> <asm/bitops.h> • atomic integer ops: (on atomic_t type v) – atomic_read(v) v->counter not necessary – atomic_set(v) v->counter not necessary – atomic_add(i,v) v->counter+i lock;addl %1,%0 – ... • atomic bitwise ops: – set_bit(i,addr) set the i-th bit lock;btsl %1,%0 – clear_bit(i,addr) clear the i-th bit lock;btrl %1,%0 – test_and_set_bit lock;btsl %2,%1;sbbl %1,%0 – ... • pseudo atomic bitwise ops: carefully! – __set_bit(), __xxx() ..... there is no lock prefix.
  • 24. 2 Memory Barriers basics • gcc optimizes instruction streams. • 386 is strong ordering, where read and write are issued on the system bus in the order they occur..but pentium 4 is processor ordering, by which cpu could improve performance. • memory barriers hardware technologies(x86): – serializing instructions • mov(to control register/debug register), wrmsr, invd, invlpg, wbinvd,lgdt,lldt,lidt,ltr; • cpuid,iret,rsm (non-previledged) • sfence(store), mfence(all), lfence(load) (non-preveledged) – io instructions, read/write to uncached memory, interrupt ocurrence, lock prefix – mtrr and pat could control memory ordering.
  • 25. 2 Memory Barriers Methods <asm/system.h> • rmb(), prevents loads being reordered • read_barrier_depends(), prevents data-dependent loads being reordered. • wmb(), prevents stores being reordered. • mb(), prevents loads and stores being reordered. • • barrier(), prevents GCC optimize loads and stores. • • smp_xxx(), on smp, provides xxx; on up provides barrier() Note: “xxx” refers to rmb, wmb...
  • 26. 3 Spin locks <linux/spinlock.h><asm/spinlock.h> • Spinning on SMP. Spinning is null on UP. • Don't hold it for a long time. less than context switch time. • spinlock automatically disables preemption, which avoids deadlock caused by interrupts. • when data is shared with interrupt handler, before holding spinlock we must disable interrupts. • when data is shared with bottom halves, before holding spinlock we must disable bottom halves.
  • 27. 4 Spin Locks (cont.) • spin_lock() acquire lock • spin_unlock() release lock • spin_lock_irq() disable local interrupts and acquire lock • spin_unlock_irq() • spin_lock_irqsave() save current state of ints, ... • spin_lock_irqrestore() restore.... • ...
  • 28. 5 Reader-writer spin locks <asm/spinlock.h><linux/spinlock.h> • Writing demands mutual exclusion. • Multiple concurrent Readings is ok. • When Reading, Writing must be disabled. • • Reading locks and writing locks are seperated. • read_lock_xxx() read_unlock_xxx() • write_lock_xxx() write_unlock_xxx() • ... ... • Problems: This locks favor readers over writers, which may starve pending writers.
  • 29. 6 Semaphores <asm/semaphore.h><arch/xxx/kernel/semaphore.c> • Checking (struct semaphore*)->count, dec&inc is spinlocked. • when initial count > 1, it allows arbitrary number of lock holders. when initial count = 1, it is binary semaphore, also called mutex which is used in many places. • It is sleeping locks. • Threads may sleep while holding semaphores. • Threads can't acquire semaphores while holding spin lock. • • down() threads get into uninterruptible state • down_interruptible(), threads get into interruptible state • up() inc count, if count<=0, wake up waiting thread • ...
  • 30. 7 Reader-writer semaphores <linux/rwsem.h> • Easy to understand by analogy with reader-writer spin locks. • down_read(), down_read_trylock() • up_read() • down_write(), down_write_trylock() • up_write() • NOTE: unlike rw-spinlocks, we can downgrade from a write lock to a read lock (downgrade_write()).
  • 31. Spin locks VS. semaphores (recommended) • low-overhead locking: spin lock • short lock hold time: spin lock • long lock hold time: semaphore • locking from interrupt context: spin lock • sleeping while holding the lock: semaphore
  • 32. 8 Condition (Completion) Variables <linux/completion.h><kernel/sched.c> • A very simple solution to a problem that semaphores could otherwise solve, but for which changing the semaphore implementation may not be wise. • It just checks a condition to decide what to do: sleep (or wake up), or continue (do nothing). sleeping+spinning==>cv • It is mainly for SMP. • Only 2 functions: • wait_for_completion(): if the condition is already met, continue; otherwise wait. • complete(): wake up a waiting thread, if any.
  • 33. Semaphore VS. Completion Variable
    Semaphore (complex and separated locking):
      down()
        lock; dec %0
        ...
        spin_lock(sem->wait.lock)
        //..., wait queue ops;
        spin_unlock(sem->wait.lock)
      up()
        lock; inc %0
        ...
        spin_lock(sem->wait.lock)
        //..., wait queue ops;
        spin_unlock(sem->wait.lock)
    Completion variable (simple and totally spinlocked):
      wait_for_completion()
        spin_lock(cv->wait.lock)
        //wait queue ops;
        //may unlock spin and sleep
        //dec cv->done
        spin_unlock(cv->wait.lock)
      complete()
        spin_lock(cv->wait.lock)
        //inc cv->done
        //wait queue ops;
        spin_unlock(cv->wait.lock)
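The "one lock around a done counter plus a wait queue" shape of the completion variable can be sketched in user space. This assumes POSIX threads, with a mutex and condition variable standing in for the kernel's spinlocked wait queue; it is an illustration, not the kernel code. `toy_completion`, `toy_wait_for_completion`, `toy_complete`, `worker` and `completion_demo` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space analogue of a completion. */
typedef struct {
    pthread_mutex_t lock;   /* protects done, like cv->wait.lock */
    pthread_cond_t  wait;   /* stands in for the wait queue */
    unsigned int    done;   /* number of completions not yet consumed */
} toy_completion;

static toy_completion work_done = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};
static int result;          /* produced by the worker before it completes */

static void toy_wait_for_completion(toy_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)                         /* not complete yet: sleep */
        pthread_cond_wait(&c->wait, &c->lock);   /* drops the lock while asleep */
    c->done--;                                   /* consume one completion */
    pthread_mutex_unlock(&c->lock);
}

static void toy_complete(toy_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;                                   /* record the completion */
    pthread_cond_signal(&c->wait);               /* wake one waiter, if any */
    pthread_mutex_unlock(&c->lock);
}

static void *worker(void *arg)
{
    (void)arg;
    result = 99;                 /* do the work ... */
    toy_complete(&work_done);    /* ... then signal that it is finished */
    return NULL;
}

/* Start the worker, wait for its completion, and return its result. */
int completion_demo(void)
{
    pthread_t t;
    int rc, seen;

    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;
    toy_wait_for_completion(&work_done);
    seen = result;
    pthread_join(t, NULL);
    return seen;
}
```

Because `done` is only touched under the lock, it does not matter whether the worker completes before or after the waiter arrives: a completion that was signalled early is still seen.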
  • 34. 9 Sequence Locks <linux/seqlock.h> • For this situation: data has many readers and few writers, as with the RCU mechanism. • Unlike reader-writer locks, seqlocks favor writers over readers. • Readers never block, but may have to retry an arbitrary number of times if a writer is in progress. • Writers are mutually exclusive when changing data, as with spin locks, but writers do not wait for readers.
    Readers:
      do {
          seq = read_seqbegin_xxx(&lock);
          // read data ...
      } while (read_seqretry_xxx(&lock, seq));
    Writers:
      write_seqlock_xxx(&lock);
      // change data...
      write_sequnlock_xxx(&lock);
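The reader retry loop can be sketched in user space. This is a toy sequence counter, assuming C11 atomics and a single writer (the kernel seqlock additionally takes a spin lock so that writers exclude each other). `toy_seq_pair`, `toy_write`, `toy_read` and `seqlock_demo` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical protected data: a pair that must be read consistently. */
typedef struct {
    atomic_uint seq;    /* even: stable, odd: write in progress */
    atomic_int  x, y;   /* relaxed atomics, so racing reads are well defined */
} toy_seq_pair;

/* Single writer assumed: bump seq to odd, update, bump seq back to even. */
static void toy_write(toy_seq_pair *p, int x, int y)
{
    unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);

    atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->x, x, memory_order_relaxed);
    atomic_store_explicit(&p->y, y, memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Readers never block; they retry if a write overlapped the read. */
static void toy_read(toy_seq_pair *p, int *x, int *y)
{
    unsigned before, after;

    do {
        before = atomic_load_explicit(&p->seq, memory_order_acquire);
        *x = atomic_load_explicit(&p->x, memory_order_relaxed);
        *y = atomic_load_explicit(&p->y, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&p->seq, memory_order_relaxed);
    } while ((before & 1u) || before != after);
}

/* Single-threaded walk-through: one write, then one consistent read. */
int seqlock_demo(void)
{
    static toy_seq_pair pair;   /* zero-initialised: seq starts even */
    int x, y;

    toy_write(&pair, 3, 4);
    toy_read(&pair, &x, &y);
    assert(atomic_load(&pair.seq) % 2 == 0);   /* no write in progress */
    return x * 10 + y;
}
```

The writer never waits for readers, which is why this scheme favors writers: a reader that overlaps a write simply discards what it read and tries again.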
  • 35. 10 Mask interrupts (local and global) <linux/interrupt.h><asm/system.h><kernel/irq/manage.c><asm/processor.h> • Local masking deals with the CPU's IF flag, which disables all interrupts on the local CPU (cli and sti instructions). • Masking a PIC irq line is another story: it serializes execution of the same interrupt, but it cannot prevent preemption by other interrupts. • local_irq_disable(), local_irq_enable() • Do you remember spin_lock_irq()? Disabling interrupts is combined with spin_lock(). • Global disabling: cruel! It has been removed in 2.6, but we can use synchronize_irq() to wait for an interrupt handler that is running on another CPU.
  • 36. 11 Mask Bottom Halves <linux/interrupt.h> • When data is shared with bottom halves, we may need to disable bottom halves. • local_bh_disable(), local_bh_enable(): – calling add_preempt_count() • spin_lock_bh()
  • 37. 12 Kernel Preemption Disable <linux/preempt.h> • Preemption points: – the interrupt return path, – arbitrary preemption points in kernel code. • preempt_disable() and preempt_enable(), for example in thread 1 running on CPU 0:
      preempt_disable();
      int cpu = smp_processor_id();
      // manipulate per_cpu(xxx, cpu);
      // xxx is per-CPU data, such as runqueues.
      preempt_enable();
    • get_cpu() combines preempt_disable() with reading the CPU number, and is paired with put_cpu().
  • 38. 13 Read-Copy Updates <linux/rcupdate.h> • Best for read-mostly linked lists (struct list_head). • Another kind of reader-writer lock, but more complex and more advanced. • Readers do not block. [Figure: RCU update sequence. Labels: read; create a copy; change new copy (write); update (spec point); old copy in transient state; grace period (transition).]
  • 39. Big Kernel Lock: history • Linux 2.0: BKL, about 1996, for SMP • BSD/OS 4.x: • FreeBSD 4.x: XXX – Giant (2000 -) • goal: fine-grained locking • DragonFly BSD: forked from FreeBSD 4.x • goal: lockless memory allocator and scheduling system
  • 40. FUTEX • Fast User Space Mutex • It is for synchronizing user-space threads. • It is not a locking mechanism for the kernel itself. • It is implemented in the kernel.
  • 41. Relation of different lock implementations [Figure: diagram relating the mechanisms along a simple-to-complex axis. Boxes: atomic ops, mem barriers, spin lock (rw), semaphore (rw), con. variable, seq lock, mask interrupts, preempt disable, disable_bh, RCU.]
  • 42. Locking in Linux kernel • Kernel Control Paths • Locking in Linux • Locking and Coding • Conclusions
  • 43. Locking and Coding • Is the data shared? Can other threads (contexts) access it? • Is the data per-CPU? Can other CPUs access it? * • Is the data shared between thread context and interrupt context? Is it shared between two different interrupt handlers? … • If a context is preempted while accessing this data, can the newly scheduled context access the same data? • Can the current context sleep on anything while accessing the data? If it does, what state does that leave the shared data in? • Does the data have a special use? Keep it in mind. * • Now LET’S Continue CODING!
  • 44. Locking and Coding • Interrupt safe • Preempt safe • SMP safe – (preempt safe ≦SMP safe)
  • 45. Locking between various KCPs • Exceptions.. • Interrupts.. • Bottom Halves.. • Kernel threads.. • System calls by user space threads..
  • 46. 1 between exception contexts (UP: sleeping locks, SMP: +0) • 1. Kernel code should not cause exceptions. If any kernel code triggers an exception, it is a bug. • 2. BUT page faults and floating-point register exceptions are allowed. • 3. Exceptions can be caused by user-space code. • 4. According to item 1, an exception context cannot trigger another exception, apart from page faults and floating-point register exceptions. But an exception context can be preempted by interrupts, and after the interrupts return, the preempted exception continues on the same CPU. • 5. So we can conclude that sleeping locks are enough.
  • 47. 2 between interrupt contexts (UP: mask local interrupts, SMP: +spinlock) • An interrupt context has no process context of its own (no dedicated process kernel stack), so it cannot sleep. Do not use sleeping locks. • The same interrupt runs serially on the same CPU, because irq_desc->handler->ack() in do_IRQ() masks the irq line. On UP, this situation is simple. • The same or different interrupts can be triggered on different CPUs, so SMP requires a spinlock to prevent race conditions.
  • 48. 3 between Bottom Halves (UP: null, SMP: +spinlock) • Do not use the old BH mechanism; it has poor performance and has been removed in 2.6. • Softirqs cannot be preempted, except by interrupts, so on UP there are no race conditions between them. • Like interrupts, bottom halves cannot sleep, for the same reasons. • The same or different softirqs can run on different CPUs. • Tasklets are based on softirqs. The same tasklet never runs on two CPUs at once; only different tasklets can run on different CPUs. • From the above, we can conclude that on SMP softirqs and different tasklets should be protected with spinlocks, while data used only by a single tasklet can be accessed locklessly.
  • 49. 4 between exceptions and interrupts/bh (UP: mask interrupts, SMP: +spinlocks) • Interrupts cannot be preempted by exceptions; if this happens, it is a bug! • So exceptions can disable interrupts to avoid being preempted by them. • A bh is like an interrupt: it is executed in interrupt context. • However, exceptions can use local_bh_disable() to disable only bottom halves.
  • 50. 5 between Bottom Halves and interrupts (UP: mask interrupts, SMP: spinlock) • Bottom halves can disable interrupts to avoid concurrency with interrupt handlers. • For SMP, a spinlock is necessary and sufficient.
  • 51. 6 between kernel threads and interrupts/bh (UP: mask interrupts, SMP: +spinlock) • Interrupts can preempt threads, so disable interrupts to protect data used by threads. • Because interrupt handlers cannot be preempted (and cannot sleep), we use spinlocks on SMP.
  • 52. 7 between threads (spinlock or sleeping lock) • NOTE: in 2.6, a spinlock automatically disables preemption. • What to use, spinlock or sleeping lock? • low-overhead locking: spinlock • short lock hold time: spinlock • long lock hold time: semaphore • sleeping while holding the lock: semaphore
  • 53. 8 between system calls (spinning lock or sleeping lock) • This is the same as between kernel threads.
  • 54. Locking used between various KCPs
    KCPs                                 UP                      SMP (in addition)
    exceptions                           sleeping lock           null
    interrupts                           mask interrupts         spinlock
    bottom halves                        null                    spinlock or null
    exceptions and interrupts/bh         mask interrupts         spinlock
    bottom halves and interrupts         mask interrupts         spinlock
    kernel threads and interrupts/bh     mask interrupts         spinlock
    kernel threads                       sleeping or spin lock (UP and SMP)
    system calls                         sleeping or spin lock (UP and SMP)
  • 55. Kernel Configuration Tree and Debug <make menuconfig> • arch/xxx/Kconfig (mainmenu, <menu,endmenu>*) – arch/xxx/Kconfig.debug • lib/Kconfig.debug – init/Kconfig – fs/Kconfig.binfmt – fs/Kconfig – drivers/Kconfig.binfmt – lib/Kconfig – ... • CONFIG_DEBUG_KERNEL – CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP – CONFIG_DEBUG_STACKOVERFLOW, CONFIG_4KSTACKS – CONFIG_KDB (patches) – ...
  • 56. Locking in Linux kernel • Kernel Control Paths • Locking in Linux • Locking and Coding • Conclusions
  • 57. Conclusions • Locking, or synchronization, is a complex problem, especially for large and/or complex systems. • The problems caused by locking in the kernel are not entirely predictable.
  • 58. Locking: What is the problem? • Implementing the actual locking in the code to protect shared data is not hard. • The tricky part is identifying the actual shared data and corresponding critical sections. Cited from LKD, by R. Love
  • 59. Locking: What is the problem? • Deadlocks • Priority inversion • Locking latency • Locking granularity: coarse or fine-grained. – Scalability VS. overhead (performance). – Linux is not the only system with this dilemma. – Let’s keep a close eye on DragonFly BSD's progress.
  • 60. References • Linux kernel source tree, by Linus Torvalds, and various patches by hackers. • Linux Kernel Development, by Robert Love. • Understanding the Linux Kernel, by Daniel Bovet et al. • www.freebsd.org/smp • www.dragonflybsd.org • .../kernel/Documents/*, google, gcc document... • Pentium 4 software development document (3 volumes).
  • 61. Thanks • USTC BBS embedded board master: dj • All the organizers and/or friends of the USTC 2005 developer workshop of embedded system. • USTC Linux Users Group.
  • 62. Happy Life, Happy Hacking. THANKS