Linux Virtual File System
Peter J. Braam

Aims
  - Present the data structures in the Linux VFS
  - Provide information about the flow of control
  - Describe the methods and invariants needed to implement a new file system
  - Illustrate with some examples

History
  - BSD implemented a VFS for NFS; the aim was to dispatch to different filesystems
  - VMS had an elaborate filesystem
  - NT/Win95 have VFS-type interfaces
  - Newer systems integrate VM with the buffer cache
  [Diagram: file access enters the VFS, which dispatches to ufs (disk), nfs (udp) and Coda (Venus)]

Linux Filesystems
Media based
  - ext2 - Linux native
  - ufs - BSD
  - fat - DOS FS
  - vfat - Win 95
  - hpfs - OS/2
  - minix - well...
  - isofs - CDROM
  - sysv - SysV Unix
  - hfs - Macintosh
  - affs - Amiga Fast FS
  - NTFS - NT’s FS
  - adfs - Acorn/StrongARM
Network
  - nfs
  - Coda
  - AFS - Andrew FS
  - smbfs - LanManager
  - ncpfs - Novell
Special ones
  - procfs - /proc
  - umsdos - Unix in DOS
  - userfs - redirector to user

Linux Filesystems (ctd)
Forthcoming:
  - devfs - device file system
  - DFS - DCE distributed FS
Varia:
  - cfs - crypt filesystem
  - cfs - cache filesystem
  - ftpfs - ftp filesystem
  - mailfs - mail filesystem
  - pgfs - Postgres versioning file system
Linux serves (unrelated to the VFS!)
  - NFS - user & kernel
  - Coda
  - AppleShare - netatalk/CAP
  - SMB - samba
  - NCP - Novell

Usefulness
  "Linux is Obsolete"
  -- Andrew Tanenbaum

Linux VFS
  - Multiple interfaces build up the VFS: files, dentries, inodes, superblock, quota
  - VFS can do all caching & provides utility functions to the FS
  - FS provides methods to the VFS; many are optional
  [Diagram: file access passes through the VFS to ext2fs (disk), nfs (udp) and Coda FS (Venus)]

User level file access
Typical user level types and code:
  - pathnames: "/myfile"
  - file descriptors: fd = open("/myfile"…)
  - attributes in struct stat: stat("/myfile", &mybuf), chmod, chown...
  - offsets: write, read, lseek
  - directory handles: DIR *dh = opendir("/mydir")
  - directory entries: struct dirent *ent = readdir(dh)

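The user level types listed above can be exercised in a few lines of POSIX C. This is only a sketch: the helper names write_and_stat and count_entries are made up for illustration, and error handling is reduced to returning -1.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write `text` to `path`, then read the size back through stat().
 * Returns the size in bytes, or -1 on any error. */
long write_and_stat(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644); /* file descriptor */
    if (fd < 0)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);      /* advances the file offset */
    off_t pos = lseek(fd, 0, SEEK_CUR);    /* query the current offset */
    close(fd);
    if (n != (ssize_t)len || pos != (off_t)len)
        return -1;

    struct stat st;                        /* attributes in struct stat */
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Count the entries of a directory, or return -1 if it cannot be opened. */
int count_entries(const char *dirpath)
{
    assert(dirpath != NULL);
    DIR *dh = opendir(dirpath);            /* directory handle */
    if (dh == NULL)
        return -1;
    int count = 0;
    struct dirent *ent;                    /* directory entry */
    while ((ent = readdir(dh)) != NULL)
        count++;
    closedir(dh);
    return count;
}
```

Every call here (open, write, lseek, stat, opendir, readdir) ends up as a request to the VFS, which is the subject of the following slides.
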
VFS
  - Manages kernel level file abstractions in one format for all file systems
  - Receives system call requests from user level (e.g. write, open, stat, link)
  - Interacts with a specific file system based on mount point traversal
  - Receives requests from other parts of the kernel, mostly from memory management

File system level
Individual file systems:
  - are responsible for managing file & directory data
  - are responsible for managing meta-data: timestamps, owners, protection etc.
  - translate data between
      - particular FS data: e.g. disk data, NFS data, Coda/AFS data
      - VFS data: attributes etc. in standard format
  - e.g. nfs_getattr(...) returns attributes in VFS format, and acquires attributes in NFS format to do so

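The translation step can be pictured with a toy getattr. Both struct layouts below are invented for illustration. They are neither the real NFS attribute format nor the kernel's inode fields, and toy_getattr is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FS-private layout: type and permission bits share one word,
 * and the modification time is split into seconds and microseconds. */
struct toy_nfs_fattr {
    uint32_t type_and_mode;
    uint32_t uid, gid;
    uint32_t size;
    uint32_t mtime_sec, mtime_usec;
};

/* Hypothetical "standard format" record that the VFS side understands. */
struct toy_vfs_attr {
    unsigned mode;      /* permission bits only */
    int      is_dir;
    unsigned uid, gid;
    long     size;
    long     mtime;     /* whole seconds */
};

#define TOY_TYPE_MASK 0170000u
#define TOY_TYPE_DIR  0040000u

/* In the spirit of nfs_getattr: take attributes in FS format,
 * hand them back in the common format. Returns 0 on success. */
int toy_getattr(const struct toy_nfs_fattr *in, struct toy_vfs_attr *out)
{
    assert(in != NULL && out != NULL);
    out->mode   = in->type_and_mode & 07777u;
    out->is_dir = (in->type_and_mode & TOY_TYPE_MASK) == TOY_TYPE_DIR;
    out->uid    = in->uid;
    out->gid    = in->gid;
    out->size   = (long)in->size;
    out->mtime  = (long)in->mtime_sec;   /* sub-second part is dropped */
    return 0;
}
```
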
Anatomy of stat system call
    sys_stat(path, buf)
    {
        dentry = namei(path);                  /* establish VFS data */
        if (dentry == NULL)
            return -ENOENT;
        inode = dentry->d_inode;
        /* both calls below go into the inode layer of the filesystem */
        rc = inode->i_op->i_permission(inode);
        if (rc)
            rc = -EPERM;
        else
            rc = inode->i_op->i_getattr(inode, buf);
        dput(dentry);                          /* release the dentry on every path */
        return rc;
    }

Anatomy of fstatfs system call
    sys_fstatfs(fd, buf)                       /* for things like "df" */
    {
        file = fget(fd);                       /* translate fd to VFS data structure */
        if (file == NULL)
            return -EBADF;
        superb = file->f_dentry->d_inode->i_sb;
        /* call into superblock layer of filesystem */
        rc = superb->sb_op->sb_statfs(superb, buf);
        fput(file);                            /* drop the reference taken by fget */
        return rc;
    }

Data structures
VFS data structures for:
  - VFS handle to the file: inode (BSD: vnode)
  - User instantiated file handle: file (BSD: file)
  - The whole filesystem: superblock (BSD: vfs)
  - A name to inode translation: dentry

Shorthand method notation
  - super block methods: sss_methodname
  - inode methods: iii_methodname
  - dentry methods: ddd_methodname
  - file methods: fff_methodname
  - instead of inode->i_op->lookup we write iii_lookup

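In C terms, the shorthand stands for a call through a table of function pointers that the file system fills in. The sketch below is a simplified model, not the kernel's definitions: the struct fields, the toy file system and the "allow by default" fallback for a missing permission method are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct inode;

/* Method table supplied by a file system; unset slots stay NULL. */
struct inode_operations {
    long (*lookup)(struct inode *dir, const char *name);
    int  (*permission)(struct inode *inode);      /* optional */
};

struct inode {
    long ino;
    const struct inode_operations *i_op;
};

/* "iii_lookup" in the shorthand: inode -> i_op -> lookup. */
long iii_lookup(struct inode *dir, const char *name)
{
    assert(dir != NULL && dir->i_op != NULL);
    if (dir->i_op->lookup == NULL)
        return -1;                 /* this FS cannot look up names */
    return dir->i_op->lookup(dir, name);
}

/* Optional method: use a generic default when the FS left it out. */
int iii_permission(struct inode *inode)
{
    assert(inode != NULL && inode->i_op != NULL);
    if (inode->i_op->permission == NULL)
        return 0;                  /* assumed default: allow */
    return inode->i_op->permission(inode);
}

/* A toy file system that knows exactly one name. */
static long toyfs_lookup(struct inode *dir, const char *name)
{
    (void)dir;
    return strcmp(name, "myfile") == 0 ? 42 : 0;  /* 0 = not found */
}

const struct inode_operations toyfs_iops = { .lookup = toyfs_lookup };
```

The table is why many methods can be optional: the caller checks for a NULL slot before dispatching.
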
namei (VFS code on the left, the FS methods it reaches on the right)
    struct dentry *namei(parent, name) {
        if (dentry = d_lookup(parent, name))      -> ddd_hash(parent, name), ddd_revalidate(dentry)
        else                                      -> iii_lookup(parent, name)
    }
    struct inode *iget(ino, dev) {
        /* try cache else .. */                   -> sss_read_inode(…)
    }

Superblocks
  - Handle metadata only (attributes etc.)
  - Responsible for retrieving and storing metadata from the FS media or peers
  - Struct superblocks hold things like:
      - device, blocksize, dirty flags, list of dirty inodes
      - super operations
      - wait queue
      - pointer to the root inode of this FS

Super Operations (sss_)
Ops on inodes:
  - read_inode
  - put_inode
  - write_inode
  - delete_inode
  - clear_inode
  - notify_change
Superblock manips:
  - read_super (mount)
  - put_super (unmount)
  - write_super (unmount)
  - statfs (attributes)

Inodes
  - Inodes are the VFS abstraction for the file
  - Inode has operations (iii_methods)
  - VFS maintains an inode cache, NOT the individual FS’s (compare NT, BSD etc.)
  - Inodes contain an FS specific area where:
      - ext2 stores disk block numbers etc.
      - AFS would store the FID
  - Extraordinary inode ops are good for dealing with stale NFS file handles etc.

What’s inside an inode - 1
  - caching: list_head i_hash, list_head i_list, list_head i_dentry, int i_count
  - identifies file: long i_ino, int i_dev
  - usual stuff: {m,a,c}time, {u,g}id, mode, size, n_link

What’s inside an inode - 2
  - which FS: superblock i_sb, inode_ops i_op
  - waiting: wait objects, semaphore lock
  - for mmap, networking: vm_area_struct, pipe/socket info, page information
  - FS specific info (blockno’s, fids etc.):
      union { ext2fs_inode_info i_ext2; nfs_inode_info i_nfs; coda_inode_info i_coda; .. } u

Inode state
  - Inode can be on one or two lists: (hash & in_use) or (hash & dirty) or unused
  - Inode has a use count i_count
  - Transitions:
      - unused -> hash: iget calls sss_read_inode
      - dirty -> in_use: sss_write_inode
      - hash -> unused: call on sss_clear_inode, but if i_nlink = 0, iput calls sss_delete_inode when i_count falls to 0

Inode Cache
Players:
  1. iget: if i_count > 0, ++
  2. iput: if i_count > 1, --
  3. free_inodes
  4. syncing inodes
[Diagram: inode_hashtable with the lists of used, dirty and unused inodes. sss_read_inode (iget) brings an inode in from FS storage; mark_inode_dirty moves it to the dirty list (media fs only); sss_write_inode (sync one) writes it back to FS storage; sss_clear_inode (freeing inodes) or sss_delete_inode (iput) releases it]

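The use-count rules can be tried out on a toy cache. The sketch below is a strong simplification and not the kernel code: it uses a fixed array instead of hash chains, leaves out the dirty list, and replaces the sss_ methods with counters. All toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NINODES 8

struct toy_inode {
    long ino;        /* 0 means the slot is free */
    int  i_count;    /* use count */
    int  i_nlink;    /* link count, as read from "storage" */
};

/* Counters standing in for the sss_ methods of a file system. */
struct toy_stats { int read_inode, clear_inode, delete_inode; };

static struct toy_inode cache[NINODES];
struct toy_stats stats;

/* Return the cached inode for `ino`, reading it in on a miss. */
struct toy_inode *toy_iget(long ino)
{
    assert(ino > 0);
    struct toy_inode *free_slot = NULL;
    for (int i = 0; i < NINODES; i++) {
        if (cache[i].ino == ino) {          /* cache hit: bump use count */
            cache[i].i_count++;
            return &cache[i];
        }
        if (cache[i].ino == 0 && free_slot == NULL)
            free_slot = &cache[i];
    }
    if (free_slot == NULL)
        return NULL;                        /* full; a real cache would prune */
    free_slot->ino = ino;
    free_slot->i_count = 1;
    free_slot->i_nlink = 1;                 /* pretend sss_read_inode filled it in */
    stats.read_inode++;
    return free_slot;
}

/* Drop one reference. */
void toy_iput(struct toy_inode *inode)
{
    assert(inode != NULL && inode->i_count > 0);
    if (--inode->i_count > 0)
        return;
    if (inode->i_nlink == 0) {              /* unlinked file: delete for good */
        stats.delete_inode++;
        inode->ino = 0;
    }
    /* otherwise the inode stays cached, unused, until it is freed */
}

/* Free every unused inode (the free_inodes step); returns how many. */
int toy_free_inodes(void)
{
    int freed = 0;
    for (int i = 0; i < NINODES; i++) {
        if (cache[i].ino != 0 && cache[i].i_count == 0) {
            stats.clear_inode++;
            cache[i].ino = 0;
            freed++;
        }
    }
    return freed;
}
```

A second toy_iget for the same number returns the same object without another read, which is the point of keeping the cache in the VFS rather than in each file system.
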
Sales
  - Red Hat Software sold 240,000 copies of Red Hat Linux in 1997 and expects to reach 400,000 in 1998.
  - Estimates of installed servers (InfoWorld):
      - Linux: 7 million
      - OS/2: 5 million
      - Macintosh: 1 million

Inode operations (iii_)
  - lookup: returns inode; calls iget
  - creation/removal: create, link, unlink, symlink, mkdir, rmdir, mknod, rename
  - symbolic links: readlink, follow_link
  - pages:
      - readpage, writepage, updatepage - read or write page. Generic for media fs.
      - bmap - return disk block number of logical block
  - special operations:
      - revalidate - see the dentry section
      - truncate
      - permission

Dentry world
  - Dentry is a name to inode translation structure
  - Cached aggressively by the VFS
  - Eliminates lookups by FS & private caches
  - Timing on Coda FS: ls -lR of 1000 files after priming the cache
      - linux 2.0.32: 7.2 secs
      - linux 2.1.92: 0.6 secs
      - disk fs: less benefit, NFS even more
  - Negative entries!
  - Namei is dramatically simplified

Inside dentry’s
  - name
  - pointer to inode
  - pointer to parent dentry
  - list head of children
  - chains for lots of lists
  - use count

Dentry associated lists
  - d_alias chains (dentry-inode relationship), anchored at the inode’s i_dentry list head
      - place: d_instantiate
      - remove: dentry_iput
  - d_child chains (dentry tree relationship)
      - place: d_alloc
      - remove: d_prune, d_invalidate, d_put
  [Diagram: inodes and dentries linked by d_inode pointers and d_parent pointers; each inode’s i_dentry list head anchors its d_alias chain]

Dcache
  - namei tries the cache: d_lookup (uses ddd_compare)
  - Success: ddd_revalidate
      - d_invalidate if it fails
      - proceed if it succeeds
  - Failure: iii_lookup finds the inode (iget, sss_read_inode)
  - Finish: d_add; can give a negative entry in the dcache
  [Diagram: dentry_hashtable (d_hash chains, list head chosen by dhash(parent, name)) and unused dentries (d_lru chains); namei, iii_lookup and d_add fill the cache; prune, d_invalidate and d_drop remove entries]

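The effect of negative entries is easy to see in a toy model. This is a sketch under heavy assumptions: a linear array stands in for the hash chains, revalidation and pruning are left out, and toyfs_lookup is a made-up stand-in for iii_lookup that counts how often the file system is asked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NDENTRY 16
#define NAMELEN 32

struct toy_dentry {
    int  used;
    long parent;            /* inode number of the parent directory */
    char name[NAMELEN];
    long ino;               /* 0 marks a negative entry: "no such name" */
};

static struct toy_dentry dcache[NDENTRY];
int fs_lookups;             /* how often the "file system" was asked */

/* Stand-in for iii_lookup: this toy FS knows exactly one file. */
static long toyfs_lookup(long parent, const char *name)
{
    fs_lookups++;
    return (parent == 1 && strcmp(name, "myfile") == 0) ? 42 : 0;
}

static struct toy_dentry *toy_d_lookup(long parent, const char *name)
{
    for (int i = 0; i < NDENTRY; i++)
        if (dcache[i].used && dcache[i].parent == parent &&
            strcmp(dcache[i].name, name) == 0)
            return &dcache[i];
    return NULL;
}

static struct toy_dentry *toy_d_add(long parent, const char *name, long ino)
{
    for (int i = 0; i < NDENTRY; i++) {
        if (!dcache[i].used) {
            dcache[i].used = 1;
            dcache[i].parent = parent;
            strncpy(dcache[i].name, name, NAMELEN - 1);
            dcache[i].name[NAMELEN - 1] = '\0';
            dcache[i].ino = ino;
            return &dcache[i];
        }
    }
    return NULL;            /* cache full; a real dcache would prune */
}

/* Resolve one path component; returns the inode number, or 0 if absent. */
long toy_namei(long parent, const char *name)
{
    assert(name != NULL && strlen(name) < NAMELEN);
    struct toy_dentry *d = toy_d_lookup(parent, name);
    if (d == NULL) {
        long ino = toyfs_lookup(parent, name);
        d = toy_d_add(parent, name, ino);   /* may be a negative entry */
        if (d == NULL)
            return ino;                     /* uncached, but still correct */
    }
    return d->ino;
}
```

Asking twice for a missing name reaches the file system only once, because the first miss leaves a negative entry behind.
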
Dentry methods
  - ddd_revalidate: can force new lookup
  - ddd_hash: compute hash value of name
  - ddd_compare: are names equal?
  - ddd_delete, ddd_put, ddd_iput: FS cleanup opportunity

Dentry particulars
  - ddd_hash and ddd_compare have to deal with extraordinary cases for msdos/vfat:
      - case insensitive
      - long and short filename pleasantries
  - ddd_revalidate -- can force new lookup if inode not in use:
      - used for NFS/SMBfs aging
      - used for Coda/AFS callbacks

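For the case insensitive part, the hash and compare methods have to agree: names that compare equal must hash to the same chain. A minimal sketch follows, assuming ASCII names only and a djb2-style hash chosen for illustration. The toy_ function names and the "0 means equal" return convention are assumptions, and the long/short filename handling of vfat is not modelled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-folding hash: names that differ only in case get the same value. */
unsigned long toy_d_hash(const char *name)
{
    assert(name != NULL);
    unsigned long h = 5381;                       /* djb2-style string hash */
    for (const unsigned char *p = (const unsigned char *)name; *p; p++)
        h = h * 33 + (unsigned long)tolower(*p);
    return h;
}

/* Returns 0 when the names are equal ignoring ASCII case, non-zero otherwise. */
int toy_d_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p && tolower(*p) == tolower(*q)) {
        p++;
        q++;
    }
    return tolower(*p) != tolower(*q);
}
```
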
Style
  "Dijkstra probably hates me"
  -- Linus Torvalds

Memory mapping
  - vm_area structure has:
      - vm_operations
      - inode, addresses etc.
  - vm_operations:
      - map, unmap
      - swapin, swapout
      - nopage -- read when page isn’t in VM
  - mmap:
      - calls on iii_readpage
      - keeps a use count on the inode until unmap
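
The interplay of nopage, readpage and the inode use count can be modelled in user space. The sketch below is a toy, not kernel code: the page size, the structs and every toy_ function are assumptions, and a "page fault" is simply a check of a present flag.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_NPAGES    4

struct toy_inode {
    int  i_count;                              /* use count */
    char data[TOY_NPAGES * TOY_PAGE_SIZE];     /* stands in for FS storage */
    int  readpage_calls;
};

struct toy_vm_area {
    struct toy_inode *inode;
    int  present[TOY_NPAGES];                  /* is the page "in VM"? */
    char pages[TOY_NPAGES][TOY_PAGE_SIZE];
};

/* Stand-in for iii_readpage: copy one page from storage. */
static void toy_readpage(struct toy_inode *inode, int pageno, char *dst)
{
    inode->readpage_calls++;
    memcpy(dst, inode->data + pageno * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
}

/* map: attach the area to the inode and take a reference on it. */
void toy_mmap(struct toy_vm_area *vma, struct toy_inode *inode)
{
    assert(vma != NULL && inode != NULL);
    memset(vma, 0, sizeof *vma);
    vma->inode = inode;
    inode->i_count++;
}

/* unmap: drop the reference taken by toy_mmap. */
void toy_munmap(struct toy_vm_area *vma)
{
    assert(vma != NULL && vma->inode != NULL);
    vma->inode->i_count--;
    vma->inode = NULL;
}

/* Read one byte; a page that is not present takes the "nopage" path first. */
char toy_read_byte(struct toy_vm_area *vma, int offset)
{
    assert(offset >= 0 && offset < TOY_NPAGES * TOY_PAGE_SIZE);
    int pageno = offset / TOY_PAGE_SIZE;
    if (!vma->present[pageno]) {               /* nopage */
        toy_readpage(vma->inode, pageno, vma->pages[pageno]);
        vma->present[pageno] = 1;
    }
    return vma->pages[pageno][offset % TOY_PAGE_SIZE];
}
```

Two reads from the same page cost one readpage call, and the inode stays referenced for as long as the mapping exists.
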
