LVM
It's only logical.
Steven Lembark
Workhorse Computing
lembark@wrkhors.com
What's so logical about LVM?
Simple: It isn't physical.
Think of it as "virtual memory" on a disk.
Simple???
PV → VG → LV: thin provisioned, mirrored, snapshots.
Begin at the beginning...
Disk drives were 5, maybe 10MB.
Booting from tape takes too long.
Can't afford a whole disk for swap.
What to do?
Tar overlays?
RA-70 packs?
Partitions save the day!
Divide the drive for swap, boot, O/S.
Allows separate mount points.
New partitions == New mount points.
What you used to do
Partition the drive.
Remembering to keep a separate partition for /boot.
Using parted once the original layout was outgrown.
Figuring out how to split space with new disks...
Size matters.
Say you have something big: 20MB of data.
Tape overlays take too long.
RA-70's require remounts.
How can we manage it?
Making it bigger
We need an abstraction:
Vnodes.
Instead of "hardware".
Veritas & HP
Developed different ways to do this.
Physical drives.
Grouped into "blocks".
Allocated into "volumes".
Fortunately linux uses HP's scheme.
First few steps
pvcreate: initialize physical storage (whole disk or partition).
vgcreate: pool multiple drives into a group of blocks.
lvcreate: re-partition blocks into mountable units.
Example: single-disk desktop
grub2 speaks lvm - goodbye, boot partitions!
Two partitions: primary swap + everything else.
Call them /dev/sda{1,2}.
swap == 2 * RAM, used for hibernate and recovery.
rest == lvm.
Example: single-disk desktop
# fdisk /dev/sda; # sda1 => 82, sda2 => 8e
# pvcreate /dev/sda2;
# vgcreate vg00 /dev/sda2;
# lvcreate -L 8G -n root vg00;
# mkfs.xfs -b log=12 -L root /dev/vg00/root;
# mount /dev/vg00/root /mnt/gentoo;
Finding yourself
Ever get sick of UUIDs?
Labels?
Device paths?
LVM assigns UUIDs to PV, VG, LV.
Let LVM do the walking: vgscan -v
Give linux the boot
mount -t proc none /proc;
mount -t sysfs none /sys;
/sbin/mdadm --verbose --assemble --scan;  # assemble RAID, if any
/sbin/vgscan --verbose;                   # find the volume groups
/sbin/vgchange -a y;                      # activate their LVs
/sbin/mount /dev/vg00/root /mnt/root;
exec /sbin/switch_root /mnt/root /sbin/init;
Then root fills up...
Say goodbye to parted.
lvextend -L 12G /dev/vg00/root;
xfs_growfs /dev/vg00/root;
Notice the lack of any umount.
Add a new disk
/sbin/fdisk /dev/sdb; # sdb1 => 8e
pvcreate /dev/sdb1;
vgextend vg00 /dev/sdb1;
lvextend -L 24G /dev/vg00/root;
xfs_growfs /dev/vg00/root;
And another disk, and another...
Let's say you've scrounged ten disks.
One large VG.
Then one disk fails.
And the entire VG with it.
Adding volume groups
Lesson: Over-large VG's become fragile.
Fix: Multiple VG's partition the vulnerability.
One disk won't bring down everything.
Growing a new VG
Plug in and fdisk your new device.
# pvcreate /dev/sdc1;
# vgcreate vg01 /dev/sdc1;
# lvcreate -L <size> -n home vg01;
# mkfs.xfs -b log=12 -L home /dev/vg01/home;
Copy files, add /dev/vg01/home to /etc/fstab.
Your backups just got easier
Separate mount points for "scratch".
"find . -xdev"
Mount /var/spool, /var/tmp.
Back up persistent portion of /var with rest of root.
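A hedged sketch of one per-filesystem backup (archive path and cpio choice assumed):
# -xdev stops find at mount points, so scratch filesystems are skipped
# find / -xdev -print0 | cpio -0 -o -H newc > /backup/root.cpio;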
More, smaller volumes
More /etc/fstab entries.
Isolate disk failures to non-essential data.
Back up by mount point.
Use different filesystems (xfs vs. ext4).
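A hedged /etc/fstab sketch of such a layout (volume names and filesystems assumed):
/dev/vg00/root   /           xfs   defaults  0 1
/dev/vg01/home   /home       xfs   defaults  0 2
/dev/vg01/spool  /var/spool  ext4  defaults  0 2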
RAID + LVM
- LV with copies using LVM.
- Or make PV's out of mdadm volumes.
LV's simplify handling huge RAID volumes.
LVM RAID
# lvcreate -m <#copies> …
Automatically duplicates LV data.
"-m 2" == three-volume RAID (1 data + 2 copy).
Blocks don't need to be contiguous.
Can lvextend later on.
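A minimal sketch, assuming an 8G LV named "safe":
# lvcreate -m 2 -L 8G -n safe vg00; # 1 data + 2 copies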
LVM on RAID
Division of labor:
- mdadm for RAID.
- LVM for mount points.
Use mdadm to create space.
Use LVM to manage it.
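A hedged sketch of the split, device names assumed:
# mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sd{b,c,d}1;
# pvcreate /dev/md0;              # mdadm made the space...
# vgcreate vg02 /dev/md0;         # ...LVM manages it
# lvcreate -L <size> -n data vg02;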
"Stride" for RAID > 1
LV blocks == RAID page size.
Good for RAID 5, 6, 10.
Meaningless for mirror (LVM or hardware).
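One hedged example, assuming 64KiB chunks, 4KiB blocks, 3 data disks:
# stride == chunk / block; stripe-width == stride * data disks
# mkfs.ext4 -b 4096 -E stride=16,stripe-width=48 /dev/vg02/data;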
Monitored LVM
# lvcreate --monitor y ...
Use dmeventd for monitoring.
Know about I/O errors.
What else would you want to do at 3am?
Real SysAdmins don't need sleep!
Archiving many GB takes time.
Need stable filesystems.
Easy: Kick everyone off, run the backups at 0300.
If you enjoy that sort of thing.
Snapshots: mounted backups
Not a hot copy.
Snapshot pool == stable version of COW blocks.
Stable version of LV being backed up.
Size == max pages that might change during lifetime.
Lifecycle of a snapshot
Find mountpoint.
Snapshot mount point.
Work with static contents.
Drop snapshot.
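A minimal sketch of the cycle, names assumed:
# lvcreate -L 1G -s -n home-snap /dev/vg01/home; # COW space for changed blocks
# mount -t xfs -o 'ro,norecovery,nouuid' /dev/vg01/home-snap /mnt/snap;
# tar -cf /backup/home.tar -C /mnt/snap .;       # static contents
# umount /mnt/snap;
# lvremove -f /dev/vg01/home-snap;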
Most common: backup
Live database
Spool directory
Disk cache
Home dirs
Backup a database
Data and config under /var/postgres.
Data on single LV /dev/vg01/postgres.
At 0300, writes run up to 1GB per hour.
Daily backup takes about 30 minutes.
VG keeps 8GB free for snapshots.
Backup a database
# lvcreate -L1G -s -n pg-tmp \
    /dev/vg01/postgres;
1G == twice the usual amount of change.
Updates to /dev/vg01/postgres continue.
Original pages stored in /dev/vg01/pg-tmp.
I/O error in pg-tmp if > 1GB is written.
Backup a database
# lvcreate -L1G -s -n pg-tmp \
    /dev/vg01/postgres;
# mount --type xfs \
    -o 'ro,norecovery,nouuid' /dev/vg01/pg-tmp \
    /mnt/backup;
# find /mnt/backup -xdev …
/mnt/backup is stable for duration of backup.
/var/postgres keeps writing.
Backup a database
One downside: Duplicate running database.
Takes extra steps to restart.
Not difficult.
Be Prepared.
Giving away what you ain't got
"Thin volumes".
Like sparse files.
Pre-allocate pool of blocks.
LV's grow as needed.
Allows overcommit.
"Thin"?
"Thick" == allocate LV blocks at creation time.
"—thin" assigns virtual size.
Physical size varies with use.
Why bother?
Filesystems that change over time.
No need to pre-allocate all of the space.
Add physical storage as needed.
ETL intake.
Archival.
User scratch.
Example: Scratch space for users.
Say you have ~30GB of free space.
And three users.
Each "needs" 40GB of space.
No problem.
The pool is an LV.
"--thinpool" labels the LV as a pool.
Allocate 30GB into /dev/vg00/scratch.
# lvcreate -L 30G --thinpool scratch vg00;
Virtually allocate LV
"lvcreate -V" allocates space out of the pool.
"-V" == "virtual"
# lvcreate -V 40G --thin -n thin_one \
    /dev/vg00/scratch;
# lvcreate -V 40G ... # repeat for the other two users
Virtually yours
Allocated 120GiB using 30GB of disk.
lvdisplay shows 0 used for each volume??
Right: None used. Yet.
40GiB is a limit, not an allocation.
Pure magic!
Make a filesystem.
Mount the lvols.
df shows them as 40GiB.
Everyone is happy...
Until 30GB is used up.
Size does matter.
No blocks left to allocate.
Now what?
Writing processes block in a killable state.
They hang until "kill -KILL" or space becomes available.
One fix: scrounge a disk
vgextend vg00 <scrounged disk>;
lvextend -L<whatever> /dev/vg00/scratch;
Bingo: free blocks.
Reduce, reuse, recycle
fstrim(8) discards unused blocks from a filesystem.
Reduces virtual allocations.
Allows virtual volumes to re-grow:
fstrim --all --verbose;
cron is your friend.
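A hedged crontab entry, schedule assumed:
# trim all mounted filesystems at 0300 Sunday
0 3 * * 0 /sbin/fstrim --all --verbose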
Highly dynamic environment
Weekly doesn't cut it:
- download directory.
- uncompress space.
- compile large projects.
Or a notebook with a small SSD.
Automatic real-time trimming
3.0+ kernel w/ "TRIM".
Device needs "FITRIM".
http://xfs.org/index.php/FITRIM/discard
Sanity check: discard available?
$ cat /sys/block/sda/queue/discard_max_bytes;
2147450880
$ cat /sys/block/dm-8/queue/discard_max_bytes;
2147450880
So far so good...
Do the deed
/dev/vg00/scratch_one /scratch/jowbloe xfs defaults,discard 0 1
or
mount -o defaults,discard /foo /mnt/foo;
What you see is all you get
Dynamic discard == overhead.
Often not worse than network mounts.
YMMV...
Avoids "just in case" provisioning.
Good with SSD, battery-backed RAID.
LVM benefits
- Saner mount points.
- Hot backups.
- Dynamic space manglement.
Seems logical?
