August 2018
Refreshing our knowledge
HugePages:
Why, what and how
2© The Pythian Group Inc., 2018
What's up with
HugePages?
Jose Rodriguez
Project engineer at Pythian
● 10+ years of experience, mainly with Oracle but also SQL Server and
others such as DB2 LUW and PostgreSQL
● RAC and HA with Data Guard (DG) and GoldenGate (GG) on Solaris, Linux and Windows
Other areas of expertise, i.e., things I like doing
● Scripting and automation (lazy DBA)
● Machine Learning
● GoldenGate replication
● Cloud-related stuff (who doesn't do it nowadays, eh?)
About Pythian
Pythian’s 400+ IT professionals
help companies adopt
and manage disruptive data
technologies to better compete
© 2018 Pythian. Confidential 4
Systems currently
managed by Pythian
EXPERIENCED
Pythian experts
in 35 countries
GLOBAL
Millennia of experience
gathered and shared
over 19 years
EXPERTS
11,800 2400+
What are you taking away
So you can leave now if you already have it
Agenda
Why do we care?
What are HugePages?
How to implement?
What can happen - Case Studies
● It is 2019, but HugePages still seem poorly understood and not
broadly implemented
● More memory -> more problems
● Systems with 1 TB of RAM or more are common
nowadays
● Problems caused by lack of HugePages are
not always easy to spot
Why do we care?
What Are HugePages
Behind the scenes
Virtual to physical memory mapping
[Figure: virtual pages of a user process (0x00, 0x01, ..., 0x23, 0x24) mapped to frames in main memory (e.g. 42, 157, 245)]
Memory allocations are tracked in PageTables
virtual   real
0x00      42
0x01      175
0x02      176
0x03      177
0x04      178
...       ...
(page size: 4 KiB)
Virtual to physical memory mapping
[Figure: a user process's virtual pages are translated to physical memory through its PageTable (0x00 → 42, 0x01 → 175, 0x02 → 176, 0x03 → 177, 0x04 → 178, ...)]
● To allocate 100 GiB there will be 26,214,400 memory pages of 4 KiB
each
● An OS typically groups and maps them hierarchically (multi-level page tables)
■ i.e. contiguous space can be mapped more efficiently
● Each PageTable Entry (PTE) is around 8 bytes on 64-bit systems
● Vmem offset + Physical address + Flags
● PageTables are also stored in memory: about 200 MiB in our
example (26,214,400 × 8 bytes)
● For shared memory segments (e.g. the SGA) each process has its own copy of
the PageTable
● A regular single instance may have 1,000 sessions: 1,000 × 200 MiB leads
to up to ~200 GiB of RAM just to track RAM (if every session touches the whole SGA)
What's up with PageTables?
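The slide's arithmetic can be reproduced with a minimal Python sketch. It assumes 4 KiB base pages, the ~8-byte PTE quoted above, and the worst case in which every session has touched the entire 100 GiB SGA; the upper levels of the page-table hierarchy are ignored, so real numbers will differ slightly.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PTE_BYTES = 8  # approximate size of one PageTable Entry on 64-bit systems


def pte_bytes_per_process(segment_bytes: int, page_bytes: int = 4 * KIB) -> int:
    """Bytes of PTEs one process needs to map an entire shared segment."""
    pages = -(-segment_bytes // page_bytes)  # ceiling division: whole pages only
    return pages * PTE_BYTES


def pte_bytes_all_sessions(segment_bytes: int, sessions: int,
                           page_bytes: int = 4 * KIB) -> int:
    """Worst case: every session has touched the whole segment, so each one
    carries its own full set of PTEs for it."""
    return sessions * pte_bytes_per_process(segment_bytes, page_bytes)


if __name__ == "__main__":
    sga = 100 * GIB
    print("4 KiB pages in SGA:", sga // (4 * KIB))                      # 26214400
    print("PTEs per process (MiB):", pte_bytes_per_process(sga) / MIB)  # 200.0
    total = pte_bytes_all_sessions(sga, sessions=1000)
    print("PTEs for 1000 sessions (GiB):", round(total / GIB, 1))       # 195.3
```

Strictly, 1,000 × 200 MiB is about 195 GiB, which the slide rounds to 200 GiB.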
HugePages to the Rescue!
virtual   real
0x00      1
0x01      ...
0x02      ...
0x03      ...
0x04      ...
...       ...
Page size: 2048 KiB instead of 4 KiB
PageTable reduced 512 times, to only ~400 KiB
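The 512× reduction follows directly from the page-size ratio (2048 KiB / 4 KiB). A minimal Python sketch, assuming a 100 GiB SGA, ~8 bytes per page-table entry and ignoring upper page-table levels, compares 4 KiB and 2 MiB pages, plus the 1 GiB size mentioned for other platforms:

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PTE_BYTES = 8  # approximate size of one page-table entry on 64-bit systems


def pagetable_bytes(segment_bytes: int, page_bytes: int) -> int:
    """Entry bytes one process needs to map segment_bytes with pages of page_bytes."""
    pages = -(-segment_bytes // page_bytes)  # round up to whole pages
    return pages * PTE_BYTES


if __name__ == "__main__":
    sga = 100 * GIB
    sizes = {"4 KiB": 4 * KIB, "2 MiB": 2 * MIB, "1 GiB": 1 * GIB}
    base = pagetable_bytes(sga, sizes["4 KiB"])
    for label, size in sizes.items():
        table = pagetable_bytes(sga, size)
        # e.g. "2 MiB pages: 409,600 bytes per process (reduction vs 4 KiB: 512x)"
        print(f"{label} pages: {table:,} bytes per process "
              f"(reduction vs 4 KiB: {base // table}x)")
```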
● Allocate only as many HugePages as needed: reserved HugePages cannot be used for regular allocations
● HugePages cannot be swapped out
● Oracle Automatic Memory Management (AMM) is incompatible
with HugePages
● Transparent HugePages (THP) do not get along well with
Oracle; they are disabled by default in UEK2 and later
● Platforms other than Linux x64 offer an even wider choice of
large page sizes, up to 1 GiB
● In extreme cases (SGAs of several TiB), instance startup may be
slow. PRE_PAGE_SGA may help here
● AMM is forbidden in 12.2 if RAM > 4 GiB, so HugePages should be used
there.
HugePages additional facts
● Do we really need/want HugePages for ASM?
● ASM uses AMM by default, so initially it is not HugePages-compatible.
/dev/shm is important here.
● We don't need them for "regular" ASM instances. Documentation and best
practices say this clearly, although this may change in future
releases.
● Highly recommended for Exadata. MOS notes 2062068.1 and
2111010.1 clearly indicate that ASMM should be enabled and
HugePages available for ASM.
HugePages and ASM
● /dev/shm is sized to 50% of total RAM by default
● Oracle AMM uses /dev/shm to "store" shared memory pages
● We may be tempted to shrink /dev/shm to leave
more room for HugePages. There is no need: tmpfs only uses memory for what is actually stored in it
● HugePages and AMM are incompatible
HugePages and /dev/shm
● Does Oracle use HugePages for PGA?
● No, it doesn't (currently)
● No hard evidence against it in docs or MOS
● Tests show that Oracle is not allocating HugePages for it
● Counterintuitive for small memory allocations
● May change in the future (DWH or DSS sort area)
HugePages and PGA
HugePages on the Cloud
● Supported on AWS RDS since July 2017
but not enabled by default. HugePages can
only be enabled on certain instance
types.
● No official documentation on Azure, but a
recent test showed that we can set up HP
in a Linux VM running on Azure.
● Google Cloud Platform supports
HugePages.
● Oracle Cloud Service – Officially
supported for Exadata Cloud Service.
OCI allows it but not by default.
Classic platform has it enabled by
default.
Let's do it!
● Script provided in MOS: "Oracle Linux: Shell Script to Calculate
Values Recommended Linux HugePages / HugeTLB Configuration
(Doc ID 401749.1)"
● Or use the following formula:
SGA size (MiB) / 2 (MiB) + 42
How many HugePages do I need?
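A minimal Python sketch of the rule of thumb above. The 2 MiB page size and the 42-page margin come from the slide; rounding up to whole pages and reading Hugepagesize from /proc/meminfo are additions in this sketch. For a running instance, the MOS script (Doc ID 401749.1), which works from the actual shared memory segments, remains the better source.

```python
import math


def hugepages_for_sga(sga_mib: float, hugepage_mib: float = 2.0,
                      margin: int = 42) -> int:
    """Rule of thumb from the slide: SGA size / HugePage size + margin.

    Rounding up is an addition here: a partial HugePage still costs a whole one.
    """
    if sga_mib <= 0 or hugepage_mib <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(sga_mib / hugepage_mib) + margin


def hugepagesize_mib(meminfo_text: str) -> float:
    """Extract the HugePage size from /proc/meminfo text ('Hugepagesize: 2048 kB')."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            return int(line.split()[1]) / 1024
    raise ValueError("Hugepagesize not found in meminfo text")


if __name__ == "__main__":
    # On a live system: meminfo = open("/proc/meminfo").read()
    meminfo = "Hugepagesize:       2048 kB\n"
    print(hugepages_for_sga(1540, hugepagesize_mib(meminfo)))  # 812
```

For the 1540 MiB SGA in the "Success!" output that follows, this gives 770 + 42 = 812 pages.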
● May need extra work on VMs
● Disable AMM
● Set use_large_pages=only
● Disable THP
● Set memlock user limit
● Set vm.nr_hugepages
● Set vm.hugetlb_shm_group as required
(SUSE)
● Reboot OS (not always required)
● Restart Oracle instance
● Use TuneD profiles on RHEL 7 and
above
Implementation steps
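Several of these steps boil down to small text edits. The Python sketch below is illustrative, not a complete installer: it reads the active THP mode from its sysfs file and renders example vm.nr_hugepages, memlock and use_large_pages lines. The file targets, the "oracle" user and sizing memlock to exactly the HugePages pool are assumptions to adapt to each system.

```python
from typing import Dict


def thp_mode(enabled_text: str) -> str:
    """Return the active Transparent HugePages mode.

    /sys/kernel/mm/transparent_hugepage/enabled lists all modes and marks the
    active one with brackets, e.g. 'always madvise [never]'.
    """
    for token in enabled_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active THP mode in {enabled_text!r}")


def config_fragments(nr_hugepages: int, hugepage_kib: int = 2048,
                     user: str = "oracle") -> Dict[str, str]:
    """Example settings for some of the steps above (illustrative only).

    Assumption: memlock is sized to cover exactly the HugePages pool
    (limits.conf expresses memlock in KiB); many sites choose a larger value.
    """
    memlock_kib = nr_hugepages * hugepage_kib
    return {
        "/etc/sysctl.conf": f"vm.nr_hugepages = {nr_hugepages}",
        "/etc/security/limits.conf": (
            f"{user} soft memlock {memlock_kib}\n"
            f"{user} hard memlock {memlock_kib}"
        ),
        "spfile/pfile": "use_large_pages = ONLY",
    }


if __name__ == "__main__":
    # On a live system: open("/sys/kernel/mm/transparent_hugepage/enabled").read()
    print("THP mode:", thp_mode("always madvise [never]"))
    for target, text in config_fragments(1056).items():
        print(f"# {target}\n{text}")
```

THP itself is commonly disabled with the transparent_hugepage=never kernel boot parameter, or through a TuneD profile on RHEL 7 and later; thp_mode() should then report "never".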
Success!
2018-08-20T12:43:18.163509+00:00
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
2018-08-20T12:43:18.163653+00:00
Per process system memlock (soft) limit = 2048M
2018-08-20T12:43:18.163821+00:00
Expected per process system memlock (soft) limit to lock
SHARED GLOBAL AREA (SGA) into memory: 1540M
2018-08-20T12:43:18.163952+00:00
Available system pagesizes:
4K, 2048K
2018-08-20T12:43:18.164143+00:00
Supported system pagesize(s):
2018-08-20T12:43:18.164220+00:00
PAGESIZE AVAILABLE_PAGES EXPECTED_PAGES ALLOCATED_PAGES ERROR(s)
2018-08-20T12:43:18.164382+00:00
2048K 1056 770 770 NONE
[oracle@HPtesting ~]$ grep ^HugePages /proc/meminfo
HugePages_Total: 1056
HugePages_Free: 287
HugePages_Rsvd: 1
HugePages_Surp: 0
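HugePages_Rsvd counts pages promised to a segment but not yet touched, and those pages are still included in HugePages_Free. A minimal Python sketch of the resulting cross-check, using the values copied from the output above: Total - Free + Rsvd = 1056 - 287 + 1 = 770, which matches ALLOCATED_PAGES in the alert log, while 286 pages are configured but unused (the reason to allocate only enough HugePages).

```python
SAMPLE = """HugePages_Total:    1056
HugePages_Free:      287
HugePages_Rsvd:        1
HugePages_Surp:        0
"""


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:   value [kB]' lines into integers."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def hugepages_in_use(info: dict) -> int:
    """Pages held by shared memory: faulted in (Total - Free) plus reserved."""
    return info["HugePages_Total"] - info["HugePages_Free"] + info["HugePages_Rsvd"]


def hugepages_unused(info: dict) -> int:
    """Free pages not already reserved: set aside for HugePages but idle."""
    return info["HugePages_Free"] - info["HugePages_Rsvd"]


if __name__ == "__main__":
    # On a live system: info = parse_meminfo(open("/proc/meminfo").read())
    info = parse_meminfo(SAMPLE)
    print("in use:", hugepages_in_use(info), "unused:", hugepages_unused(info))
```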
LargePages (A.K.A. HugePages in Windows)
● Available since Oracle 10.1
● Enabled by adding an entry to the registry, ideally per SID
rather than globally
● Again only used for SGA
● Large pages are not counted in the "Working Set", so memory usage metrics
become somewhat misleading
● On older versions, startup can be slow and have a high impact on server
performance
● Aimed at DWH-type databases
Case Studies
Lack of HugePages causing trouble
RAC node eviction
● One node of a 2-node cluster was evicted
● Logs show a timeout responding to
something prior to eviction
● We found no other errors or
evidence
● sar to the rescue!
RAC node eviction – “sar -r”
05:20:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
05:30:01 AM 361136 65389512 99.45 932 24477352 41041528 29.48 29509228 1880236 564
05:40:01 AM 354896 65395752 99.46 95164 24434320 41039432 29.48 29504552 1902320 552
05:50:01 AM 382940 65367708 99.42 87912 24420284 41021636 29.47 29474908 1902904 496
06:00:01 AM 385016 65365632 99.41 52432 24414712 41053708 29.49 29477412 1878860 488
06:10:01 AM 386796 65363852 99.41 596 24416944 41046880 29.48 29412032 1909420 628
06:20:02 AM 376484 65374164 99.43 596 24546212 41069336 29.50 29603108 2107020 460
06:30:01 AM 335176 65415472 99.49 596 24893684 41094396 29.52 29676840 2078424 648
06:40:05 AM 334152 65416496 99.49 596 24554064 41222332 29.61 29453168 2061660 0
06:50:03 AM 349392 65401256 99.47 596 22963852 41360864 29.71 28031816 1851900 72
07:00:10 AM 342752 65407896 99.48 596 21190320 41768848 30.00 26854676 1723480 0
07:10:04 AM 341756 65408892 99.48 596 20787592 41769828 30.00 26706944 1765980 12
Average: 414530 65336118 99.37 19907 24589646 41094908 29.52 29439910 1903200 2315
07:16:28 AM LINUX RESTART
RAC node eviction – “sar -B”
05:20:01 AM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
05:30:01 AM 7257.49 91.49 8704.61 1.28 6571.59 44.62 0.00 30.63 68.65
05:40:01 AM 2021.61 2486.25 141607.78 5.77 60734.72 451.59 38.26 386.08 78.82
05:50:01 AM 6980.26 71.57 7241.62 0.56 6380.14 35.86 7.72 38.22 87.71
06:00:01 AM 7262.73 67.56 8717.63 1.10 6549.42 47.03 1.18 40.58 84.18
06:10:01 AM 1759.35 379.66 15556.00 4.59 7320.75 185.60 2.75 143.16 76.01
06:20:02 AM 63309.67 3624.60 34222.39 267.30 50019.66 42307.14 982.44 13754.06 31.77
06:30:01 AM 115962.81 2730.86 30665.11 373.74 86055.51 843180.51 16021.66 26924.73 3.13
06:40:05 AM 83609.10 1331.45 20393.23 235.71 62484.15 1104330.76 25458.10 20843.61 1.84
06:50:03 AM 158193.69 4252.68 27395.53 375.73 111261.98 1848753.28 61689.42 38619.69 2.02
07:00:10 AM 98699.51 4257.23 23771.15 292.99 70708.84 590100.80 12354.06 23573.29 3.91
07:10:04 AM 125777.83 2409.66 23413.33 415.65 91748.06 952301.48 24672.30 31401.32 3.21
Average: 20671.81 1108.76 15287.47 51.49 18890.14 126225.66 3483.02 4065.45 3.13
07:16:28 AM LINUX RESTART
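sar's %vmeff column is pgsteal/s divided by the pages scanned (pgscank/s + pgscand/s); the sysstat documentation describes values well below 30% as a sign that the virtual memory subsystem is struggling to reclaim pages. A minimal Python sketch recomputes it from the rows above and flags the intervals from 06:30 on, where direct reclaim by the processes themselves (pgscand/s) explodes shortly before the restart. Treat the 30% threshold as a rule of thumb.

```python
from typing import List, Tuple

# (time, pgscank/s, pgscand/s, pgsteal/s) copied from the "sar -B" output above
ROWS: List[Tuple[str, float, float, float]] = [
    ("05:30:01", 44.62, 0.00, 30.63),
    ("05:40:01", 451.59, 38.26, 386.08),
    ("05:50:01", 35.86, 7.72, 38.22),
    ("06:00:01", 47.03, 1.18, 40.58),
    ("06:10:01", 185.60, 2.75, 143.16),
    ("06:20:02", 42307.14, 982.44, 13754.06),
    ("06:30:01", 843180.51, 16021.66, 26924.73),
    ("06:40:05", 1104330.76, 25458.10, 20843.61),
    ("06:50:03", 1848753.28, 61689.42, 38619.69),
    ("07:00:10", 590100.80, 12354.06, 23573.29),
    ("07:10:04", 952301.48, 24672.30, 31401.32),
]


def vmeff(pgscank: float, pgscand: float, pgsteal: float) -> float:
    """Page-reclaim efficiency: pages reclaimed per page scanned, in percent."""
    scanned = pgscank + pgscand
    return 100.0 * pgsteal / scanned if scanned else 0.0


def struggling(rows, threshold: float = 30.0) -> List[str]:
    """Intervals with scanning activity and reclaim efficiency below threshold."""
    return [t for t, k, d, s in rows if k + d > 0 and vmeff(k, d, s) < threshold]


if __name__ == "__main__":
    for t, k, d, s in ROWS:
        print(t, f"{vmeff(k, d, s):6.2f}")
    print("struggling:", struggling(ROWS))
```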
RAC node eviction - “cat /proc/meminfo” (after incident)
$ cat /proc/meminfo
MemTotal: 65918584 kB
MemFree: 1583912 kB
MemAvailable: 20034320 kB
Buffers: 416208 kB
Cached: 41349928 kB
SwapTotal: 73469948 kB
SwapFree: 73334068 kB
KernelStack: 23392 kB
PageTables: 12495120 kB
AnonHugePages: 1478656 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
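On this node almost 12 GiB, roughly 19% of RAM, went to PageTables while HugePages_Total was 0; the non-zero AnonHugePages also shows Transparent HugePages in use. A minimal Python sketch of that check, with values copied from the output above; the report keys are hypothetical names chosen for this example.

```python
SAMPLE = """MemTotal:       65918584 kB
PageTables:     12495120 kB
AnonHugePages:   1478656 kB
HugePages_Total:       0
Hugepagesize:       2048 kB
"""


def meminfo_kb(text: str) -> dict:
    """Parse 'Key:  value [kB]' lines of /proc/meminfo into integers."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def pagetables_report(info: dict) -> dict:
    """Summarise RAM spent on PageTables and which huge page kinds are in use."""
    return {
        "pagetables_gib": info["PageTables"] / 1024 ** 2,  # kB -> GiB
        "pct_of_ram": 100.0 * info["PageTables"] / info["MemTotal"],
        "hugepages_configured": info.get("HugePages_Total", 0) > 0,
        "thp_in_use": info.get("AnonHugePages", 0) > 0,
    }


if __name__ == "__main__":
    # On a live system: meminfo_kb(open("/proc/meminfo").read())
    print(pagetables_report(meminfo_kb(SAMPLE)))
```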
Unexpected Swapping
● Lots of notes about swapping in
alert log
● Small SGA (2 GB)
● Rarely used database
● vm.swappiness had not been reviewed,
probably still at the default of 60
WARNING: Heavy swapping observed on system in
last 5 mins.
pct of memory swapped in [0.27%] pct of memory
swapped out [2.22%].
Please make sure there is no memory pressure
and the SGA and PGA are configured correctly.
Look at DBRM trace file for more details.
● Once a month, load average > 400
● System unusable but no crash
CPU stealing
Yet again - sar is the star
[oracle@oramstr01 oracle]$ sar -f sa07 -s 14:30:00 -e 18:30:00 -u
Linux 2.6.32-573.8.1.el6.x86_64 (oramstr01.testing.com) 12/07/2016 _x86_64_ (80 CPU)
02:30:01 PM CPU %user %nice %system %iowait %steal %idle
02:40:01 PM all 38.31 0.00 3.35 3.02 0.00 55.32
02:50:01 PM all 34.02 0.00 3.15 2.63 0.00 60.19
03:00:01 PM all 34.20 0.00 3.20 1.68 0.00 60.92
03:10:01 PM all 40.79 0.00 3.81 2.96 0.00 52.44
03:20:01 PM all 37.33 0.00 3.43 2.40 0.00 56.83
03:30:04 PM all 40.72 0.00 6.12 2.62 0.00 50.54
03:40:01 PM all 10.08 0.00 88.36 0.30 0.00 1.26
03:50:02 PM all 8.66 0.00 90.82 0.05 0.00 0.47
04:00:03 PM all 31.66 0.00 68.27 0.02 0.00 0.04
04:10:03 PM all 45.84 0.00 49.19 0.90 0.00 4.07
04:20:01 PM all 40.68 0.00 54.30 0.97 0.00 4.05
04:30:01 PM all 37.81 0.00 43.22 1.13 0.00 17.84
04:40:02 PM all 15.68 0.00 84.18 0.05 0.00 0.09
04:50:02 PM all 12.76 0.00 87.23 0.00 0.00 0.01
05:00:03 PM all 11.84 0.00 88.14 0.00 0.00 0.01
05:10:01 PM all 18.56 0.00 62.74 0.71 0.00 17.99
05:20:01 PM all 15.84 0.00 1.73 1.17 0.00 81.27
05:30:01 PM all 19.22 0.00 1.75 0.71 0.00 78.33
05:40:01 PM all 25.51 0.00 2.02 1.15 0.00 71.32
05:50:01 PM all 23.78 0.00 1.85 1.05 0.00 73.32
06:00:01 PM all 20.15 0.00 1.66 0.88 0.00 77.30
06:10:01 PM all 21.14 0.00 2.68 2.93 0.00 73.25
06:20:01 PM all 18.94 0.00 2.33 2.26 0.00 76.46
Average: all 26.23 0.00 32.81 1.29 0.00 39.67
[oracle@oramstr01 ~]$ date
Wed Dec 14 10:23:22 EST 2016
[oracle@oramstr01 ~]$ ps -ef | grep -c oracleccxp
2717
[oracle@oramstr01 ~]$ grep PageTable /proc/meminfo
PageTables: 461002976 kB
Yes, that is ~440 GiB of PageTables!
Sessions and pagetable memory
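Dividing the total by the process count gives a rough per-session figure. A minimal Python sketch; it assumes the 2,717 matches from grep -c are all Oracle server processes (the count can include the grep itself) and that PageTables is dominated by 8-byte entries for 4 KiB pages, so the results are estimates only: about 166 MiB of PageTables per process, i.e. each process mapping roughly 83 GiB with 4 KiB pages.

```python
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024
PAGE_BYTES, PTE_BYTES = 4096, 8
MAP_RATIO = PAGE_BYTES // PTE_BYTES  # one 8-byte PTE maps one 4 KiB page: 512x


def pagetable_breakdown(pagetables_kb: int, processes: int) -> dict:
    """Rough per-process view of a PageTables total taken from /proc/meminfo."""
    avg_kb = pagetables_kb / processes
    return {
        "total_gib": pagetables_kb / KIB_PER_GIB,
        "avg_per_process_mib": avg_kb / KIB_PER_MIB,
        # memory an average process maps with 4 KiB pages (leaf PTEs only)
        "mapped_per_process_gib": avg_kb * MAP_RATIO / KIB_PER_GIB,
    }


if __name__ == "__main__":
    print(pagetable_breakdown(461002976, 2717))
```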
Yet again - sar is the star
[oracle@oramstr01 oracle]$ sar -r -f sa07 -s 14:30:00 -e 17:30:00
Linux 2.6.32-573.8.1.el6.x86_64 (oramstr01.testing.com) 12/07/2016 _x86_64_ (80 CPU)
02:30:01 PM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
02:40:01 PM 57063440 1001654588 94.61 1610536 412910108 311506624 26.72
02:50:01 PM 57747440 1000970588 94.55 1610564 412911112 306685964 26.31
03:00:01 PM 46592180 1012125848 95.60 1610596 412914608 308191992 26.44
03:10:01 PM 31680840 1027037188 97.01 1610608 412917688 310993660 26.68
03:20:01 PM 16457972 1042260056 98.45 1610628 412920472 309580976 26.56
03:30:04 PM 1739692 1056978336 99.84 1610628 411393436 317613764 27.25
03:40:01 PM 5066928 1053651100 99.52 1538352 395198928 324298580 27.82
03:50:02 PM 28196104 1030521924 97.34 1342292 381568100 324394208 27.83
04:00:03 PM 11313156 1047404872 98.93 1359396 378901468 326693864 28.03
04:10:03 PM 80061500 978656528 92.44 1359488 375162128 321167148 27.55
04:20:01 PM 64494004 994224024 93.91 1359508 375163768 322061964 27.63
04:30:01 PM 108230896 950487132 89.78 1359532 375166776 313685004 26.91
04:40:02 PM 135833716 922884312 87.17 1359548 375168248 318691876 27.34
04:50:02 PM 192323488 866394540 81.83 1359556 375169736 315572568 27.07
05:00:03 PM 235108136 823609892 77.79 1359648 375172216 312304460 26.79
05:10:01 PM 360281464 698436564 65.97 1359724 375173424 295083536 25.32
05:20:01 PM 357150032 701567996 66.27 1359748 375175952 296449248 25.43
Summary
● HugePages are usually good to
have
● How to implement
● Know where to look
● /proc/meminfo
■ HugePages
■ Pagetables
● Remember the power of sar/OSWbb
● Following best practices prevents
issues
References
● Oracle 11g internals part 1: Automatic Memory Management, by Tanel Poder
● Oracle SGA memory allocation on startup, by Frits Hoogland
● Oracle Linux: Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration (Doc ID 401749.1)
● Oracle Exadata Initialization Parameters and Diskgroup Attributes Best Practices (Doc ID 2062068.1)
● 12.2 Grid Infrastructure and Database Upgrade steps for Exadata Database Machine running 11.2.0.3 and later on Oracle Linux (Doc ID 2111010.1)
● ASM & Shared Pool (ORA-4031) (Doc ID 437924.1)
Q&A
Ask now or reach out later, but don't keep the question to yourself
THANK YOU
Hope you enjoyed it

More Related Content

PDF
Whitepaper: Where did my CPU go?
Kristofferson A
 
PDF
Linux Systems Performance 2016
Brendan Gregg
 
PPTX
LINEのMySQL運用について 修正版
LINE Corporation
 
PDF
Kuberneteの運用を支えるGitOps
shunki fujiwara
 
PPTX
PostgreSQLクエリ実行の基礎知識 ~Explainを読み解こう~
Miki Shimogai
 
PPTX
[Oracle DBA & Developer Day 2014] しばちょう先生による特別講義! RMANの運用と高速化チューニング
オラクルエンジニア通信
 
PDF
Clone Oracle Databases In Minutes Without Risk Using Enterprise Manager 13c
Alfredo Krieg
 
PDF
Zero Data Loss Recovery Applianceのご紹介
オラクルエンジニア通信
 
Whitepaper: Where did my CPU go?
Kristofferson A
 
Linux Systems Performance 2016
Brendan Gregg
 
LINEのMySQL運用について 修正版
LINE Corporation
 
Kuberneteの運用を支えるGitOps
shunki fujiwara
 
PostgreSQLクエリ実行の基礎知識 ~Explainを読み解こう~
Miki Shimogai
 
[Oracle DBA & Developer Day 2014] しばちょう先生による特別講義! RMANの運用と高速化チューニング
オラクルエンジニア通信
 
Clone Oracle Databases In Minutes Without Risk Using Enterprise Manager 13c
Alfredo Krieg
 
Zero Data Loss Recovery Applianceのご紹介
オラクルエンジニア通信
 

What's hot (20)

PPTX
Sql server のバックアップとリストアの基礎
Masayuki Ozawa
 
PDF
Apache Kafka 0.11 の Exactly Once Semantics
Yoshiyasu SAEKI
 
PDF
PostgreSQL High Availability in a Containerized World
Jignesh Shah
 
PPTX
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
PDF
PostgreSQLレプリケーション10周年!徹底紹介!(PostgreSQL Conference Japan 2019講演資料)
NTT DATA Technology & Innovation
 
PDF
PostgreSQLの関数属性を知ろう
kasaharatt
 
PDF
MySQLとPostgreSQLの基本的なバックアップ比較
Shinya Sugiyama
 
PDF
PySpark in practice slides
Dat Tran
 
PDF
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
NTT DATA OSS Professional Services
 
PDF
PostgreSQL replication
NTT DATA OSS Professional Services
 
PDF
My First 100 days with an Exadata (PPT)
Gustavo Rene Antunez
 
PDF
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
Kenny Gryp
 
DOCX
【VM保護備份專題】Dell Power Protect Data Manager (PPDM) 詳解TSDM機制
裝機安 Angelo
 
PDF
Airflow introduction
Chandler Huang
 
PDF
GCを発生させないJVMとコーディングスタイル
Kenji Kazumura
 
PDF
ゲーム開発者のための C++11/C++14
Ryo Suzuki
 
PDF
PostgreSQL Replication High Availability Methods
Mydbops
 
PPTX
Zero Data Loss Recovery Appliance 設定手順例
オラクルエンジニア通信
 
PPTX
Apache Airflow in Production
Robert Sanders
 
PDF
Always on in sql server 2017
Gianluca Hotz
 
Sql server のバックアップとリストアの基礎
Masayuki Ozawa
 
Apache Kafka 0.11 の Exactly Once Semantics
Yoshiyasu SAEKI
 
PostgreSQL High Availability in a Containerized World
Jignesh Shah
 
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
PostgreSQLレプリケーション10周年!徹底紹介!(PostgreSQL Conference Japan 2019講演資料)
NTT DATA Technology & Innovation
 
PostgreSQLの関数属性を知ろう
kasaharatt
 
MySQLとPostgreSQLの基本的なバックアップ比較
Shinya Sugiyama
 
PySpark in practice slides
Dat Tran
 
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
NTT DATA OSS Professional Services
 
PostgreSQL replication
NTT DATA OSS Professional Services
 
My First 100 days with an Exadata (PPT)
Gustavo Rene Antunez
 
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
Kenny Gryp
 
【VM保護備份專題】Dell Power Protect Data Manager (PPDM) 詳解TSDM機制
裝機安 Angelo
 
Airflow introduction
Chandler Huang
 
GCを発生させないJVMとコーディングスタイル
Kenji Kazumura
 
ゲーム開発者のための C++11/C++14
Ryo Suzuki
 
PostgreSQL Replication High Availability Methods
Mydbops
 
Zero Data Loss Recovery Appliance 設定手順例
オラクルエンジニア通信
 
Apache Airflow in Production
Robert Sanders
 
Always on in sql server 2017
Gianluca Hotz
 
Ad

Similar to Huge pages why-what-how (20)

PDF
Deploying MariaDB for HA on Google Cloud Platform
MariaDB plc
 
PDF
Monitoring with Clickhouse
unicast
 
PDF
Implementing MySQL Database-as-a-Service using open source tools
All Things Open
 
PDF
One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018
Gleb Otochkin
 
PDF
Yarn optimization (Real life use case)
Jean-Louis Quéguiner
 
PPTX
Full scan frenzy at amadeus
MongoDB
 
PPTX
[RakutenTechConf2013] [C-1] Rakuten new infrastructure
Rakuten Group, Inc.
 
PPTX
Make your data fly - Building data platform in AWS
Kimmo Kantojärvi
 
PDF
Key considerations in productionizing streaming applications
KafkaZone
 
PDF
“Building consistent and highly available distributed systems with Apache Ign...
Tom Diederich
 
PDF
ch9_virMem.pdf
HoNguyn746501
 
PPTX
Cloud Native with Kyma
Piotr Kopczynski
 
PDF
MySQL Scalability and Reliability for Replicated Environment
Jean-François Gagné
 
PDF
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
Equnix Business Solutions
 
PDF
20190909_PGconf.ASIA_KaiGai
Kohei KaiGai
 
PDF
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Severalnines
 
PDF
From the trenches: scaling a large log management deployment
FaithWestdorp
 
PDF
Container Attached Storage (CAS) with OpenEBS - SDC 2018
OpenEBS
 
PPTX
IBM Power Systems Update 1Q17
David Spurway
 
PDF
MySQL Scalability and Reliability for Replicated Environment
Jean-François Gagné
 
Deploying MariaDB for HA on Google Cloud Platform
MariaDB plc
 
Monitoring with Clickhouse
unicast
 
Implementing MySQL Database-as-a-Service using open source tools
All Things Open
 
One bridge to connect them all. Oracle GoldenGate for Big Data.UKOUG Tech 2018
Gleb Otochkin
 
Yarn optimization (Real life use case)
Jean-Louis Quéguiner
 
Full scan frenzy at amadeus
MongoDB
 
[RakutenTechConf2013] [C-1] Rakuten new infrastructure
Rakuten Group, Inc.
 
Make your data fly - Building data platform in AWS
Kimmo Kantojärvi
 
Key considerations in productionizing streaming applications
KafkaZone
 
“Building consistent and highly available distributed systems with Apache Ign...
Tom Diederich
 
ch9_virMem.pdf
HoNguyn746501
 
Cloud Native with Kyma
Piotr Kopczynski
 
MySQL Scalability and Reliability for Replicated Environment
Jean-François Gagné
 
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
Equnix Business Solutions
 
20190909_PGconf.ASIA_KaiGai
Kohei KaiGai
 
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Severalnines
 
From the trenches: scaling a large log management deployment
FaithWestdorp
 
Container Attached Storage (CAS) with OpenEBS - SDC 2018
OpenEBS
 
IBM Power Systems Update 1Q17
David Spurway
 
MySQL Scalability and Reliability for Replicated Environment
Jean-François Gagné
 
Ad

Recently uploaded (20)

PDF
Cloud-Migration-Best-Practices-A-Practical-Guide-to-AWS-Azure-and-Google-Clou...
Artjoker Software Development Company
 
PDF
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
PDF
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
PPTX
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
PDF
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
PDF
The Evolution of KM Roles (Presented at Knowledge Summit Dublin 2025)
Enterprise Knowledge
 
PDF
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
PDF
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
PDF
REPORT: Heating appliances market in Poland 2024
SPIUG
 
PDF
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
PPTX
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
PPTX
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
PPTX
The-Ethical-Hackers-Imperative-Safeguarding-the-Digital-Frontier.pptx
sujalchauhan1305
 
PDF
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
PDF
Brief History of Internet - Early Days of Internet
sutharharshit158
 
PDF
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
PDF
Doc9.....................................
SofiaCollazos
 
PPTX
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
PDF
Advances in Ultra High Voltage (UHV) Transmission and Distribution Systems.pdf
Nabajyoti Banik
 
Cloud-Migration-Best-Practices-A-Practical-Guide-to-AWS-Azure-and-Google-Clou...
Artjoker Software Development Company
 
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
The Evolution of KM Roles (Presented at Knowledge Summit Dublin 2025)
Enterprise Knowledge
 
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
REPORT: Heating appliances market in Poland 2024
SPIUG
 
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
The-Ethical-Hackers-Imperative-Safeguarding-the-Digital-Frontier.pptx
sujalchauhan1305
 
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
Brief History of Internet - Early Days of Internet
sutharharshit158
 
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
Doc9.....................................
SofiaCollazos
 
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
Advances in Ultra High Voltage (UHV) Transmission and Distribution Systems.pdf
Nabajyoti Banik
 

Huge pages why-what-how

  • 1. August 2018 Refreshing our knowledge HugePages: Why, what and how
  • 2. 2© The Pythian Group Inc., 2018 What's up with HugePages?
  • 3. © The Pythian Group Inc., 2018 3 Jose Rodriguez Project engineer at Pythian ● +10 years of experience, mainly Oracle but also SQL Server and others like DB2 LUW or PostgreSQL ● Solaris, Linux and Windows RAC and HA with DG and GG Other areas of expertise, i.e., things I like doing ● Scripting and automation (lazy DBA) ● Machine Learning ● Golden Gate replication ● Cloud related stuff (who doesn't nowadays, eh? )
  • 4. About Pythian Pythian’s 400+ IT professionals help companies adopt and manage disruptive data technologies to better compete © 2018 Pythian. Confidential 4
  • 5. © 2018 Pythian. Confidential 5 Systems currently managed by Pythian EXPERIENCED Pythian experts in 35 countries GLOBAL Millennia of experience gathered and shared over 19 years EXPERTS 11,800 2400+
  • 6. What are you taking away So you can leave now if you already have it
  • 7. Agenda 7© The Pythian Group Inc., 2018 Why do we care? What are HugePages? How to implement? What can happen - Case Studies
  • 8. © The Pythian Group Inc., 2018 8 ● It is 2019, but HugePages seem yet to be understood and broadly implemented ● More memory -> more problems ● systems with >= 1TB RAM are common nowadays ● Problems caused by lack of HugePages are not always easy to spot Why do we care?
  • 9. 9© The Pythian Group Inc., 2018 What Are HugePages Behind the scenes
  • 10. © The Pythian Group Inc., 2018 10 Virtual to physical memory mapping 0x00 ... 0x01 ... 0x23 0x24 42 157 245 userprocess mainmemory
  • 11. 11© The Pythian Group Inc., 2018 Memory allocations are tracked in PageTables virtual real 0x00 42 0x01 175 0x02 176 0x03 177 0x04 178 ... ... 4kB
  • 12. © The Pythian Group Inc., 2018 12 Virtual to physical memory mapping ... virtual real 0x00 42 0x01 175 0x02 176 0x03 177 0x04 178 ... ... userprocess physicalmemory PageTable
  • 13. © The Pythian Group Inc., 2018 13 ● To allocate 100 GiB there will be 26,214,400 memory pages of 4KB each ● An OS would typically group and map them hierarchically in frames ■ i.e. continuous space can be mapped more efficiently ● Each PageTable Entry (PTE) is around 8 bytes for 64 bits systems ● Vmem offset + Physical address + Flags ● PageTables are also stored in memory. Size would be 200 MiB in our example ● For shared memory segments (e.g. SGA) each process has a copy of the PageTable ● A regular single instance may have 1000 sessions * 200 MiB each leads to 200GiB of RAM to track RAM. What's up with PageTables?
  • 14. 14© The Pythian Group Inc., 2018 HugePages to the Rescue! virtual real 0x00 1 0x02 ... 0x02 ... 0x03 ... 0x04 ... ... ... 2048KB2048KiB 4KiB PageTable reduced 512 time to only ~400KiB
  • 15. © The Pythian Group Inc., 2018 15 ● Allocate only enough HugePages ● HugePages cannot be swapped out ● Oracle Automatic Memory Management (AMM) is incompatible with HugePages ● Transparent HugePages (THP) do not go along well with Oracle, disabled by default in UEK2+ ● Platforms other than Linux x64 have even bigger choices of large page sizes up to 1GiB ● In extreme cases, SGA of TiBs in size, may lead to slow instance startup. PRE_PAGE_SGA may help here ● AMM is forbidden in 12.2 if RAM>4GiBs, so HP should be used here. HugePages additional facts
  • 16. © The Pythian Group Inc., 2018 16 ● Do we really need/want HugePages for ASM? ● ASM uses AMM by default so initially not HP compatible. /dev/shm is important here. ● We don't for "regular" ASM instances. Documentation and best practices say this clearly, although this may change in future releases. ● Highly recommended for Exadata. MOS notes 2062068.1 and 2111010.1 clearly indicate that ASMM should be enabled and HugePages available for ASM. HugePages and ASM
  • 17. © The Pythian Group Inc., 2018 17 ● /dev/shm is automatically set to 50% of total RAM ● Oracle AMM uses /dev/shm to "store" shared memory pages ● We may be tempted to reduce the size of /dev/shm to allow more room to HugePages. No need ● HugePages and AMM are incompatible HugePages and /dev/shm
  • 18. © The Pythian Group Inc., 2018 18 ● Does Oracle use HugePages for PGA? ● No, it doesn't (currently) ● No hard evidence against it in docs or MOS ● Tests show that Oracle is not allocating HugePages for it ● Counterintuitive for small memory allocations ● May change in the future (DWH or DSS sort area) HugesPages and PGA
  • 19. 19© The Pythian Group Inc., 2018 HugePages on the Cloud ● Supported on AWS RDS since July, 2017 but not enabled by default. There are limitations to the type of instance you can enable HP on. ● No official documentation on Azure, but a recent test showed that we can set up HP in a Linux VM running on Azure. ● Google Cloud platform supports HugePages. ● Oracle Cloud Service – Officially supported for Exadata Cloud Service. OCI allows it but not by default. Classic platform has it enabled by default.
  • 20. 20© The Pythian Group Inc., 2018 Let's do it!
  • 21. © The Pythian Group Inc., 2018 21 ● Script provided in MOS: "Oracle Linux: Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration (Doc ID 401749.1)" ● Or use the following formula: SGA size (MiB) / 2 (MiB) + 42 How many HugePages do I need?
  • 22. 22© The Pythian Group Inc., 2018 ● May need extra work on VMs ● Disable AMM ● Set use_large_pages=only ● Disable THP ● Set memlock user limit ● Set vm.nr_hugepages ● Set vm.hugetlb_shm_group as required (SUSE) ● Reboot OS (not always required) ● Restart Oracle instance ● Use TuneD profiles on RHEL 7 and above Implementation steps
  • 23. 23© The Pythian Group Inc., 2018 Success! 2018-08-20T12:43:18.163509+00:00 Dump of system resources acquired for SHARED GLOBAL AREA (SGA) 2018-08-20T12:43:18.163653+00:00 Per process system memlock (soft) limit = 2048M 2018-08-20T12:43:18.163821+00:00 Expected per process system memlock (soft) limit to lock SHARED GLOBAL AREA (SGA) into memory: 1540M 2018-08-20T12:43:18.163952+00:00 Available system pagesizes: 4K, 2048K 2018-08-20T12:43:18.164143+00:00 Supported system pagesize(s): 2018-08-20T12:43:18.164220+00:00 PAGESIZE AVAILABLE_PAGES EXPECTED_PAGES ALLOCATED_PAGES ERROR(s) 2018-08-20T12:43:18.164382+00:00 2048K 1056 770 770 NONE [oracle@HPtesting ~]$ grep ^HugePages /proc/meminfo HugePages_Total: 1056 HugePages_Free: 287 HugePages_Rsvd: 1 HugePages_Surp: 0
  • 24. © The Pythian Group Inc., 2018 24 LargePages (A.K.A. HugePages in Windows) ● Available since Oracle 10.1 ● Enabled by adding an entry into the registry, ideally for each SID instead of general ● Again only used for SGA ● Not considered in the "Working Set" so memory usage metrics are now somehow flawed ● Startup times may be slow and with high impact on the server performance for older versions ● Oriented to DWH type databases
  • 25. 25© The Pythian Group Inc., 2018 Case Studies Lack of HugePages causing trouble
  • 26. 26© The Pythian Group Inc., 2018 RAC node eviction ● 1 node of 2-node cluster evicted ● Logs show a timeout responding to something prior to eviction ● We found no other errors or evidence ● sar to the rescue!
  • 27. 27© The Pythian Group Inc., 2018 RAC node eviction – “sar -r” 05:20:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty 05:30:01 AM 361136 65389512 99.45 932 24477352 41041528 29.48 29509228 1880236 564 05:40:01 AM 354896 65395752 99.46 95164 24434320 41039432 29.48 29504552 1902320 552 05:50:01 AM 382940 65367708 99.42 87912 24420284 41021636 29.47 29474908 1902904 496 06:00:01 AM 385016 65365632 99.41 52432 24414712 41053708 29.49 29477412 1878860 488 06:10:01 AM 386796 65363852 99.41 596 24416944 41046880 29.48 29412032 1909420 628 06:20:02 AM 376484 65374164 99.43 596 24546212 41069336 29.50 29603108 2107020 460 06:30:01 AM 335176 65415472 99.49 596 24893684 41094396 29.52 29676840 2078424 648 06:40:05 AM 334152 65416496 99.49 596 24554064 41222332 29.61 29453168 2061660 0 06:50:03 AM 349392 65401256 99.47 596 22963852 41360864 29.71 28031816 1851900 72 07:00:10 AM 342752 65407896 99.48 596 21190320 41768848 30.00 26854676 1723480 0 07:10:04 AM 341756 65408892 99.48 596 20787592 41769828 30.00 26706944 1765980 12 Average: 414530 65336118 99.37 19907 24589646 41094908 29.52 29439910 1903200 2315 07:16:28 AM LINUX RESTART
  • 28. RAC node eviction – “sar -B”
    05:20:01 AM  pgpgin/s pgpgout/s   fault/s majflt/s  pgfree/s  pgscank/s pgscand/s pgsteal/s %vmeff
    05:30:01 AM   7257.49     91.49   8704.61     1.28   6571.59      44.62      0.00     30.63  68.65
    05:40:01 AM   2021.61   2486.25 141607.78     5.77  60734.72     451.59     38.26    386.08  78.82
    05:50:01 AM   6980.26     71.57   7241.62     0.56   6380.14      35.86      7.72     38.22  87.71
    06:00:01 AM   7262.73     67.56   8717.63     1.10   6549.42      47.03      1.18     40.58  84.18
    06:10:01 AM   1759.35    379.66  15556.00     4.59   7320.75     185.60      2.75    143.16  76.01
    06:20:02 AM  63309.67   3624.60  34222.39   267.30  50019.66   42307.14    982.44  13754.06  31.77
    06:30:01 AM 115962.81   2730.86  30665.11   373.74  86055.51  843180.51  16021.66  26924.73   3.13
    06:40:05 AM  83609.10   1331.45  20393.23   235.71  62484.15 1104330.76  25458.10  20843.61   1.84
    06:50:03 AM 158193.69   4252.68  27395.53   375.73 111261.98 1848753.28  61689.42  38619.69   2.02
    07:00:10 AM  98699.51   4257.23  23771.15   292.99  70708.84  590100.80  12354.06  23573.29   3.91
    07:10:04 AM 125777.83   2409.66  23413.33   415.65  91748.06  952301.48  24672.30  31401.32   3.21
    Average:     20671.81   1108.76  15287.47    51.49  18890.14  126225.66   3483.02   4065.45   3.13
    07:16:28 AM LINUX RESTART
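The telling columns here are the page scanner ones: pgscank/s (pages scanned per second by kswapd) goes from tens or hundreds up to hundreds of thousands and beyond, while %vmeff (pages reclaimed divided by pages scanned) falls from roughly 70 to 90% to low single digits, so the kernel is scanning a lot of memory to free very little. A minimal sketch for spotting such intervals in a long sar -B report, assuming the nine-column layout shown above; `flag_reclaim_pressure` is a hypothetical helper and its default threshold of 10000 pages/s is an arbitrary illustrative value, not a documented limit:

```shell
#!/usr/bin/env bash
# flag_reclaim_pressure (hypothetical helper): read `sar -B` text on stdin and
# print the intervals where pgscank/s exceeds a threshold (default 10000).
# Columns are counted from the end of the line, so both "05:30:01 AM" and
# 24-hour "05:30:01" timestamps work.
flag_reclaim_pressure() {
  awk -v limit="${1:-10000}" '
    $NF !~ /^[0-9.]+$/ { next }   # header and "LINUX RESTART" lines
    $1 == "Average:"   { next }   # summary line
    $(NF-3) + 0 > limit + 0 {
      ts = $1                     # timestamp = fields before the 9 metrics
      for (i = 2; i <= NF - 9; i++) ts = ts " " $i
      printf "%s pgscank/s=%s %%vmeff=%s\n", ts, $(NF-3), $NF
    }
  '
}
```

For example, `sar -B -f sa07 | flag_reclaim_pressure` would list only the intervals from 06:20 onwards for the data on this slide. Counting fields from the right keeps the filter independent of whether sar prints an AM/PM suffix.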
  • 29. RAC node eviction - “cat /proc/meminfo” (after incident)
    $ cat /proc/meminfo
    MemTotal:        65918584 kB
    MemFree:          1583912 kB
    MemAvailable:    20034320 kB
    Buffers:           416208 kB
    Cached:          41349928 kB
    SwapTotal:       73469948 kB
    SwapFree:        73334068 kB
    KernelStack:        23392 kB
    PageTables:      12495120 kB
    AnonHugePages:    1478656 kB
    HugePages_Total:        0
    HugePages_Free:         0
    HugePages_Rsvd:         0
    HugePages_Surp:         0
    Hugepagesize:        2048 kB
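Two lines in this output stand out: HugePages_Total is 0, so the SGA was backed by regular 4 kB pages, and PageTables alone is almost 12 GiB, close to a fifth of MemTotal. A minimal sketch of that check, assuming bash and awk; `pagetables_pct` is a hypothetical helper, and any alerting threshold applied to its output would be your own choice:

```shell
#!/usr/bin/env bash
# pagetables_pct (hypothetical helper): PageTables as a percentage of MemTotal,
# plus the configured HugePages count, from /proc/meminfo (or a file / "-").
pagetables_pct() {
  awk '
    $1 == "MemTotal:"        { total = $2 }
    $1 == "PageTables:"      { pt = $2 }
    $1 == "HugePages_Total:" { hp = $2 }
    END {
      if (total == 0) { print "MemTotal not found" > "/dev/stderr"; exit 1 }
      printf "PageTables=%d kB (%.1f%% of MemTotal), HugePages_Total=%d\n", pt, 100 * pt / total, hp
    }
  ' "${1:-/proc/meminfo}"
}
```

Fed the values from this slide, it reports `PageTables=12495120 kB (19.0% of MemTotal), HugePages_Total=0`.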
  • 30. Unexpected Swapping
    ● Lots of swapping warnings in the alert log
    ● Small database (2 GB SGA)
    ● Rarely used database
    ● vm.swappiness had not been reviewed; probably still at the default of 60
    WARNING: Heavy swapping observed on system in last 5 mins.
    pct of memory swapped in [0.27%] pct of memory swapped out [2.22%].
    Please make sure there is no memory pressure and the SGA and PGA
    are configured correctly. Look at DBRM trace file for more details.
  • 31. CPU stealing
    ● Once a month, load average goes above 400
    ● System unusable, but no crash
  • 32. Yet again - sar is the star
    [oracle@oramstr01 oracle]$ sar -f sa07 -s 14:30:00 -e 18:30:00 -u
    Linux 2.6.32-573.8.1.el6.x86_64 (oramstr01.testing.com) 12/07/2016 _x86_64_ (80 CPU)
    02:30:01 PM      CPU    %user    %nice  %system  %iowait   %steal    %idle
    02:40:01 PM      all    38.31     0.00     3.35     3.02     0.00    55.32
    02:50:01 PM      all    34.02     0.00     3.15     2.63     0.00    60.19
    03:00:01 PM      all    34.20     0.00     3.20     1.68     0.00    60.92
    03:10:01 PM      all    40.79     0.00     3.81     2.96     0.00    52.44
    03:20:01 PM      all    37.33     0.00     3.43     2.40     0.00    56.83
    03:30:04 PM      all    40.72     0.00     6.12     2.62     0.00    50.54
    03:40:01 PM      all    10.08     0.00    88.36     0.30     0.00     1.26
    03:50:02 PM      all     8.66     0.00    90.82     0.05     0.00     0.47
    04:00:03 PM      all    31.66     0.00    68.27     0.02     0.00     0.04
    04:10:03 PM      all    45.84     0.00    49.19     0.90     0.00     4.07
    04:20:01 PM      all    40.68     0.00    54.30     0.97     0.00     4.05
    04:30:01 PM      all    37.81     0.00    43.22     1.13     0.00    17.84
    04:40:02 PM      all    15.68     0.00    84.18     0.05     0.00     0.09
    04:50:02 PM      all    12.76     0.00    87.23     0.00     0.00     0.01
    05:00:03 PM      all    11.84     0.00    88.14     0.00     0.00     0.01
    05:10:01 PM      all    18.56     0.00    62.74     0.71     0.00    17.99
    05:20:01 PM      all    15.84     0.00     1.73     1.17     0.00    81.27
    05:30:01 PM      all    19.22     0.00     1.75     0.71     0.00    78.33
    05:40:01 PM      all    25.51     0.00     2.02     1.15     0.00    71.32
    05:50:01 PM      all    23.78     0.00     1.85     1.05     0.00    73.32
    06:00:01 PM      all    20.15     0.00     1.66     0.88     0.00    77.30
    06:10:01 PM      all    21.14     0.00     2.68     2.93     0.00    73.25
    06:20:01 PM      all    18.94     0.00     2.33     2.26     0.00    76.46
    Average:         all    26.23     0.00    32.81     1.29     0.00    39.67
  • 33. Sessions and pagetable memory
    [oracle@oramstr01 ~]$ date
    Wed Dec 14 10:23:22 EST 2016
    [oracle@oramstr01 ~]$ ps -ef | grep -c oracleccxp
    2717
    [oracle@oramstr01 ~]$ grep PageTable /proc/meminfo
    PageTables:     461002976 kB
    Yes, that is 440 GiB of PageTables!
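A back-of-the-envelope estimate shows how the number of sessions multiplies this cost. Under simplifying assumptions (every dedicated server process eventually maps the whole SGA, each last-level page-table entry takes 8 bytes on x86_64, and upper-level tables and any page-table sharing are ignored), each process needs SGA / page size × 8 bytes of page tables: 1/512 of the SGA with 4 KiB pages, and 512 times less with 2 MiB pages. The helper name and the 100 GiB SGA used below are hypothetical, and the result is an order-of-magnitude figure, not what the kernel will report:

```shell
#!/usr/bin/env bash
# pagetable_estimate_kib (hypothetical helper): rough page-table cost, in KiB,
# of N processes each mapping an entire SGA.
#   $1 = SGA size in GiB, $2 = number of processes, $3 = page size in KiB (default 4)
# Assumes 8-byte entries at the last level and ignores upper-level tables.
pagetable_estimate_kib() {
  local sga_gib=$1 procs=$2 page_kib=${3:-4}
  local pages=$(( sga_gib * 1024 * 1024 / page_kib ))   # pages per SGA mapping
  echo $(( procs * pages * 8 / 1024 ))                  # 8 bytes per entry -> KiB
}
```

For a hypothetical 100 GiB SGA and the 2717 processes counted above, `pagetable_estimate_kib 100 2717` prints 556441600 (about 530 GiB), while `pagetable_estimate_kib 100 2717 2048` prints 1086800 (about 1 GiB). Working backwards under the same assumptions, 440 GiB spread over 2717 processes is roughly 166 MiB each, the cost of mapping about 83 GiB through 4 KiB pages.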
  • 34. Yet again - sar is the star
    [oracle@oramstr01 oracle]$ sar -r -f sa07 -s 14:30:00 -e 17:30:00
    Linux 2.6.32-573.8.1.el6.x86_64 (oramstr01.testing.com) 12/07/2016 _x86_64_ (80 CPU)
    02:30:01 PM kbmemfree  kbmemused %memused kbbuffers  kbcached  kbcommit %commit
    02:40:01 PM  57063440 1001654588    94.61   1610536 412910108 311506624   26.72
    02:50:01 PM  57747440 1000970588    94.55   1610564 412911112 306685964   26.31
    03:00:01 PM  46592180 1012125848    95.60   1610596 412914608 308191992   26.44
    03:10:01 PM  31680840 1027037188    97.01   1610608 412917688 310993660   26.68
    03:20:01 PM  16457972 1042260056    98.45   1610628 412920472 309580976   26.56
    03:30:04 PM   1739692 1056978336    99.84   1610628 411393436 317613764   27.25
    03:40:01 PM   5066928 1053651100    99.52   1538352 395198928 324298580   27.82
    03:50:02 PM  28196104 1030521924    97.34   1342292 381568100 324394208   27.83
    04:00:03 PM  11313156 1047404872    98.93   1359396 378901468 326693864   28.03
    04:10:03 PM  80061500  978656528    92.44   1359488 375162128 321167148   27.55
    04:20:01 PM  64494004  994224024    93.91   1359508 375163768 322061964   27.63
    04:30:01 PM 108230896  950487132    89.78   1359532 375166776 313685004   26.91
    04:40:02 PM 135833716  922884312    87.17   1359548 375168248 318691876   27.34
    04:50:02 PM 192323488  866394540    81.83   1359556 375169736 315572568   27.07
    05:00:03 PM 235108136  823609892    77.79   1359648 375172216 312304460   26.79
    05:10:01 PM 360281464  698436564    65.97   1359724 375173424 295083536   25.32
    05:20:01 PM 357150032  701567996    66.27   1359748 375175952 296449248   25.43
  • 35. Summary
    ● HugePages are usually good to have
    ● How to implement them
    ● Know where to look:
      ● /proc/meminfo
        ■ HugePages
        ■ PageTables
    ● Remember the power of sar/OSWbb
    ● Following best practices prevents issues
  • 36. References
    ● Oracle 11g internals part 1: Automatic Memory Management, by Tanel Poder
    ● Oracle SGA memory allocation on startup, by Frits Hoogland
    ● Oracle Linux: Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration (Doc ID 401749.1)
    ● Oracle Exadata Initialization Parameters and Diskgroup Attributes Best Practices (Doc ID 2062068.1)
    ● 12.2 Grid Infrastructure and Database Upgrade steps for Exadata Database Machine running 11.2.0.3 and later on Oracle Linux (Doc ID 2111010.1)
    ● ASM & Shared Pool (ORA-4031) (Doc ID 437924.1)
  • 37. Q&A
    Ask now or reach out later, but don't keep your questions to yourself
  • 38. THANK YOU
    Hope you enjoyed it