Tech Talk Live
Alfresco Performance Tuning – Part 2
Speaker Bio
Luis Cabaceira – Principal Consultant at Alfresco
Agenda
1 – JVM tuning
1 – Garbage collection analysis
2 – Caches
3 – Alfresco is running slow: where to start?
1 – JVM Tuning
• Tune the memory and garbage collection parameters for the JVM to be
appropriate for your situation. Enable GC logs and analyze them.
• Solr is more memory intensive than Alfresco.
• Alfresco consumes memory in the repository L2 cache and in Alfresco system memory.
• Tuning will vary depending on whether you are running Alfresco and Solr on the same
server and in the same JVM.
• General good settings for Alfresco (assuming a server with 16GB RAM)
-Xms8000m -Xmx12000m -XX:MaxPermSize=512m -Xss1024K
-XX:+DisableExplicitGC -XX:NewSize=2G -XX:+UseCodeCacheFlushing
-Dsun.security.ssl.allowUnsafeRenegotiation=true -Djava.awt.headless=true
• Extra settings found on large Alfresco implementations (Solr performs better with CMS)
-XX:+UseConcMarkSweepGC -server
-XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=80
-XX:+UseParNewGC -XX:ParallelGCThreads=4
-XX:+UseCompressedOops -XX:+CMSClassUnloadingEnabled
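As a rough sanity check on the sizing above, the following Python sketch parses the heap flags and relates them to the 16GB of RAM the slide assumes. The helper names (parse_size, heap_summary) are illustrative only, and the "left for everything else" figure still has to cover PermGen, thread stacks and the operating system.

```python
import re

# Multipliers for the size suffixes the JVM accepts (k, m, g; case-insensitive).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a JVM size such as '8000m', '2G' or '1024K' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if match is None:
        raise ValueError(f"not a JVM size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def heap_summary(flags, ram_bytes):
    """Pull -Xms, -Xmx and -XX:NewSize out of a flag string and relate them to RAM."""
    sizes = {}
    for flag in flags.split():
        if flag.startswith("-Xms"):
            sizes["xms"] = parse_size(flag[4:])
        elif flag.startswith("-Xmx"):
            sizes["xmx"] = parse_size(flag[4:])
        elif flag.startswith("-XX:NewSize="):
            sizes["new"] = parse_size(flag.split("=", 1)[1])
    # Share of physical memory taken by the maximum heap, and what remains.
    sizes["heap_share_of_ram"] = sizes["xmx"] / ram_bytes
    sizes["left_for_everything_else"] = ram_bytes - sizes["xmx"]
    return sizes


# Flags copied from the slide; the RAM size is the slide's 16GB assumption.
flags = "-Xms8000m -Xmx12000m -XX:MaxPermSize=512m -Xss1024K -XX:NewSize=2G"
summary = heap_summary(flags, ram_bytes=16 * 1024 ** 3)
print(f"max heap uses {summary['heap_share_of_ram']:.0%} of RAM")
```

If Solr runs on the same server, the memory left over after the Alfresco heap is what Solr's own JVM has to fit into, which is one reason the settings differ between co-located and separate deployments.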
1 – Garbage Collector Tuning
Regular analysis of the garbage collection logs is also a known best practice.
The health of the garbage collection engine is normally related to the
overall effectiveness of memory usage across the system. This is valid for
Alfresco, Solr and any possible client that is part of the deployment.
The best practice is to choose an analysis timeframe that is known to be the
period when the system is most heavily used, and to monitor the garbage
collection operations that happened during that period.
There are several tools available to analyze garbage collection logs, but the one I
think generates the most accurate report is Censum from jClarity. It's possible to
download a trial version of this tool and use it to analyze the GC logs for 7
days.
You can also use GCViewer; it is open source and a very useful tool.
https://github.com/chewiebug/GCViewer
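For a quick look before loading a log into one of these tools, a small Python sketch can compute two headline numbers for a chosen timeframe: the percentage of time spent paused and the longest pause. It assumes simple one-line log events that carry a JVM uptime stamp ("12.345: [GC ...") and end with a duration in "secs]"; real CMS logs also contain concurrent phases that are not pauses, which this sketch does not tell apart. The function name pause_stats and the sample lines are illustrative.

```python
import re

# Assumed log shape: "<uptime>: [GC ... , <pause> secs]" on a single line.
UPTIME = re.compile(r"(\d+\.\d+): \[")
DURATION = re.compile(r"(\d+\.\d+) secs\]")


def pause_stats(lines, start=0.0, end=float("inf")):
    """Summarize GC pauses whose uptime stamp falls inside [start, end] seconds."""
    events = []  # (uptime_seconds, pause_seconds)
    for line in lines:
        uptime = UPTIME.search(line)
        durations = DURATION.findall(line)
        if uptime is None or not durations:
            continue
        stamp = float(uptime.group(1))
        if start <= stamp <= end:
            # The last duration on the line is the total for the event.
            events.append((stamp, float(durations[-1])))
    if not events:
        return {"events": 0, "longest_pause": 0.0, "percent_paused": 0.0}
    window = events[-1][0] - events[0][0]
    total = sum(pause for _, pause in events)
    return {
        "events": len(events),
        "longest_pause": max(pause for _, pause in events),
        "percent_paused": 100.0 * total / window if window > 0 else 0.0,
    }


# Made-up sample lines in the assumed format.
sample = """\
10.000: [GC 1000K->500K(2000K), 0.0500000 secs]
20.000: [GC 1200K->600K(2000K), 0.1000000 secs]
30.000: [Full GC 1900K->700K(2000K), 1.3500000 secs]
"""
stats = pause_stats(sample.splitlines())
print(stats)
```

Passing start and end restricts the summary to the peak-usage period, in line with the advice to analyze the busiest timeframe rather than the whole log.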
1 – Garbage Collector common problems
1 - Look for periodic calls to System.gc() and add the -XX:+DisableExplicitGC flag
2 - Look for high pauses
High pauses from garbage collection can be an indication of a number of problems. A High
Percentage Of Time spent paused in GC may mean that the heap is undersized, causing
frequent full GC activity. A high Longest Pause in Seconds may be an indication that the heap is too
large, causing long individual garbage collections.
3 - Look for premature promotion of objects
Premature promotion is a condition that occurs when objects that should be collected in a young
generation pool (Eden or the Survivor "From" space) are instead promoted to Tenured (Old) space.
A consequence of premature promotion is additional pressure on Tenured space,
which results in more frequent collections of Tenured. More frequent Tenured
collections will interfere with your application's performance.
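Premature promotion is closely tied to how the young generation is split between Eden and the two Survivor spaces. The Python sketch below works through that arithmetic using HotSpot's -XX:SurvivorRatio=N semantics, where the young generation is divided into N + 2 equal chunks (N for Eden, one each for From and To). The sizes and the allocation rate are made-up inputs, and the function names are illustrative.

```python
def young_layout(young_mb, survivor_ratio):
    """Split the young generation into Eden, From and To for -XX:SurvivorRatio=N."""
    chunk = young_mb / (survivor_ratio + 2)
    return {"eden": chunk * survivor_ratio, "from": chunk, "to": chunk}


def seconds_between_young_gcs(eden_mb, allocation_mb_per_s):
    """Eden fills at the allocation rate, so its size over that rate gives the interval."""
    return eden_mb / allocation_mb_per_s


# Hypothetical inputs: a 2000 MB young generation and 100 MB/s of allocation.
layout = young_layout(young_mb=2000, survivor_ratio=8)
interval = seconds_between_young_gcs(layout["eden"], allocation_mb_per_s=100)
print(layout, interval)

# Enlarging the Survivor spaces (smaller ratio) shrinks Eden for the same young size.
bigger_survivors = young_layout(young_mb=2000, survivor_ratio=6)
print(bigger_survivors)
```

The second call shows the trade-off: a lower ratio gives objects more room to age in the Survivor spaces, but Eden shrinks and young collections become more frequent unless the young generation is enlarged as well.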
2 – Caches (Ehcache/Hazelcast)
• Alfresco now uses Hazelcast for clustering and caching
• The database is now used for cluster discovery
• Removing a node from the cluster is now configured in alfresco-global.properties
• alfresco.cluster.enabled=false
• The repository caches are separated into 2 different levels:
• L1 = The transactional cache (TransactionalCache.java)
• L2 = Hazelcast distributed cache (>4.2.X)
• The level 1 cache commits to the L2 cache.
• Tracing cache usage is very important for tuning
• Adding the following options to your JVM will expose the JMX features of Hazelcast.
• -Dhazelcast.jmx=true -Dhazelcast.jmx.detailed=true
2 – Caches (Ehcache/Hazelcast)
• In Alfresco, Hazelcast works with factories that allow the creation of caches
• You can define your own caches
2 – Hazelcast cache mechanisms
With Hazelcast the cache is distributed across the cluster members, giving a more
linear distribution of memory usage. The Alfresco implementation offers additional
mechanisms for defining different cache cluster types.
Fully Distributed
This is the normal value for a Hazelcast cache. Cache values (key-value pairs) are
evenly distributed across cluster members. This leads to more remote lookups when a get
request is issued and the value is held on another (remote) node.
Local cache
You may not want some caches to be clustered (or distributed) at all, so
this option works as an unclustered cache.
Invalidating
This is a local (cluster-aware) cache that sets up a messenger that sends invalidation
messages to the remaining cluster nodes when you update an item in the cache,
much like the old Ehcache mechanisms.
2 – Tuning hazelcast
To perform a cache tuning exercise we need to analyze 3 relevant factors :
- type of data
- how often it changes
- number of gets compared to the number of writes
If we can identify caches whose values do not change often, it is worth trying
to set them to invalidating and checking the performance results.
Note that in distributed caches, when there are a lot of remote gets and the objects being
stored are big, the remote get operation is going to be slow. This is mainly because the object is
serialized and needs to be deserialized before its content is made available, and that
operation can take some time depending on the size of the object.
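The reasoning above can be turned into a first-pass filter over per-cache statistics. The Python sketch below is only a heuristic: the threshold of 100 gets per write is an arbitrary assumption, the function name is made up, and the returned strings simply mirror the cluster type names used on these slides. Any suggestion it makes still needs to be confirmed by measuring performance after the change.

```python
def suggest_cluster_type(gets, writes, serializable=True, read_heavy_ratio=100.0):
    """Suggest a cache cluster type from the gets-to-writes ratio.

    read_heavy_ratio is a hypothetical cut-off, not an Alfresco default.
    """
    if not serializable:
        # Values that cannot be serialized cannot be shipped to other members.
        return "invalidating"
    if writes == 0 or gets / writes >= read_heavy_ratio:
        # Values rarely change, so local reads plus invalidation messages are cheap.
        return "invalidating"
    return "fully-distributed"


# Made-up counters for two caches observed during a peak period.
print(suggest_cluster_type(gets=1_000_000, writes=50))
print(suggest_cluster_type(gets=1_000, writes=500))
```

The counters would normally come from the cache tracing or JMX statistics mentioned earlier; here they are plain numbers so the decision rule stays visible.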
2 – Tuning hazelcast
Cache values can be configured/overridden in alfresco-global.properties
• cache.aclSharedCache.tx.maxItems=40000
• cache.aclSharedCache.maxItems=100000
• cache.aclSharedCache.timeToLiveSeconds=0
• cache.aclSharedCache.maxIdleSeconds=0
• cache.aclSharedCache.cluster.type=fully-distributed
• cache.aclSharedCache.backup-count=1
• cache.aclSharedCache.eviction-policy=LRU
• cache.aclSharedCache.eviction-percentage=25
• cache.aclSharedCache.merge-policy=hz.ADD_NEW_ENTRY
Look for: WARN [cache.node.nodesTransactionalCache] Transactional update cache
'org.alfresco.cache.node.nodesTransactionalCache' is full (125000).
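To find out which transactional caches hit their limit and how often, the warning above can be counted per cache name. This Python sketch assumes the message wording shown on the slide, with the physical log line wrapped onto one line, and accepts either straight or typographic quotes around the cache name. The function name and the sample line are illustrative.

```python
import re
from collections import Counter

# Message wording taken from the slide; quote style may differ in a real log.
FULL_CACHE = re.compile(
    r"Transactional update cache ['‘]([^'’]+)['’] is full \((\d+)\)"
)


def full_cache_warnings(lines):
    """Return (hits per cache name, last reported limit per cache name)."""
    hits = Counter()
    limits = {}
    for line in lines:
        match = FULL_CACHE.search(line)
        if match:
            name, limit = match.group(1), int(match.group(2))
            hits[name] += 1
            limits[name] = limit
    return hits, limits


# Made-up log excerpt: the same warning twice plus an unrelated line.
sample_log = [
    "WARN [cache.node.nodesTransactionalCache] Transactional update cache "
    "'org.alfresco.cache.node.nodesTransactionalCache' is full (125000).",
    "INFO [repo.admin] Startup complete",
    "WARN [cache.node.nodesTransactionalCache] Transactional update cache "
    "'org.alfresco.cache.node.nodesTransactionalCache' is full (125000).",
]
hits, limits = full_cache_warnings(sample_log)
print(hits.most_common(), limits)
```

Caches that appear frequently in the result are candidates for a larger tx.maxItems value in alfresco-global.properties.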
3 – Alfresco is running Slow (Where to Start)
• First we need to identify what is running slow in Alfresco, and where
• Is it Alfresco that is slow?
• Page rendering?
• Dashboard takes a long time to render?
• Login takes long?
• Browsing the repository is very slow (permission evaluation?)
• Uploading content performance (bulk import, migration, rules)
• Search is slow
• Workflow problems
• CPU is at 100%
• Memory is exhausted
• Cluster communication problem?
3 – Alfresco is running Slow (Where to Start)
• Investigating: "Follow the Request"
• Is Apache or a physical load balancer being used in front of Alfresco?
• Are there enough connections/threads/workers available for the existing load?
• Any timeouts in the Apache/load balancer logs?
• Check the overall performance of Apache.
• What are the Tomcat threads doing?
• Use support tools, check real-time thread dumps, see behaviors/actions.
• Run a series of jstack commands and check what the threads are doing
• What is consuming the memory?
• Extract heap dumps and jstacks and check what is occupying memory
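When a series of jstack dumps has been collected, a per-state thread count gives a quick first impression before reading individual stacks. This Python sketch counts the "java.lang.Thread.State:" lines that jstack prints for each Java thread; the function name and the shortened sample dump are illustrative.

```python
import re
from collections import Counter

# jstack prints one such line per Java thread, e.g. "java.lang.Thread.State: BLOCKED (...)".
STATE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def thread_states(dump_text):
    """Count threads per state in the text of one jstack dump."""
    return Counter(STATE.findall(dump_text))


# Shortened, made-up dump with three threads.
sample_dump = '''
"http-bio-8080-exec-1" daemon prio=5 tid=0x01 nid=0x11 runnable [0x0a]
   java.lang.Thread.State: RUNNABLE
"http-bio-8080-exec-2" daemon prio=5 tid=0x02 nid=0x12 waiting for monitor entry [0x0b]
   java.lang.Thread.State: BLOCKED (on object monitor)
"http-bio-8080-exec-3" daemon prio=5 tid=0x03 nid=0x13 waiting on condition [0x0c]
   java.lang.Thread.State: TIMED_WAITING (parking)
'''
states = thread_states(sample_dump)
print(states)
```

Comparing the counts across consecutive dumps shows whether the number of BLOCKED threads keeps growing, which is a hint to look at lock contention in the stacks themselves.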
3 – Alfresco is running Slow (Where to Start)
• “Questioning the DB” – Key performance indicators
• Response time
• Blocked queries
• Top queries by frequency and / or time
• Slow Queries
• Average number of Transactions per second (during a peak period)
• Number of Connections (during a peak period)
• Database server health (CPU, memory, I/O, network)
• Index size and health
• Inspect JDBC access to the database
• jdbcspy
• log4jdbc
• JavaMelody
3 – Alfresco is running Slow (Where to Start)
• “Questioning the Storage” – Key performance indicators
• I/O performance (Iometer, hdparm)
• Check both
• Alfresco content store storage
• Solr index storage (should be faster)
• Run EVT; its last test checks the speed of the index disk storage and produces a
meaningful report.
• Check the free space on the index disk (merging processes require at least 40% free)
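The free-space rule for the index disk is easy to automate. This Python sketch uses the standard library's shutil.disk_usage to compare the free fraction of a volume against the 40% figure from the slide; the function names are made up, and the path in the example is a placeholder for wherever the Solr indexes actually live.

```python
import shutil


def free_fraction(path):
    """Fraction of the volume holding `path` that is currently free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def enough_room_for_merges(path, minimum=0.40):
    """True when the volume has at least `minimum` of its capacity free."""
    return free_fraction(path) >= minimum


# "." is a placeholder; point this at the Solr index directory in practice.
index_path = "."
print(f"{free_fraction(index_path):.0%} free, ok={enough_room_for_merges(index_path)}")
```

Run from a scheduled job, a check like this can raise an alert before index merges start failing for lack of space.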
3 – Alfresco is running Slow (Where to Start)
• “Questioning Alfresco” – Key performance indicators
• CPU usage, memory usage, threads
• Check alfresco.log for ERROR and WARN entries
• You can use Elasticsearch to aggregate all relevant logs and search them in one place
• Enable the transformations log and check for transformation errors
• Verify transformation limits
• Enable GC logs on the Alfresco JVM and analyze GC performance
• Verify content policies, rules, scheduled tasks, integrations and customizations
• Analyze the use case and identify the log classes that can produce relevant information
while in DEBUG mode. Use support tools for real-time troubleshooting.
3 – Alfresco is running Slow (Where to Start)
• “Questioning Solr” – Key performance indicators
• CPU usage, memory usage, threads
• Check solr.log for ERROR and WARN entries
• You can use Elasticsearch to aggregate all relevant logs and search them in one place
• Enable query logs and check for errors
• Verify Solr statistics / cache usage
• Enable GC logs on the Solr JVM and analyze GC performance
• Check for merging problems, slow disks, insufficient free space and configuration problems.
• Analyze the search use case, blacklist some MIME types, keep your index small and only
index what you will search for.

Editor's Notes

  • #5: Tune the memory and garbage collection parameters for the JVM to be appropriate for your situation. Tuning and deeper inspection of the Java Virtual Machine is a very important activity for Java-based applications. An enterprise-class installation of Alfresco, often running one or multiple clustered instances, stores huge amounts of data and serves a lot of user access, which often causes HTTP overload. Alfresco can be optimized by careful and meticulous tuning of the hardware resources, specifically the CPU and RAM used by the JVM.
  • #6: There are several tools available to analyze garbage collection logs, but the one I think generates the most accurate report is Censum from jClarity. It's possible to download a trial version of this tool and use it to analyze the GC logs for 7 days. Censum is a nice tool that takes log files from the complex Java (JVM) garbage collection subsystem and gives you meaningful answers by providing clear infographics and making recommendations based on the analysis results. You can also use a completely free tool like GCViewer; it does not provide the same recommendations, but it is still a very useful tool. https://github.com/chewiebug/GCViewer
  • #7: 1) Periodic calls to trigger a Full GC, either via System.gc() or Runtime.gc(), preempt the natural flow of GC and corrupt the numerous metrics that are used to help the collectors run as optimally as possible. There are many possible sources for these calls, which may include:
    - An RMI call from a remote JVM.
    - A Timer, TimerTask or ScheduledExecutorService thread.
    - A cron or Quartz job.
    - Some external trigger causing the Memory MXBean to trigger a Full GC.
    Add the -XX:+DisableExplicitGC flag.
    3) There are a number of possible causes for this problem:
    - The Eden and/or Survivor spaces may be too small.
    - The -XX:MaxTenuringThreshold flag may have been set too low.
    There are a number of possible solutions for this problem:
    - Alter the size of the young space via the -XX:NewRatio property.
    - Alter the size of the Survivor spaces (relative to Eden) via the -XX:SurvivorRatio=<N> property, using the information provided by the Tenuring graphs. This flag divides Young into N + 2 chunks: N chunks are assigned to Eden and 1 chunk each to the To and From spaces.
    - Alter the -XX:MaxTenuringThreshold property.
    Note: enlarging the Survivor spaces will result in less space being assigned to Eden. The size of Eden divided by your allocation rate yields the interval between collections in the young generation. Be sure to increase the size of Young so that Eden stays the same size, in order to avoid increasing the number of young generation collections.
  • #8: Tracing Hazelcast usage
    Transactional caches (Level 1): Alfresco version 5.0 introduced a way to trace the transactional cache usage (much like the previous Ehcache tracing mechanism). Unfortunately that tracing is not available in version 4.2.X; one way to get this feature would be to open a support ticket requesting a back-port.
    Hazelcast caches (Level 2): Using Hazelcast mancenter you can trace the L2 cache usage. For more information check http://docs.alfresco.com/4.2/tasks/hazelcast-setup.html
  • #9: Cache-Factory: allows for the creation of caches.
    Messenger-Factory: abstraction over the Hazelcast topic (publish/subscribe messaging system).
    LockStore Factory: where in-memory locks are kept.
    You can define your own caches as per the example below:
      <bean name="contentDataSharedCache" factory-bean="cacheFactory" factory-method="createCache">
        <constructor-arg value="cache.customContDataCache"/>
      </bean>
      cache.customContDataCache.maxItems=130000
      cache.customContDataCache.timeToLiveSeconds=0
      cache.customContDataCache.maxIdleSeconds=0
      cache.customContDataCache.cluster.type=fully-distributed
      cache.customContDataCache.backup-count=1
      cache.customContDataCache.eviction-policy=LRU
      cache.customContDataCache.eviction-percentage=25
      cache.customContDataCache.merge-policy=hz.ADD_NEW_ENTRY
    Note that there is no corresponding hazelcast-tcp.xml entry for the custom cache. The factory does all the configuration programmatically, using the name of the cache (customContDataCache) as a root to discover the remaining configuration properties.
  • #10: An invalidating cache can be useful for storing something that is not serializable, because all the values stored in a Hazelcast distributed cache must be serializable; that is how Hazelcast sends the information to the member it is going to be stored on. If you have a cache with an enormous number of reads and very rare writes, an invalidating cache can be the best approach, much like the way Ehcache used to work. This type was also introduced because there were some non-serializable values in Alfresco that could not reside in a fully-distributed cache.
    The way we define the caches (in our cache.properties file) is as follows:
      cache.aclSharedCache.tx.maxItems=40000
      cache.aclSharedCache.maxItems=100000
      cache.aclSharedCache.timeToLiveSeconds=0
      cache.aclSharedCache.maxIdleSeconds=0
      cache.aclSharedCache.cluster.type=fully-distributed
      cache.aclSharedCache.backup-count=1
      cache.aclSharedCache.eviction-policy=LRU
      cache.aclSharedCache.eviction-percentage=25
      cache.aclSharedCache.merge-policy=hz.ADD_NEW_ENTRY
    Note the notion of cache backups (backup-count), which guarantees that a distributed cache has a specific number of backups: if a node holding part of the cache dies, those entries are still accessible. The more backups you have, the more memory will be consumed.
  • #11: We also need to consider that, in distributed caches, network traffic increases when there are a lot of remote gets. On the other hand, if we choose an invalidating cache mechanism and the caches change often, the invalidation messages can also become a source of network stress. So overall it is about analyzing the trade-offs of each mechanism and choosing the most appropriate one for each use case.
  • #12: LRU = Least Recently Used. LFU = Least Frequently Used.
    For debugging purposes, you can disable the L2 cache. The database will keep working, but at a slower rate. The Level 2 (L2) cache provides out-of-transaction caching of Java objects inside the Alfresco system. The L2 cache objects are stored in memory attached to the application scope of the server. Sticky sessions must be used to keep a user who has already established a session on one server on that server for the entire session. By default, cache replication uses RMI to replicate changes to all nodes in the cluster via the Peer Cache Replicator. For invalidating cache types, each replicated cache member notifies all other cache instances when its content has changed. Fully distributed is a different approach, introduced with Hazelcast.
    If you have issues with the replication of information in clustered systems (that is, the cache cluster test fails), you may want to confirm this by setting the following properties to true in the alfresco-global.properties file:
      system.cache.disableMutableSharedCaches=true
      system.cache.disableImmutableSharedCaches=true
    An important indicator that you need to increase your caches is seeing messages like the ones below in your alfresco.log file, indicating that some specific caches are full:
      13:25:12,901 WARN [cache.node.nodesTransactionalCache] Transactional update cache 'org.alfresco.cache.node.nodesTransactionalCache' is full (125000).
      13:25:14,182 WARN [cache.node.aspectsTransactionalCache] Transactional update cache 'org.alfresco.cache.node.aspectsTransactionalCache' is full (65000).
      13:25:14,214 WARN [cache.node.propertiesTransactionalCache] Transactional update cache 'org.alfresco.cache.node.propertiesTransactionalCache' is full (65000).
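Spotting these warnings by eye gets tedious on a busy log, so a small scanner can help. The sketch below assumes the message format is exactly the one shown in the sample log lines above; the class name FullCacheScanner is hypothetical and this is not an Alfresco utility. It reports each cache that was flagged as full, together with the size at which it filled up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "cache is full" warnings from alfresco.log lines.
final class FullCacheScanner {
    // Accepts straight or typographic single quotes around the cache name,
    // since copied log excerpts sometimes contain the latter.
    private static final Pattern FULL = Pattern.compile(
            "Transactional update cache ['\u2018]([^'\u2019]+)['\u2019] is full \\((\\d+)\\)");

    // Returns cache name -> reported size, keeping the order of first appearance.
    static Map<String, Integer> scan(List<String> lines) {
        Map<String, Integer> full = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = FULL.matcher(line);
            if (m.find()) {
                full.put(m.group(1), Integer.parseInt(m.group(2)));
            }
        }
        return full;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "13:25:12,901 WARN [cache.node.nodesTransactionalCache] Transactional update cache "
                        + "'org.alfresco.cache.node.nodesTransactionalCache' is full (125000).",
                "13:25:13,000 INFO [some.other.Logger] unrelated message");
        scan(sample).forEach((name, size) -> System.out.println(name + " filled at " + size));
    }
}
```

The reported size is the limit the cache hit, so it is a natural starting point when deciding how far to raise the corresponding maxItems value.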
  • #13: Let's follow the money. In this case, we "follow the request".
  • #14: jstack prints the Java stack traces of Java threads for a given Java process, core file or remote debug server. For each Java frame, the full class name, method name, 'bci' (byte code index) and line number, if available, are printed.
    Your repository can be maxed out more easily if the application server is configured to process more connections/threads than the machine can handle. Consider first reducing ("bottlenecking") the number of application server connector threads, so that each call gets the CPU and memory resources it needs to perform properly and your system gets a more stable response time during peak usage (scale the database connection pool maximum accordingly, as mentioned before). You may need to scale up (or scale out your cluster) to handle a larger number of calls during peak usage, instead of just increasing the number of threads on a single application server node.
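A thread dump is usually read by asking how many threads are in each state. As a rough illustration of that summary, the sketch below counts the threads of the JVM it runs in, using the standard Thread.getAllStackTraces() call. It is not a replacement for jstack, which attaches to another process from outside; the class name ThreadStateSummary is made up for this example.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: summarises thread states of the current JVM,
// similar to what one would tally by hand from a jstack dump.
final class ThreadStateSummary {
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getAllStackTraces() returns a snapshot of all live threads.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

Many BLOCKED or WAITING connector threads compared with RUNNABLE ones is one hint that the application server accepts more concurrent requests than the machine or the database pool can serve.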
  • #15: Alfresco, like other Java applications, uses JDBC to access the database. Inspecting JDBC traffic can yield useful information for developers, and the tools below can be used for that: jdbcspy; Log4jdbc (more active development and, in my opinion, easier to configure).
    Regular maintenance and tuning of the Alfresco database is necessary. Specifically, all of the database servers that Alfresco supports require, at the very least, that some form of index statistics maintenance be performed at frequent, regular intervals to maintain optimal Alfresco performance. Index maintenance can have a severe impact on Alfresco performance while in progress, so it needs to be discussed with your project team and scheduled appropriately.
    Make sure your database is tuned for your usage patterns: high throughput, long-running queries, decision support, mixed usage. Finally, check that the specific supported database being used is configured properly and according to the Alfresco documentation.
    Regarding communication latency, the golden rule is that the response time from the database should in general be around 4 ms or lower. If the number of nodes and the expected concurrency increase, for example when the database grows to more than 20 million nodes and the number of concurrent users increases considerably, we suggest adopting an active-active database cluster approach.
  • #18: Understand JMX configuration and priority. Alfresco allows most of its configuration to be set through JMX (or the Share admin console) at runtime, without a restart and in a cluster-aware mode. This is very handy for solving issues on a live system and for testing in general. But beware that those settings are stored in the database and take priority over the static settings (the file configuration on your server, in general alfresco-global.properties), so they will be active again when you restart the system.
    This has a drawback: your static configuration may not reflect the real configuration state of your system, which can make life complicated for system administrators in certain circumstances (especially newcomers).
    Keep your static configuration up to date. As a best practice, keep the static files up to date with your configuration and try to make sure that, as much as possible, they are the ones actively configuring your system. In general this includes reverting any configuration made through JMX as soon as possible, so that the system again uses the static configuration with whatever values you find appropriate.