Where the problem stems from is the choice of
Multi-Processing Module (MPM) chosen for the Apache installation and the default settings for that MPM.
On UNIX systems there are two main MPMs that are used. These are the
prefork MPM and the
worker MPM. The prefork MPM implements a multi process configuration where each process is single threaded. The worker MPM implements a multi process configuration but where each process is multi threaded.
Which MPM is used is a compile time option and not something that can be changed dynamically at run time. Thus, your decision has already been made by the time you have installed Apache from source code or from the binary operating system package. Often the choice is already made for you by what the operating system supplied as the default.
Traditionally the MPM used for an Apache installation has been prefork. This is because that is all the older Apache 1.3 supported, but also partly because modules for web development, such as PHP, were not generally multi thread safe and so required that prefork MPM be used.
With the MPM having being selected, either explicitly or through ignorance of there being a choice, that is where the majority of people stop. What most do not realise is that the default settings for an MPM will generally need to be modified based on what you are using Apache for and how much memory your system has available. Customising these settings is even more important for Python web applications as I will explain.
Lets first look at the default settings for the prefork MPM. The values for these as shipped with the original Apache source code is:
# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# MaxClients: maximum number of server processes allowed to start
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule mpm_prefork_module>
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 150
MaxRequestsPerChild 0
</IfModule>
What this all means is that when Apache starts up it will create 5 child server processes for handling of requests. The number of child server processes used isn't a fixed number however. Instead, what will happen is that Apache will dynamically create additional child server processes when the load increases. Exactly when this occurs is dictated by the setting for the minimum number of idle spare servers. Such additional child server processes may be created up to a number determined by the maximum number of allowed clients. In this case, because each child server process is single threaded, that means a maximum of 150 child server processes may be created.
This is actually quite a lot of child server processes that can be created. If Apache is being used only to serve static files then this number is quite reasonable however. This is because each child server process should be quite small in size. Even when using PHP the server child processes shouldn't grow to be overly large. This is because PHP is CGI like in the sense that each application script is reconstructed on each request and then thrown away at the end of the request. Thus, nothing of the application persists between requests and thus any memory use is always transient.
The other key aspect of PHP which means that memory use of the individual child server processes is kept down, is that the extensions available to the PHP user is fixed when PHP is initialised. Further, all the PHP internal libraries and any optional extensions are preloaded from shared libraries/objects in the Apache parent process before any child server processes are even created from it. Thus, all the code which a PHP application uses is not only preloaded, but shared between all child server processes and isn't counting as private memory to the child server processes. This is significant when one considers that the PHP library alone, not counting optional extensions, can be about 7MB in size.
We now need to contrast what happens with PHP to what happens with mod_python and Python web applications.
When using mod_python the only thing that happens in the Apache parent process is that the Python interpreter is initialised. There is no preloading of any modules which a Python web application may want to use. This is the case as Python works the opposite way to PHP in that it does as little as possible up front, only importing specific modules when actually used by an application.
The next difference with Python web applications is that once application code is loaded it remains loaded for the life of the process. That is, unlike PHP which throws away the application between requests, everything persists between requests in Python web applications. If a Python web application spans a large set of URLs, the application code may not even all get loaded upon the initial request. Instead it may only get progressively loaded as different URLs are accessed.
The important thing to realise here is that all this loading of Python application code is occuring in the child server processes which handle the requests and not the Apache parent process. Except where Python modules are implemented as C extension modules, all the code that is loaded is going to use up private memory of the process. It is not unheard of for even small to moderate sized Python web applications to consume 30MB or more of private memory in each child server process.
It is this significant amount of memory per process which is where problems start to occur. If you remember, the default settings for the prefork MPM were such that up to 150 child server processes could be created. This means that for such a small to moderate sized Python web application, if Apache decided to create up to the maximum number of child server processes, you would need in excess of 4GB of memory.
If you are running a small VPS system with an allocation of only 256MB, you can see that it just isn't going to work very well. You might just squeak by with having all the initial 5 child server processes having loaded up your application, but as soon as you get a sudden increase in requests and Apache decides that it needs to create more child server processes, your system will quickly run out of memory.
So, although the default settings for the prefork MPM may be reasonable for handling of static file requests or PHP, they are going to be completely inapproriate for any sizable Python web application, especially if running a system with only limited memory.
This then addresses one of the main complaints one often sees made against mod_python. That is that it consumes huge amounts of memory. In reality it isn't mod_python at all here which is the problem.
First off the memory is being consumed by the Python web application and not mod_python. If you ran the same Python web application in a standalone process on top of a Python web server, that single instance of the application would still use about the same amount of memory.
The real problem here is that so many instances of the application have been allowed to be created by not changing the default settings for the prefork MPM. Specifically, the number defined for the maximum clients should be dropped comensurate with the amount of memory available to run it and how big the application gets.
A very crude measure for determining the maximum number of clients, and therefore how many child server processes will be created, is to divide the maximum amount of memory you want to allow the web server as a whole to use, by the amount of memory a single instance of the Python web application consumes.
Do note though that this is a very crude measure, things are in practice a bit more complicated than that. One thing that complicates the issue is whether keep alive is enabled for connections and what the keep alive timeout is set to.
Whether keep alive is enabled or not isn't going to change what the maximum number of clients should be set to, but it does in practice limit how many concurrent requests you will be able to effectively handle. This is because the Apache request handler threads will be busy waiting to see if a subsequent request is going to arrive over the same connection. Eventually the request handler thread will timeout, but during that time it will not be able to handle completely new requests.
If keep alive is a problem, one course often taken which can help out is to offload serving of static media files to a separate web server. Keep alive can then be turned off for the Apache instance running the Python web application where it generally isn't as beneficial as for static file requests. Web servers such as nginx and lighttpd are arguably better at serving static files anyway, and so you will actually get better performance when serving them that way. Offloading the static files also allows you to configure Apache properly for the specific Python web application being hosted, rather than having conflicting requirements.
As to the load spikes which can occur, what this comes down to is the startup costs of loading the Python web application being run. Here the problem is that Apache will create additional child server processes to meet demand. Because Python web applications these days generally have a lot of dependencies and need to load a lot of code they will not start up quickly. That startup is costly actually serves to multiply the severity of the problem, because although the additional processes are starting up, if they take too long, Apache will decide that it still doesn't have enough processes and will start creating more. In the worst case this can snowball until you have completely swamped your machine.
The solution here is not to create only a minimal number of servers when Apache starts, but create closer to what would be the maximum number of processes you would expect to require to handle the load. That way the processes always exist and are ready to handle requests and you will not end up in a situation where Apache needs to suddenly create a huge number of processes.
The catch here to watch out for is that the startup cost of the Python web application is simply transferred to when Apache is being started in the first place. If you find that even when a larger number of processes are created at startup, the initial burst of traffic and the subsequent loading of the actual Python web application strains the resources of your system, then you need to seriously look at whether you are creating many more processes than you need anyway.
First off, don't run PHP on the same web server. That way you can run worker MPM instead of prefork MPM. This immediately means you drop down drastically the number of processes you require because each process will then be multithreaded rather than single threaded and can handle many concurrent requests. To see how this works one can look at the default MPM settings for the worker MPM.
# worker MPM
# StartServers: initial number of server processes to start
# MaxClients: maximum number of simultaneous client connections
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadsPerChild: constant number of worker threads in each server process
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule mpm_worker_module>
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>
The important thing to note here is that although the maximum number of clients is still 150, each process has 25 threads. Thus, the maximum number of processes that could be created is 6. For that 30MB process that means you only need 180MB in the worst case scenario rather than the 4GB required with the default MPM settings for prefork.
Keep that in mind and one has to question how wise the
advice in the Django documentation is that states "you should use Apache’s prefork MPM, as opposed to the worker MPM" when using mod_python.
All well and good if you run your own computer with huge amounts of memory and little traffic, but a potential recipe for disaster if you don't know that you should be changing the default MPM settings and you are using a memory constrained VPS and your site becomes popular or becomes subject to the
Slashdot effect.
With Django 1.0 now believed to be multithread safe, which was in part why prefork was recommended previously, that advice should perhaps be revisited, or it made obvious that one would need to consider tuning your Apache MPM settings if you intend using prefork MPM.
Now, it needs to be stated that all of the above about mod_python equally applies to embedded mode of
mod_wsgi. Thus, using mod_wsgi isn't necessarily some magic pill which will solve all your problems overnight.
Most people who change to using mod_wsgi don't actually have a problem though, but that that is the case is usually more by accident rather than design. This is because they see the additional benefits they get from using daemon mode of mod_wsgi and choose to use it over embedded mode. By this simple decision they have escaped the main issue with embedded mode, which is that Apache can lazily create processes and that for prefork MPM the maximum number of processes is excessive.
Some have realised that mod_wsgi daemon mode seems to offer a more predictable memory usage profile and performance curve and as a result fervently recommend it, but at the same time they still don't seem to understand what the problems with embedded mode, as outlined above actually were. So, hopefully the explanation above will help in clearing up why, not just in the case of mod_wsgi daemon mode vs mod_wsgi embedded mode, but also for the much maligned mod_python.
So, what should you be using? The simple answer is that if you don't understand how to configure Apache and see it as some huge beast then you should certainly be tossing out mod_python. Instead, you would be much better off using mod_wsgi daemon mode.
Should one ever use embedded mode? Technically running embedded mode with prefork MPM should offer the best performance, especially for machines with many cpus/cores. If however you don't have huge amounts of memory, don't dedicate the system to just the dynamic Python web application and you don't change the default MPM settings for Apache, then you are potentially setting yourself up for disaster.
In practice one also has to realise that the underlying web server is never usually going to be the bottleneck. Instead the bottleneck will be your Python web application and the database it uses. So, just because mod_wsgi embedded mode and prefork MPM may be the fastest solution out there for Apache and saves you a few milliseconds per request, that gain is going to be completely swallowed up by the overheads elsewhere in the system and not end up giving you any significant advantage.
You see a lot of people though still obsessing about the underlying raw performance of the web server. Frankly, you are just wasting your time. You will get greater benefits from concentrating on the performance of your application and using techniques such as caching and database query optimisation and indexing to make things run faster.
The final answer? Stop using mod_python, use mod_wsgi and run it with daemon mode instead. You will save yourself a lot of headaches by doing so.