The three secrets to optimal Drupal performance are cache, cache, and more cache. Every layer of the Drupal
server stack offers its own caching options, and you should familiarize yourself with how to take advantage of
all of them. Here’s a list of key areas to consider as you look for opportunities to improve the performance of
your site:
PHP opcode cache: Opcode caching is critical, and its importance cannot be overstated. There is no good reason not to run an opcode cache, unless you happen to prefer high server loads and slow page load times. For PHP opcode caches, your choices include APC, XCache, and eAccelerator, any of which can easily be installed into your PHP environment. The best practice for opcode caching is APC (drupal.org/project/apc).
See Figure 23-1 for an example of a report generated by APC.
Reverse proxy cache: A reverse proxy cache takes a tremendous amount of load off your web servers. A proxy cache is a fast web server that sits in front of your back-end web servers, caching any cacheable content passing through it (as a write-through cache) so that subsequent web requests are served directly from the proxy cache rather than from your back-end servers. I’ll talk about Varnish, the preferred solution for reverse proxy caching, in a bit.
Database caches: MySQL has its own built-in caches, particularly the query cache (query_cache_size) and the InnoDB buffer pool (innodb_buffer_pool_size), both of which ought to be set as high as your database server’s available memory allows.
Drupal caches: Drupal has its own caches for pages, blocks, and Views. Visit the Drupal performance page in your Drupal admin interface, and turn them all on. I’ll also talk about Pressflow, an optimized version of Drupal that improves on Drupal’s own internal caching mechanisms.
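For example, the two MySQL settings mentioned above might be raised in my.cnf like this (the sizes shown are hypothetical starting points, not recommendations for your hardware):

```ini
# my.cnf -- example values only; size these to your database server's memory
[mysqld]
# MySQL query cache (holds results of identical SELECT statements)
query_cache_size        = 64M
# InnoDB buffer pool (caches table data and indexes in memory)
innodb_buffer_pool_size = 4G
```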
Figure 23-1. Alternative PHP Cache (APC) comes with an interface that displays
memory allocation and the files currently within the cache.
Often the system takes a performance hit when data must be moved to or from a slower device such as a hard disk drive. What if you could bypass this operation entirely for data that you could afford to lose (like session data)? Enter memcached, a system that reads and writes directly to memory.
Memcached is more complicated to set up than other solutions proposed in this chapter, but it is worth talking
about when scalability enhancements are needed in your system.
Drupal has a built-in database cache to cache pages, menus, and other Drupal data, and the MySQL database is capable of caching common queries, but what if your database is straining under the load? You could buy another database server, or you could take the load off of the database altogether by storing that data in memory instead. The memcached system and the PECL Memcache PHP extension (see http://pecl.php.net/package/memcache) are just the tools to do this for you.
The memcached system saves arbitrary data in random access memory and serves the data as fast as possible.
This type of delivery will perform better than anything that depends on hard disk access. Memcached stores
objects and references them with a unique key for each object. It is up to the programmer to determine what
objects to put into memcached. Memcached knows nothing about the type or nature of what is put into it; to
its eyes, it is all a pile of bits with keys for retrieval.
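As an illustration of that key/value interface, here is a minimal sketch using the PECL Memcache extension directly (the host, port, key name, and expiration are hypothetical):

```php
<?php
// Minimal sketch of the PECL Memcache extension's key/value interface.
// Host, port, and key names here are illustrative.
$memcache = new Memcache();
$memcache->connect('127.0.0.1', 11211);

// Store an arbitrary object under a unique key, expiring in 5 minutes.
$node = new stdClass();
$memcache->set('node:123', $node, 0, 300);

// Later: fetch it back by key; get() returns FALSE on a cache miss.
$cached = $memcache->get('node:123');
```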
The simplicity of the system is its advantage. When writing code for Drupal to leverage memcached, developers can decide to cache whatever is seen as the biggest cause of bottlenecks. This might be the results of database queries that are run very often, such as path lookups, or even complex constructions such as fully built nodes and taxonomy vocabularies, both of which require many database queries and generous PHP processing to produce. A memcache module for Drupal, along with a Drupal-specific API for working with the PECL Memcache interface, can be found at http://drupal.org/project/memcache.
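For reference, wiring Drupal 7 to the memcache module is mostly a matter of a few settings.php lines; the sketch below assumes the module lives under sites/all/modules and memcached runs on localhost:

```php
<?php
// Example settings.php lines for the Drupal 7 memcache module.
// The module path and server address are assumptions for this sketch.
$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
$conf['cache_default_class'] = 'MemCacheDrupal';
// Keep the form cache in the database so forms survive memcached restarts.
$conf['cache_class_cache_form'] = 'DrupalDatabaseCache';
$conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');
```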
Optimizing PHP
On Apache servers, you have two ways to execute PHP code: Fastcgi (mod_fcgid, mod_fastcgi, or PHP-FPM) or mod_php. The key difference between them is that mod_php executes PHP code directly inside Apache, whereas the Fastcgi variants pass each PHP request to an external php-cgi process, which executes PHP outside of Apache and then pipes its output back to Apache.
On an Nginx web server (more about Nginx later in this chapter), the choice is simpler because you’re limited to using only the NginxHttpFcgiModule (Fastcgi), as Nginx does not have a built-in PHP interpreter module such as mod_php.

mod_php and the Fastcgi variants perform marginally the same; after all, they’re really using the same underlying PHP interpreter running the same PHP code underneath. The only key difference is where their inputs and outputs are being redirected. Unsurprisingly, benchmarking equally sized mod_php and Fastcgi process pools shows nearly the same server loads and Drupal delivery performance.
An Apache+mod_php process pool with 25 child processes and an Apache+Fastcgi process pool with 25 PHP processes will have the same overall memory footprint and performance characteristics. However, the Fastcgi variants offer the option of sizing your PHP process pool independently from your Apache process pool, while with mod_php your pool of PHP interpreters is equal to the number of Apache processes. For this reason, some may advocate a Fastcgi approach over mod_php because Fastcgi “saves memory.” This might be true if you ignored APC opcode cache size considerations (explained below) and you chose to restrict the total number of Fastcgi processes to dramatically fewer than the number of Apache child processes. However, severely limiting the size of your PHP process pool can severely bottleneck your PHP throughput: that’d be similar to closing three lanes of a busy four-lane highway for no better reason than to “save space,” thereby causing traffic jams. There’s another important memory usage consideration: PHP’s APC opcode cache is shared across mod_php processes (all mod_php processes refer to the same APC cache block), but the APC cache is not shared across php-cgi processes when using mod_fcgid.
Given that the typical size of an APC opcode cache for a Drupal server could be 50MB or more, this means that when using an APC opcode cache (as any reasonable Drupal server should), the entire process pool of Apache and php-cgi processes will altogether use a lot more memory than the same-size pool of Apache and mod_php processes. So which performs better? The answer is that neither mod_php nor Fastcgi performs dramatically better than the other when given the same amount of resources. However, you may consider using a Fastcgi option if you want to tune your Apache process pool size differently than your PHP process pool, or for other reasons, such as on multi-tenant web servers, because Fastcgi offers user-level separation of processes.
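If you take the Fastcgi route via PHP-FPM, a fixed-size PHP pool sized independently of Apache might be sketched like this (the pool size and listen address are hypothetical starting points, not recommendations):

```ini
; PHP-FPM pool config (e.g. www.conf) -- example values only
[www]
listen = 127.0.0.1:9000
; A static pool keeps the PHP process count independent of Apache's pool
pm = static
pm.max_children = 50
```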
Setting PHP Opcode Cache File to /dev/zero
Both APC and XCache offer an option to set the path of the opcode cache. In APC, the path of cache storage, the apc.mmap_file_mask setting, determines which shared memory mechanism it uses. System V IPC shared memory is a decent choice but is limited to only 32MB on most Linux systems; this limit can be raised, but by default it’s not enough opcode cache for typical Drupal sites. POSIX mmap shared memory can share memory blocks of any size; however, it performs quite poorly if that memory is backed by a disk file, as frequent shared memory I/O operations will translate into large and frequent disk I/O operations, which is especially noticeable on slow disks. The solution is to set your memory map path to /dev/zero, which tells mmap not to back the memory region with disk storage. Fortunately, APC uses this mode by default, unless you’ve explicitly set apc.mmap_file_mask to any path other than /dev/zero.
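A sketch of these APC settings in php.ini (the cache size is a hypothetical starting point; note that older APC releases expect apc.shm_size as a bare number of megabytes, without the M suffix):

```ini
; php.ini / apc.ini -- example APC settings
extension = apc.so
; Size the opcode cache generously; the 32MB default is rarely enough for Drupal
apc.shm_size = 96M
; Leave the default /dev/zero mask so mmap memory is not backed by a disk file
apc.mmap_file_mask = /dev/zero
```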
PHP Process Pool Settings
By “PHP process pool” I’m referring to the entire PHP execution process pool on your web server, which
determines how many concurrent PHP requests your server can deliver without queuing up requests. The PHP process pool is managed either by Apache+mod_php or some variant of Fastcgi: mod_fcgid, mod_fastcgi, or
PHP-FPM (FastCGI Process Manager). The PHP process pool tuning considerations are as follows:
Run as many PHP interpreters as memory will allow. If you’re running mod_php, then your PHP pool size is the number of Apache child processes, which is determined by the Apache config settings StartServers, MinSpareServers, MaxSpareServers, and MaxClients, which can all be set to the same amount to keep the pool size constant. If you’re running a Fastcgi variant, such as mod_fcgid, then your PHP pool size settings MaxProcessCount, DefaultMaxClassProcessCount, and DefaultMinClassProcessCount should all be set to the same amount to keep the pool size constant. For an 8GB web server, you may try setting your PHP process pool size to 50, then load test the server by requesting many different Drupal pages with a user client concurrency of 50 and a think time between page requests of at least 1 second per client. If the server runs out of memory and/or begins to scrape swap space, then decrease the PHP process pool size and try again. Server load may inevitably climb during such a load test, but it’s not an issue to be concerned with during this tuning test.

Keep as many idle PHP interpreters hanging around for as long as possible. You want to avoid churning your PHP process pool, which means avoiding constantly reaping and re-spawning PHP interpreters in response to the web traffic load of the moment.
Instead, it’s better to create a constant-size pool of PHP interpreters, as many as your server memory can hold, and have that pool size remain constant even if most of those processes are idle most of the time. For mod_php, you’ll want to set Apache’s StartServers, MinSpareServers, MaxSpareServers, and MaxClients all equal to each other; 50 is a decent starting value for an 8GB Drupal web server. This creates a constant-size preforked pool of Apache+mod_php processes. The other key Apache setting for mod_php is MaxRequestsPerChild, which ideally you will want to set to 0 so that Apache does not re-spawn child processes. But if your web server slowly leaks memory over time, and you strongly suspect mod_php is leaking memory, then you may set MaxRequestsPerChild to 10000 or more, and then dial it down until the memory leak issue is under control.
For mod_fcgid, if you’re experiencing a php-cgi segfault on every 501st PHP request (a known bug in mod_fcgid, which may have already been addressed as of this writing), then you will have to set MaxRequestsPerProcess to 500, which will force each php-cgi interpreter to re-spawn itself every 500 requests. Otherwise, set the mod_fcgid MaxRequestsPerProcess to 0 unless php-cgi processes are leaking memory. Also for mod_fcgid, set IdleTimeout and IdleScanInterval to several hours or more to avoid the overhead of re-spawning PHP interpreters on demand.
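Putting the mod_fcgid advice together, a configuration sketch might look like the following (directive names vary by mod_fcgid version; releases from 2.3.6 onward prefix them with Fcgid, e.g. FcgidMaxProcesses, and the values here are illustrative):

```
# Example mod_fcgid settings -- older directive names shown
# Keep the PHP pool a constant size of 50 processes
MaxProcessCount              50
DefaultMinClassProcessCount  50
DefaultMaxClassProcessCount  50
# 0 = never re-spawn; use 500 instead if you hit the 501st-request segfault bug
MaxRequestsPerProcess        0
# Keep idle interpreters around for hours instead of reaping them
IdleTimeout                  28800
IdleScanInterval             28800
```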
Tuning Apache
There are several configuration parameters that will help speed the execution of requests for Drupal sites running on an Apache web server. Some of the biggest improvements can be made through the following
recommendations.
mod_expires
This Apache module lets Drupal send out Expires HTTP headers, caching all static files in the user’s browser for two weeks or until a newer version of a file exists. This goes for all images, CSS and JavaScript files, and other static files. The end result is reduced bandwidth and less traffic for the web server to negotiate. Drupal is preconfigured to work with mod_expires and will use it if it is available. The settings for mod_expires are found in Drupal’s .htaccess file.
# Requires mod_expires to be enabled.
<IfModule mod_expires.c>
# Enable expirations.
ExpiresActive On
# Cache all files for 2 weeks after access (A).
ExpiresDefault A1209600
<FilesMatch \.php$>
# Do not allow PHP scripts to be cached unless they explicitly send cache
# headers themselves. Otherwise all scripts would have to overwrite the
# headers set by mod_expires if they want another caching behavior. This may
# fail if an error occurs early in the bootstrap process, and it may cause
# problems if a non-Drupal PHP file is installed in a subdirectory.
ExpiresActive Off
</FilesMatch>
</IfModule>
We can’t let mod_expires cache PHP-generated content, because the HTML content Drupal produces is not
always static. This is the reason Drupal has its own internal caching system for its HTML output (i.e., page
caching).
Moving Directives from .htaccess to httpd.conf
Drupal ships with two .htaccess files: one is at the Drupal root, and the other is automatically generated after
you create your directory to store uploaded files and visit Configuration -> File system to tell Drupal where
the directory is. Any .htaccess files are searched for, read, and parsed on every request. In contrast, httpd.conf
is read only when Apache is started. Apache directives can live in either file. If you have control of your own
server, you should move the contents of the .htaccess files to the main Apache configuration file (httpd.conf)
and disable .htaccess lookups within your web server root by setting AllowOverride to None:
<Directory />
    AllowOverride None
    ...
</Directory>
This prevents Apache from traversing up the directory tree of every request looking for the .htaccess file to
execute. Apache will then have to do less work for each request, giving it more time to serve more requests.
MPM Prefork vs. Apache MPM Worker
The choice of Apache prefork vs. worker translates into whether to use multiple Apache child processes or fewer child processes, each with multiple threads. Generally for Drupal, the better choice is Apache prefork. Here’s
why:
PHP is not thread-safe, so if you’re using mod_php, then your only real choice is Apache prefork. If you’re
using Fastcgi (such as mod_fastcgi or mod_fcgid), then you could use Apache MPM worker because PHP
requests would be handled externally from Apache. However, using Apache MPM worker instead of Apache
MPM prefork is still not the big win that some make it out to be because there’s nothing magical about threads
that makes a multithreaded application automatically faster and more scalable than a preforked multiprocess equivalent, even on multi-core systems, and this is true for a few reasons:
First, it helps to demystify what threads really are to a Linux operating system: threads are mostly the same as child processes. What distinguishes a thread from a child process is that a thread has direct shared access to the memory contents of its parent process, whereas a forked child process gets a copy-on-write reference to the
memory contents of its parent process. This distinction offers a slight performance advantage to threads, which is then easily squandered on the often complex logistics of synchronizing shared memory access between threads.
Second, the perception that threads use significantly less memory than separate child processes is not what it seems. Using common system tools such as top and ps, it appears as though each Apache child process is using almost as much memory as its Apache parent process. In fact, most of the memory footprint of each Apache child process consists of the same exact memory regions used by the Apache parent process being repeatedly counted multiple times. This is because most of the memory footprint of child processes is the contents of shared libraries, which most operating systems are smart enough to load into memory once; every additional process using those same libraries refers to the first shared copy in memory. Another memory usage consideration is that child processes will share most of the memory contents of their parent unless they modify those contents (copy-on-write).
Third, you can kill runaway Apache child processes, but you can’t kill runaway Apache threads without
restarting all of Apache. From a server admin perspective, it’s easier to diagnose and address problems in a
prefork Apache process pool than a threaded Apache process pool. Of course, your mileage may vary, so benchmarking different Apache MPM configurations is still a worthy exercise.
Balancing the Apache Pool Size
When using Apache prefork, you want to size your Apache child process pool to avoid process pool churning. In other words, when the Apache server starts, you want to immediately prefork a large pool of Apache processes (as many as your web server memory can support) and have that entire pool of child processes present and waiting for requests, even if they are idle most of the time, rather than constantly incurring the performance overhead of killing and re-spawning Apache child processes in response to the traffic level of the moment. Here are example Apache prefork settings for a Drupal web server running mod_php:
StartServers 40
MinSpareServers 40
MaxSpareServers 40
MaxClients 80
MaxRequestsPerChild 20000
This is telling Apache to start 40 child processes immediately, and to always keep 40 processes even if traffic is low, but if traffic is really heavy, then to burst up to 80 child processes. (You can raise the 40 and 80 limits according to your own server dimensions.) You may look at this and ask, “Well, isn’t it a waste of memory to have big fat idle Apache processes hanging about?” But remember: the goal is to have fast page delivery, and there is no prize for having a lot of free memory. “My server is slow, but look at all that free RAM!” If you have the memory, then use it!
Decreasing Apache Timeout
The Timeout setting in the Apache config determines how long a web client can hold a connection open
without saying anything. Apache’s default Timeout is 5 minutes (300 seconds), which is far too polite. Decrease Apache’s Timeout to 20 seconds or less.
Disabling Unused Apache Modules
Comment out any Apache LoadModule directives that you are certain are not needed. Candidates include mod_cgi, mod_dav, and mod_ldap.
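For example, in httpd.conf (module file paths vary by distribution):

```
# httpd.conf -- comment out modules you are certain are unused
#LoadModule cgi_module modules/mod_cgi.so
#LoadModule dav_module modules/mod_dav.so
#LoadModule ldap_module modules/mod_ldap.so
```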
Using Nginx Instead of Apache
The more adventurous LAMP admins are substituting Nginx for Apache. Nginx is an excellent general-purpose web server with massive scalability. However, Nginx does not support mod_php; rather, you’re limited to using Fastcgi (php-cgi) to serve PHP requests, which is not a bad choice, just different. Also, Nginx does not comprehend Apache .htaccess files, so you’ll have to translate any htaccess-specific directives in your Drupal code base, such as Boost cache rules, into equivalent Nginx configuration directives. As for which is faster, many would argue in favor of Nginx. But the real bottleneck in any Drupal stack is going to be the PHP or database layer rather than the choice of web server. Nonetheless, Nginx’s strengths make it a good fit as a load balancer (see its http upstream module) and static content server.
Using Pressflow
Pressflow is a drop-in replacement for the standard Drupal core, including many performance enhancements over and above Drupal core. Otherwise, from all outward appearances, Pressflow is entirely the same as Drupal. Many of Pressflow’s features continue to make their way into the Drupal core; however, the folks at Four Kitchens continue to push the envelope when it comes to optimizing Drupal. At the time this book was written, there wasn’t an official release of Pressflow for Drupal 7. For up-to-date information on the features and functionality incorporated into Pressflow, visit www.pressflow.org.
Varnish
Varnish is becoming the darling proxy cache server of the Drupal community. Varnish is a fast and powerful HTTP reverse proxy cache server. A typical Drupal app server may be capable of delivering hundreds of dynamic Drupal pages per minute. Varnish offers the ability to deliver thousands of cached Drupal pages per second! Furthermore, requests served from Varnish generate no load on your back-end servers, because the cache-delivered requests never reach your back-end servers.
In a typical setup, Varnish is installed to listen on port 80 (the standard web server listening port) so that all web content requests hit Varnish first. Varnish decides whether to serve the request directly from its own cache or to echo the request back to the back-end web servers. The cache and delivery policies are expressed in the local VCL (Varnish Configuration Language) configuration file. VCL offers Varnish admins the ability to set very specific cache policies using conditional expressions resembling JavaScript. VCL also offers the ability to load balance requests across many back-end servers, rewrite requests, change the content of requests, and block requests. Furthermore, the VCL language offers the ability to include inline C code for those wanting to manipulate the request delivery process at the lowest levels possible. Note that Varnish does not support SSL (HTTPS requests) and does not offer separate virtual host configurations in a shared hosting environment; however, in Varnish, VCL expressions can be bracketed inside a conditional based on the target host of the request. It’s also worth noting that Varnish is an HTTP write-through cache and not a generic key/value store, so it’s not a substitute for memcached, nor does it offer a direct API for storing and fetching arbitrary data from cache. Other HTTP proxy cache alternatives include Squid, Apache with mod_cache, and Nginx’s http proxy cache module; however, these options don’t offer the richness of Varnish’s VCL language.

Worth noting is that Varnish is multithreaded, so its scalability is limited to how many Varnish server threads your server can juggle at once. A moderately busy Varnish server may have a few hundred threads running, and a very busy Varnish can peak at just over a thousand threads. If your Varnish is not able to spawn more threads, then additional requests to your web site will be met with “Connection reset” errors. To allow Varnish to spawn more threads, edit the Varnish startup scripts to adjust the -w options (worker thread pool options) passed to the Varnish start command.
The second parameter passed into the -w option is the maximum number of threads Varnish can spawn.
Increase that setting to at least 4000.
Second, on Linux systems, each thread is allocated 8MB of stack space by default, which is far more than any Varnish thread will require. So, in your Varnish startup script, you’ll want to add the command ulimit -s 512 to decrease the default stack space per thread to 512KB.
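Putting the two thread-tuning adjustments together, an excerpt from a Varnish startup script might look like this (the file location, e.g. /etc/default/varnish on Debian-style systems, and most of the option values below are illustrative; only the -w maximum of 4000 and the 512KB stack follow the advice above):

```shell
# Excerpt from a Varnish startup script -- paths and sizes are examples.
# Shrink the per-thread stack from the Linux default of 8MB to 512KB:
ulimit -s 512
# -w min,max,timeout: allow Varnish to spawn up to 4000 worker threads
DAEMON_OPTS="-a :80 \
             -f /etc/varnish/default.vcl \
             -w 100,4000,120 \
             -s malloc,1G"
```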
Normalizing Incoming Requests for Better Varnish Hits
The key to achieving good Varnish cache hit rates is to normalize the incoming HTTP requests so that all anonymous requests for the same URL get the same cache hit from Varnish. To understand Varnish cache coherency, you must first understand how Varnish stores cache entries for each URL. Varnish combines the following incoming request attributes into a hash key, which it uses to store and look up its cache entries:

request URL
incoming Host header
incoming Cookie header
incoming Accept-Encoding header
The issue here is that the Cookie header and the Accept-Encoding header vary from browser to browser. For example, it is highly likely that the variety of browsers hitting your web site have different cookies and thus different Cookie headers. To address the variance of incoming Cookie headers, you’ll ideally want to remove the entire incoming Cookie header during the vcl_recv phase of your Varnish config, like so:
sub vcl_recv {
  # Remove the incoming Cookie header from anonymous requests
  if (req.http.Cookie !~ "(^|;\s*)SESS") {
    unset req.http.Cookie;
  }

  # ... other vcl_recv rules here ...

  # Don't serve cached content to logged-in users
  if (req.http.Cookie ~ "SESS") {
    return(pass);
  }

  # Attempt to serve from cache
  return(lookup);
}
The above VCL snippet checks whether the request is from a logged-in user (one that has a cookie starting with "SESS"), and if it is not, normalizes the Cookie header by removing it altogether. If there is a need to have some cookies from anonymous requests echoed to your back-end servers, then you can adjust the Cookie regex or add a few more lines to be more selective about which cookies ought to miss the Varnish cache lookup phase.

The other incoming request header that needs to be normalized is Accept-Encoding, because it varies slightly across different web browser types. The most common use of the Accept-Encoding header is for the web browser to communicate to the web server that the browser can receive compressed content. The typical VCL snippet to normalize Accept-Encoding looks like this:
# Normalize Accept-Encoding to get better cache coherency
if (req.http.Accept-Encoding) {
  if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
    # No point in compressing media that is already compressed
    remove req.http.Accept-Encoding;
  } elsif (req.http.User-Agent ~ "MSIE 6") {
    # MSIE 6 JS bug workaround
    unset req.http.Accept-Encoding;
  } elsif (req.http.Accept-Encoding ~ "gzip") {
    set req.http.Accept-Encoding = "gzip";
  } elsif (req.http.Accept-Encoding ~ "deflate") {
    set req.http.Accept-Encoding = "deflate";
  } else {
    # unknown algorithm
    remove req.http.Accept-Encoding;
  }
}
Varnish: Finding Extraneous Cookies
The following command line on your Varnish server is useful for watching live incoming Cookie headers that are being echoed from Varnish to your back-end servers:
varnishlog | grep TxHeader | grep Cookie
This is useful for adjusting how the Cookie header is filtered in Varnish.
Boost
The popular Boost module for Drupal (http://drupal.org/project/boost) essentially builds a static file cache for
dynamically generated Drupal content. With the Boost module installed in Drupal, whenever Drupal generates a
dynamic page, Boost will save a static copy of that content so that the next anonymous request for that same
page will be delivered from the Boost cache. A background cron process periodically culls outdated pages from the Boost cache, which are then regenerated on the next request. This approach reduces overall PHP and
MySQL overhead but still requires Apache (or Nginx, IIS, lighttpd) to process a few extra rewrite rules for
each page request. The key to good Boost performance is to put the Boost cache directory on a fast local file
system. Some Drupal admins may consider writing Boost cache files into a shared network file system so that
many web servers can share the same Boost cache files; however, a busy web site can have a lot of file
system I/O arise from Boost cache maintenance, so much so that a network shared file system slows down
considerably, in which case the Boost cache ought to be a local directory on each web server instead.
If each web server has extra memory but slow disks, then you may also consider writing your Boost cache
files to a local ramfs file system, which is a feature of Linux that allows you to create an ephemeral storage
volume that exists entirely in RAM.
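A sketch of such a mount in /etc/fstab (the mount point is hypothetical, and note that ramfs grows without bound, so only trusted processes should be able to write to it):

```
# /etc/fstab -- RAM-backed, ephemeral storage for the Boost cache
ramfs  /var/cache/boost  ramfs  defaults  0  0
```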
Boost vs. Varnish
Although Boost and Varnish are different kinds of caching solutions, Drupal administrators often weigh these two options directly against each other. In general, Boost is easier to set up and administer than Varnish. However, Varnish offers a more general solution to better performance, as it can be used to proxy cache other kinds of content, such as static images and stylesheets, and not just Drupal pages. Varnish also offers the ability to load balance and rewrite requests before they even reach your web server, whereas Boost requests still hit the web server. However, it’s also possible to use Boost and Varnish together. You may just need to tune your HTTP cache expiration headers and Boost cache purging so that Varnish and Boost are refreshing their caches in a timely manner.