Pro_Drupal7_Development: Linux System Tuning for High Traffic Servers [Optimizing Drupal]

Tuning Linux to handle high volumes of web traffic deserves a book unto itself. There are, however, simple changes that will help improve the performance of high-traffic sites, such as those outlined in the sysctl_set.sh script here (courtesy of Audun Ytterdal, http://www.varnish-cache.org/lists/pipermail/varnish-misc/2008-April/001763.html).

#!/bin/sh

# Tweaks (see http://varnish-cache.org/wiki/Performance)
echo "
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 30000
net.core.somaxconn = 262144
" > sysctl_tweaks.conf

# Load the settings into the running kernel
sysctl -p sysctl_tweaks.conf

The description of the variables listed above is as follows:

ip_local_port_range: Maximize the range of network ports available for establishing network connections

tcp_rmem and tcp_wmem, rmem_max and wmem_max: Increase the size of network I/O buffers

tcp_fin_timeout: Decrease the time to close lingering network connections

tcp_max_orphans: Increase number of sockets held by the system that are not attached to something yet

tcp_max_syn_backlog: Increase number of SYN handshakes to keep in memory (requires tcp_syncookies=1)

tcp_synack_retries: Decrease the number of attempts to establish a TCP connection

netdev_max_backlog: Increase maximum number of incoming packets that can be queued up for upper-layer processing

somaxconn: The size of the listen queue for accepting new TCP connections
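After the script has run, it can be worth reading the values back to confirm that the kernel accepted each one. The following is only a sketch: the helper names list_sysctl_keys and show_sysctl_values are hypothetical, and it assumes the sysctl_tweaks.conf file written by the script above.

```shell
#!/bin/sh
# list_sysctl_keys (hypothetical helper): print the key of every
# "key = value" line in a sysctl-style file, skipping blank lines and comments.
list_sysctl_keys() {
    sed -e 's/#.*//' -e '/=/!d' -e 's/[[:space:]]*=.*//' -e 's/^[[:space:]]*//' "$1"
}

# show_sysctl_values (hypothetical helper): print the value the running
# kernel currently holds for each key found in the file.
show_sysctl_values() {
    list_sysctl_keys "$1" | while read -r key; do
        printf '%s = %s\n' "$key" "$(sysctl -n "$key" 2>/dev/null || echo unavailable)"
    done
}
```

Running show_sysctl_values sysctl_tweaks.conf after sysctl -p lets you compare what is active against what you asked for.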

Using Fast File Systems

Slow file systems are the tar pits of LAMP stacks. Every layer of the LAMP stack touches the file system frequently. Storing your database on a slow file system will certainly cause poor performance. Examples of fast file systems include:

ramfs or tmpfs (uses memory as disk space)

ext2 on a local disk

ext3 on a local disk

XFS on a local disk

hardware RAID

SAN or NAS using dedicated hardware

Examples of slow file systems include (compared to the foregoing choices):

virtual disks (inside any virtualized server environment)

NFS and other types of software-driven network file shares

software RAID disks (depending on the chosen RAID level)

S3FS (mounts Amazon S3 storage as a local disk)

LVM (slows down as more volume snapshots are retained)

Much of LAMP stack design involves deciding on which volumes to store web content and database tables based on the size, speed, and reliability of the file system. Your best performance choice is to use the fastest file system available and ensure uptime and integrity with redundancy (i.e., multiple redundant servers and database replication).

Dedicated Servers vs. Virtual Servers

Dedicated physical servers are going to outperform virtual servers when it comes to network I/O, disk I/O, and memory I/O, even in situations where the virtual server supposedly has been allocated more resources (CPU, disk, and memory) than a dedicated server of similar specs. An important factor to consider is that in a virtualized server environment, the CPU, disk I/O, memory I/O, and network I/O have added I/O routing layers between the server OS and the actual hardware. Therefore, all I/O operations are subject to the task scheduling whims of the host hypervisor as well as the demands of neighboring virtual machines on the same physical host.

As a real example, a virtual server hosting a database server may have twice as much CPU power as a cheaper physical dedicated server; however, the virtual server may also have an added 1ms of network latency (a very real example from an actual Xen virtualized environment), even between neighboring virtual machines. Now, 1ms of network latency doesn’t seem like enough to care about, until you consider that a logged-in Drupal page request may involve hundreds of serialized MySQL queries; thus the total network latency overhead can amount to a full second of your page load time. An added latency of just one second per page request may also seem affordable; however, also consider the rate of incoming page requests and whether this one-second delay will cause PHP processes to pile up in heavy traffic, thus driving up your server load. Adding more and bigger virtual servers to your stack does not make this I/O latency factor disappear either. The same can be said for disk I/O: virtual disks will always be slower than local physical disks, no matter how much CPU and memory the virtual server has been allocated.
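The latency arithmetic above scales linearly, so it is easy to rerun with your own numbers. A rough sketch, assuming the queries are issued strictly one after another; the function name latency_overhead_ms and the query counts are illustrative, not measurements.

```shell
#!/bin/sh
# latency_overhead_ms (hypothetical helper): total added network latency,
# in milliseconds, for a page that issues its queries one after another.
#   $1 = number of serialized queries per page request
#   $2 = added round-trip latency per query, in milliseconds
latency_overhead_ms() {
    echo $(( $1 * $2 ))
}

# Example figures only: 300 and 1000 queries at 1 ms of added latency each.
latency_overhead_ms 300 1
latency_overhead_ms 1000 1
```

The second call corresponds to the "full second" case: a thousand serialized round trips at 1ms each.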

However, virtual servers have the advantage of being “elastic,” which means it’s easier to quickly scale horizontally (by adding more servers). Also, when dedicated hardware breaks, you have to stop and fix it, unless you have a lot of hot spare servers in the rack, and as we all know, actual “hot spare” hardware is really just a fantasy that sys admins dream about and never actually get.

Avoiding Calling External Web Services

A web server killer we see quite often is a custom Drupal module that calls out to an external web service that is slow or unresponsive. This kind of issue can quickly render your web server totally unresponsive to page requests, because soon all PHP processes are tied up waiting on an external service that isn’t answering. The root cause is that PHP’s default_socket_timeout defaults to a generous 60 seconds, so each of your PHP processes will block a full minute waiting for a packet that isn’t coming. The first obvious suggestion is “don’t do that”: don’t make frequent call-outs to an external web service you have no control over, and instead use some other strategy, such as a background process that periodically pulls the external content and caches it locally.

But if you insist, then at least use PHP’s stream_set_timeout() or decrease the default_socket_timeout in php.ini so that unresponsive connections are dropped within three seconds.
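One way to implement the background-fetch strategy is a small script, run from crontab, that pulls the remote content with hard timeouts and replaces the cached copy only on success. This is a sketch under assumptions: the function name fetch_to_cache and any URL or cache path passed to it are hypothetical, and it assumes curl is installed on the server.

```shell
#!/bin/sh
# fetch_to_cache (hypothetical helper): download a URL into a local cache file.
#   $1 = URL of the external service
#   $2 = path of the local cache file
# curl gives up if it cannot connect within 3 seconds or if the whole
# transfer takes longer than 10 seconds; the old cache file is kept on failure.
fetch_to_cache() {
    url=$1
    cache_file=$2
    tmp_file="$cache_file.tmp.$$"
    if curl --silent --fail --connect-timeout 3 --max-time 10 \
            --output "$tmp_file" "$url"; then
        mv "$tmp_file" "$cache_file"
    else
        rm -f "$tmp_file"
        return 1
    fi
}
```

Writing to a temporary file first means a module reading the cache file never sees a half-written download, and page requests never wait on the external service at all.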

Decreasing Server Timeouts

There are a variety of timeout settings in each layer of a LAMP server stack. Lowering timeout settings is important because it prevents a slow or unresponsive service from causing a process pile-up on your web server. It is advisable to decrease all timeout settings as low as you can tolerate. For example, Apache’s mod_fcgid has a setting called Busy Timeout, which by default waits for 5 minutes before terminating a long-running PHP process. You may decrease this to 30 seconds, considering that any page taking longer than 30 seconds to deliver ought to just fail rather than tie up your web server for another 4 minutes. Other key timeouts to consider decreasing include Apache’s Timeout setting, PHP’s max_execution_time, PHP’s default_socket_timeout, Nginx’s proxy_read_timeout, as well as a variety of Linux kernel TCP settings.

One notable PHP process in Drupal that may be allowed to run longer than five minutes is cron.php, which invokes all calls to Drupal’s hook_cron(). It is advisable to delegate only fast, simple tasks to hook_cron() and heavier tasks to crontab shell scripts.

Database Optimization

Drupal does a lot of work in the database, especially for authenticated users and custom modules. It is common for the database to be the cause of the bottleneck. Here are some basic strategies for optimizing Drupal’s use of the database.

Enabling MySQL’s Query Cache

MySQL is the most common database used with Drupal. MySQL has the ability to cache frequent queries in RAM so that the next time a given query is issued, MySQL will return it instantly from the cache. However, in most MySQL installations, this feature is disabled by default. To enable it, add the following lines to your MySQL option file; the file is named my.cnf and specifies the variables and behavior for your MySQL server (see http://dev.mysql.com/doc/refman/5.1/en/option-files.html). In this case, we’re setting the query cache to 64MB:

# The MySQL server
[mysqld]
query_cache_size=64M

The current query cache size can be viewed as output of MySQL’s SHOW VARIABLES command:

mysql> SHOW VARIABLES LIKE 'query_cache%';
...
| query_cache_size | 67108864 |
| query_cache_type | ON       |
...

Experimenting with the size of the query cache is usually necessary. Too small a cache means cached queries will be invalidated too often. Too large a cache means a cache search may take a relatively long time; also, the RAM used for the cache may be better used for other things, like more web server processes, memcache, or the operating system’s file cache.

MySQL InnoDB Performance on Windows

MySQL’s InnoDB storage engine, which is Drupal’s default choice when using MySQL, has especially slow write performance on Windows. This poor performance will surface in Drupal if you try to load the Admin Modules page and notice you have time to go make a sandwich. You have two ways of fixing this: either convert all tables to MyISAM (an OK choice for servers with light traffic), or in your MySQL config set innodb_flush_log_at_trx_commit=2, which tells InnoDB to be less zealous about waiting for disk writes to complete.

Drupal Performance

There are two often overlooked areas for improving Drupal performance that are simple to implement.

Eliminating 404 Errors

One of the most overlooked performance drains of a typical Drupal site is the seemingly innocent 404 (File not found) error. This is because Drupal is often configured to deliver a full dynamic response to a 404 error, even if that request was for a tiny image file in a forgotten style sheet or a favicon.ico deleted long ago. The solution is to resolve each of the 404 errors reported in Drupal’s admin logs, and change the ErrorDocument directive in your .htaccess to look something like this:

<FilesMatch "\.(png|gif|jpe?g|s?html?|css|js|cgi|ico|swf|flv|dll)$">
  ErrorDocument 404 default
</FilesMatch>

Disabling Modules You’re Not Using

Disable any modules that you are not using so that Drupal does not have to load and interact with them. Don’t leave devel modules running on your production site!

Drupal-Specific Optimizations

While most optimizations to Drupal are done within other layers of the software stack, there are a few buttons and levers within Drupal itself that yield significant performance gains.

Page Caching

Sometimes it’s the easy things that are overlooked, which is why they’re worth mentioning again. Drupal has a built-in way to reduce the load on the database by storing and sending compressed cached pages requested by anonymous users. By enabling the cache, you are effectively reducing pages to a single database query rather than the many queries that might have been executed otherwise.

Drupal caching is disabled by default and can be configured at Configuration -> Performance. For more information, see Chapter 16.

Bandwidth Optimization

There is another performance optimization on the Configuration -> Performance page to reduce the number of requests made to the server. By enabling the “Aggregate and compress CSS files into one” feature, Drupal takes the CSS files created by modules, compresses them, and rolls them into a single file inside a css directory in your “File system path.” The “Aggregate JavaScript files into one file” feature concatenates multiple JavaScript files into one and places that file inside a js directory in your “File system path.” This reduces the number of HTTP requests per page and the overall size of the downloaded page.

Pruning the Sessions Table

Drupal stores user sessions in its database rather than in files (see Chapter 17). This makes Drupal easier to set up across multiple machines, but it also adds overhead to the database for managing each user’s session information. If a site is getting tens of thousands of visitors a day, it’s easy to see how quickly this table can become very large. PHP gives you control over how often it should prune old session entries. Drupal has exposed this configuration in its settings.php file.

ini_set('session.gc_maxlifetime', 200000); // 55 hours (in seconds)

The default setting for the garbage collection system to run is a little over two days. This means that if a user doesn’t log in for two days, his or her session will be removed. If your sessions table is growing unwieldy, you’ll want to increase the frequency of PHP’s session garbage collection.

ini_set('session.gc_maxlifetime', 86400); // 24 hours (in seconds)
ini_set('session.cache_expire', 1440); // 24 hours (in minutes)

When adjusting session.gc_maxlifetime, it also makes sense to use the same value for session.cache_expire, which controls the time to live for cached session pages. Note that the session.cache_expire value is in minutes.
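Because the two settings use different units, it is easy to enter a mismatched pair. The conversion is a simple division, sketched here with a hypothetical helper named seconds_to_minutes (rounding down to whole minutes is an assumption of this sketch).

```shell
#!/bin/sh
# seconds_to_minutes (hypothetical helper): convert a session lifetime
# given in seconds to whole minutes, rounding down.
seconds_to_minutes() {
    echo $(( $1 / 60 ))
}

# 24 hours expressed in seconds, as used for session.gc_maxlifetime above
seconds_to_minutes 86400
```

For 86400 seconds this prints 1440, the value used for session.cache_expire in the settings above.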

Managing the Traffic of Authenticated Users

Since Drupal can serve cached pages to anonymous users, and anonymous users don’t normally require the interactive components of Drupal, you may want to reduce the length of time users stay logged in or, crazier yet, log them out after they close their browser windows. This is done by adjusting the cookie lifetime within the settings.php file. In the following line, we change the value to 24 hours:

ini_set('session.cookie_lifetime', 86400); // 24 hours (in seconds)

And here we log users out when they close the browser:

ini_set('session.cookie_lifetime', 0); // When they close the browser.

The default value in settings.php (2,000,000 seconds) allows a user to stay logged in for just over three weeks (provided session garbage collection hasn’t removed their session row from the sessions database).

Logging to the Database

Drupal ships with the Database logging module enabled by default. Entries can be viewed at Reports -> Recent log entries. The watchdog table in the database, which contains the entries, can bloat fairly quickly if it isn’t regularly pruned. If you find that the size of the watchdog table is slowing your site down, you can keep it lean and mean by adjusting the settings found at Configuration -> Logging and errors. Note that changes to this setting will take effect the next time cron runs. Not running cron regularly will allow the watchdog table to grow endlessly, causing significant overhead.

Logging to Syslog

The syslog module, which ships with Drupal core but is disabled by default, writes calls to watchdog() to the operating system log using PHP’s syslog() function. This approach eliminates the database inserts required by the Database logging module.

Running cron

Even though it’s step nine of Drupal’s installation instructions, setting up cron is often overlooked, and this oversight can bring a site to its knees. By not running cron on a Drupal site, the database fills up with log messages, stale cache entries, and other statistical data that is otherwise regularly wiped from the system. It’s a good practice to configure cron early on as part of the normal install process. See Drupal’s INSTALL.txt file for more information on setting up cron.

Architectures

The architectures available for Drupal are those of other LAMP-stack software, and the techniques used to scale are applicable to Drupal as well. Thus, we’ll concentrate on the Drupal-specific tips and gotchas for different architectures.

Single Server

This is the simplest architecture. The web server and the database run on the same server. The server may be a shared host or a dedicated host. Although many small Drupal sites run happily on shared hosting, serious web hosting that expects to scale should take place on a dedicated host. With single-server architecture, configuration is simple, as everything is still done on one server. Likewise, communication between the web server and the database is fast, because there is no latency incurred by moving data over a network. Clearly, it’s advantageous to have a multi-core processor, so the web server and database don’t need to jockey as much for processor time.

Separate Database Server

If the database is your bottleneck, a separate and powerful database server may be what you need. Some performance will be lost because of the overhead of sending requests through a network, but scalability will improve.

Separate Database Server and a Web Server Cluster

Multiple web servers provide failover and can handle more traffic. The minimum number of computers needed for a cluster is two web servers. Additionally, you need a way to switch traffic between the machines. Should one of the machines stop responding, the rest of the cluster should be able to handle the load.

Load Balancing

Load balancers distribute web traffic among web servers. There are other kinds of load balancers for distributing other resources, such as hard disks and databases, but here, I’m just talking about distributing HTTP requests. In the case of multiple web servers, load balancers allow web services to continue in the face of one web server’s downtime or maintenance.

There are two broad categories of load balancers. Software load balancers are cheaper or even free but tend to have more ongoing maintenance and administrative costs than hardware load balancers. Linux Virtual Server (www.linuxvirtualserver.org/) is one of the most popular Linux load balancers. Hardware load balancers are expensive, since they contain more advanced server switching algorithms, and tend to be more reliable than software-based solutions.

In addition to load balancing, multiple web servers introduce several complications, primarily file uploading and keeping the code base consistent across servers.

File Uploads and Synchronization

When Drupal is run on a single web server, uploaded files are typically stored in Drupal’s files directory. The location is configurable at Configuration -> File system. With multiple web servers, the following scenario must be avoided:

1. A user uploads a file on web server A; the database is updated to reflect this.

2. A user views a page on web server B that references the new file. File not found!

Clearly, the answer is to make the file appear on web server B also. There are several approaches.

Using a Shared, Mounted File System

Rather than synchronize multiple web servers, you can deploy a shared, mounted file system, which stores files in a single location on a file server. The web servers can then mount the file server using a protocol like GFS, AFS, or NFS. The advantages of this approach are that cheap additional web servers can be easily added, and resources can be concentrated in a heavy-duty file server with a redundant storage system like RAID 5. The main disadvantage to this system is that there is a single point of failure; if your file server or file system mounts go down, the site is affected unless you also create a cluster of file servers.

If there are many large media files to be served, it may be best to serve these from a separate server using a lightweight web server, such as Nginx, to avoid having a lot of long-running processes on your web servers contending with requests handled by Drupal. An easy way to do this is to use a rewrite rule on your web server to redirect all incoming requests for a certain file type to the static server. Here’s an example rewrite rule for Apache that rewrites all requests for JPEG files:

RewriteCond %{REQUEST_URI} ^/(.*\.jpg)$ [NC]
RewriteRule .* http://static.example.com/%1 [R]

The disadvantage of this approach is that the web servers are still performing the extra work of redirecting traffic to the file server. An improved solution is to rewrite all file URLs within Drupal, so the web servers are no longer involved in static file requests.

Beyond a Single File System

If the amount of storage is going to exceed a single file system, chances are you’ll be doing some custom coding to implement storage abstraction. One option would be to use an outsourced storage system like Amazon’s S3 service.

Multiple Database Servers

Multiple database servers introduce additional complexity, because the data being inserted and updated must be replicated or partitioned across servers.

Database Replication

In MySQL database replication, a single master database receives all writes. These writes are then replicated to one or more slaves. Reads can be done on any master or slave. Slaves can also be masters in a multitiered architecture.

Database Partitioning

Since Drupal can handle multiple database connections, another strategy for scaling your database architecture is to put some tables in one database on one machine, and other tables in a different database on another machine. For example, moving all cache tables to a separate database on a separate machine and aliasing all queries on these tables using Drupal’s table prefixing mechanism can help your site scale.

Finding the Bottleneck

If your Drupal site is not performing as well as expected, the first step is to analyze where the problem lies. Possibilities include the web server, the operating system, the database, file system, and the network.

Knowing how to evaluate the performance and scalability of a system allows you to quickly isolate and respond to system bottlenecks with confidence, even amid a crisis. You can discover where bottlenecks lie with a few simple tools and by asking questions along the way. Here’s one way to approach a badly performing server. We begin with the knowledge that performance is going to be bound by one of the following variables: CPU, RAM, I/O, or bandwidth. So begin by asking yourself the following questions:

Is the CPU maxed out? If examining CPU usage with top on Unix or the Task Manager on Windows shows CPU(s) at 100 percent, your mission is to find out what’s causing all that processing. Looking at the process list will let you know whether it’s the web server or the database eating up processor cycles. Both of these problems are solvable.

Is the server paging excessively? If the server lacks enough physical memory to handle the allocated task, the operating system will use virtual memory (disk) to handle the load. Reading from and writing to disk is significantly slower than reading from and writing to physical memory. If your server is paging excessively, you’ll need to figure out why.

Are the disks maxed out? If examining the disk subsystem with a tool like vmstat on Unix or the Performance Monitor on Windows shows that disk activity cannot keep up with the demands of the system while plenty of free RAM remains, you’ve got an I/O problem. Possibilities include excessively verbose logging, an improperly configured database that is creating many temporary tables on disk, background script execution, improper use of a RAID level for a write-heavy application, and so on.

Is the network link saturated? If the network pipe is filled up, there are only two solutions. One is to get a bigger pipe. The other is to send less information while making sure the information that is being sent is properly compressed.

Web Server Running Out of CPU

If your CPU is maxed out and the process list shows that the resources are being consumed by the web server and not the database (which is covered later), you should look into reducing the web server overhead incurred to serve a request. Often the execution of PHP code is the culprit. See the description of PHP optimizations earlier in the chapter.

Often custom code and modules that have performed reasonably well for small-scale sites can become a bottleneck when moved into production. CPU-intensive code loops, memory-hungry algorithms, and large database retrievals can be identified by profiling your code to determine where PHP is spending most of its time and thus where you ought to spend most of your time debugging.

If, even after adding an opcode cache and optimizing your code, your web server cannot handle the load, it is time to get a beefier box with more or faster CPUs or to move to a different architecture with multiple web server front ends.

Web Server Running Out of RAM

The RAM footprint of the web server process serving the request includes all of the modules loaded by the web server (such as Apache’s mod_mime, mod_rewrite, etc.) as well as the memory used by the PHP interpreter. The more web server and Drupal modules that are enabled, the more RAM used per request.

Because RAM is a finite resource, you should determine how much is being used on each request and how many requests your web server is configured to handle. To see how much real RAM is being used on average for each request, use a program like top (on Linux) to see your list of processes. In Apache, the maximum number of simultaneous requests that will be served is set using the MaxClients directive. A common mistake is thinking the solution to a saturated web server is to increase the value of MaxClients. This only complicates the problem, since you’ll be hit by too many requests at once. That means RAM will be exhausted, and your server will start disk swapping and become unresponsive. Let’s assume, for example, that your web server has 2GB of RAM and each Apache request is using roughly 20MB (you can check the actual value by using top on Linux or Task Manager on Windows). You can calculate a good value for MaxClients by using the following formula; keep in mind that you will need to reserve memory for your operating system and other processes, so the practical value is lower:

2GB RAM / 20MB per process = roughly 100 MaxClients
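The same calculation can be made with the memory reservation included. This is a minimal sketch: the helper name max_clients and the 512MB reservation in the second call are assumptions chosen for illustration, not recommendations.

```shell
#!/bin/sh
# max_clients (hypothetical helper): estimate an Apache MaxClients value.
#   $1 = total RAM in MB
#   $2 = RAM reserved for the operating system and other processes, in MB
#   $3 = average RAM used per Apache process, in MB
max_clients() {
    echo $(( ($1 - $2) / $3 ))
}

# The example from the text: 2GB (2048MB) of RAM, 20MB per process, nothing reserved.
max_clients 2048 0 20

# The same server with 512MB held back (an assumed figure).
max_clients 2048 512 20
```

The first call prints 102, which the text rounds to 100; the second prints 76, showing how quickly the usable figure drops once memory is set aside for everything else on the machine.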

If your server consistently runs out of RAM even after disabling unneeded web server modules and profiling any custom modules or code, your next step is to make sure the database and the operating system are not the causes of the bottleneck. If they are, then add more RAM. If the database and operating system are not causing the bottlenecks, you simply have more requests than you can serve; the solution is to add more web server boxes.

Identifying Expensive Database Queries

If you need to get a sense of what is happening when a given page is generated, devel.module is invaluable. It has an option to display all the queries that are required to generate the page along with the execution time of each query. Another way to find out which queries are taking too long is to enable slow query logging in MySQL. This is done in the MySQL option file (my.cnf) as follows:

# The MySQL server
[mysqld]
log-slow-queries

This will log all queries that take longer than ten seconds to a log file at example.com-slow.log in MySQL’s data directory. You can change the number of seconds and the log location as shown in this code, where we set the slow query threshold to five seconds and the file name to example-slow.log:

# The MySQL server
[mysqld]
long_query_time = 5
log-slow-queries = /var/log/mysql/example-slow.log
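Once the log is being written, a quick count of its entries shows whether slow queries are rare or routine. A small sketch, assuming the standard slow log layout in which each logged query is preceded by a "# Query_time:" header line; the helper name count_slow_queries is hypothetical.

```shell
#!/bin/sh
# count_slow_queries (hypothetical helper): count the entries in a MySQL
# slow query log by counting the "# Query_time:" header of each entry.
#   $1 = path to the slow query log
count_slow_queries() {
    grep -c '^# Query_time:' "$1"
}
```

For the configuration above, the call would be count_slow_queries /var/log/mysql/example-slow.log. MySQL also ships a mysqldumpslow utility that summarizes the same log by query.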


Monday, June 16, 2014
