When users interact with Drupal, it is typically through a series of forms, such as the node submission form or the comment
submission form. Users might also post remotely to a Drupal-based blog via XML-RPC using the blogapi module
(http://drupal.org/project/blogapi). Drupal’s approach to user input can be summarized as store the original; filter on
output. The database should always contain an accurate representation of what the user entered. As user input is being
prepared to be incorporated into a web page, it is sanitized (i.e., potentially executable code is neutralized).
Security breaches can be caused when text entered by a user is not sanitized and is executed inside your program. This can
happen when you don’t think about the full range of possibilities when you write your program. You might expect users to
enter only standard characters, when in fact they could enter nonstandard strings or encoded characters, such as control
characters. You might have seen URLs with the string %20 in them, for example, http://example.com/my%20document.html.
This is a space character that has been encoded in compliance with the URL specification (see
www.w3.org/Addressing/URL/url-spec.html). When someone saves a file named my document.html and it’s served by a web
server, the space is encoded. The % denotes an encoded character, and the 20 shows that this is ASCII character 32 (20 is
the hexadecimal representation of 32). Tricky use of encoded characters by nefarious users can be problematic, as you’ll
see later in this chapter.
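The encoding described above can be checked with a few lines of plain PHP. This is only an illustrative sketch: it uses PHP’s built-in rawurlencode(), rawurldecode(), and hexdec() functions and nothing from Drupal, and the variable names are made up for the example.

```php
<?php
// Percent-encode a file name the way it would appear in a URL path.
$encoded = rawurlencode('my document.html');
echo $encoded, "\n";

// The two hex digits after the % are the character code: hex 20 is decimal 32.
$code = hexdec('20');
echo $code, "\n";

// Character 32 is the space, so decoding restores the original file name.
$decoded = rawurldecode($encoded);
echo $decoded, "\n";
```

Because decoding is just as easy as encoding, a check that looks only at the raw, still-encoded string can miss characters that appear once the string has been decoded.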
Thinking About Data Types
When dealing with text in a system such as Drupal where user input is displayed as part of a web site, it’s helpful to think
of the user input as a typed variable. If you’ve programmed in a strongly typed language such as Java, you’ll be familiar with
typed variables. For example, an integer in Java is really an integer, and will not be treated as a string unless the programmer explicitly makes the conversion. In PHP (a weakly typed language), you’re usually fine treating an integer as a string or an
integer, depending on the context, due to PHP’s automatic type conversion. But good PHP programmers think carefully
about types and use automatic type conversion to their advantage. In the same way, even though user input from, say, the Body field of a node submission form can be treated as text, it’s much better to think of it as a certain type of text. Is the user entering plaintext? Or is the user entering HTML tags and expecting that they’ll be rendered? If so, could these tags include harmful tags, such as JavaScript that replaces your page with an advertisement for cell phone ringtones? A page that
will be displayed to a user is in HTML format; user input is in a variety of “types” of textual formats and must be securely
converted to HTML before being displayed. Thinking about user input in this way helps you to understand how Drupal’s text conversion functions work. Common types of textual input, along with functions to convert the text to another format, are
shown in Table 21-1.
Table 21-1. Secure Conversions from One Text Type to Another
| Source Format | Target Format | Drupal Function | What It Does |
|---|---|---|---|
| Plain text | HTML | check_plain() | Encodes special characters into HTML entities and validates strings as UTF-8 to prevent cross-site scripting attacks on Internet Explorer 6 |
| HTML text | HTML | filter_xss() | Removes characters and constructs that can trick browsers. Makes sure that all HTML entities are well formed. Makes sure that all HTML tags and attributes are well formed, and makes sure that no HTML tags contain URLs with a disallowed protocol (e.g., javascript:) |
| Rich text | HTML | check_markup() | Runs text through all enabled filters |
| Plain text | URL | drupal_encode_path() | Encodes a Drupal path for use in a URL |
| URL | HTML | check_url() | Strips out harmful protocols, such as javascript: |
| Plain text | MIME | mime_header_encode() | Encodes non-ASCII, UTF-8 encoded characters |
Plain Text
Plain text is text that is supposed to contain only, well, plain text. For example, if you ask a user to type in his or her
favorite color in a form, you expect the user to answer “green” or “purple,” without markup of any kind. Including this
input in another webpage without checking to make sure that it really does contain only plain text is a gaping security hole.
For example, the user might enter the following instead of entering a color:
<img src="javascript:window.location ='http://evil.example.com/133/index.php?s=11&ce_cid=38181161'">
Thus, we have the function check_plain() available to ensure that characters with special meaning in HTML are neutralized by encoding them as HTML entities. The text that is returned from check_plain() will have no HTML tags of any kind, as they’ve all been converted to entities. If a user enters the evil JavaScript in the preceding code, the check_plain() function will turn it into the following text, which will be harmless when rendered in HTML:
&lt;img src=&quot;javascript:window.location =&#039;http://evil.example.com/133/index.php?s=11&amp;ce_cid=38181161&#039;&quot;&gt;
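To experiment with this kind of escaping outside Drupal, here is a minimal sketch. It assumes that check_plain() behaves much like PHP’s htmlspecialchars() called with ENT_QUOTES and UTF-8; the function name sketch_check_plain() is hypothetical and is not part of Drupal.

```php
<?php
// Hypothetical stand-in for check_plain(), for experimenting outside Drupal.
// Encodes &, <, >, " and ' as HTML entities.
function sketch_check_plain($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// The malicious "favorite color" from the example above.
$evil = '<img src="javascript:window.location =\'http://evil.example.com/133/index.php?s=11&ce_cid=38181161\'">';

// After escaping, no literal angle brackets or quotes remain, so a browser
// displays the text instead of interpreting it as a tag.
echo sketch_check_plain($evil), "\n";
```

Note that escaping happens at output time; the value stored in the database is still exactly what the user typed.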
HTML Text
HTML text can contain HTML markup. However, you can never blindly trust that the user has entered only “safe” HTML; generally you want to restrict users to using a subset of the available HTML tags. For example, the <script> tag is not one
that you generally want to allow because it permits users to run scripts of their choice on your site. Likewise, you don’t
want users using the <form> tag to set up forms on your site.
Rich Text
Rich text is text that contains more information than plain text but is not necessarily in HTML. It may contain wiki markup, or Bulletin Board Code (BBCode), or some other markup language. Such text must be run through a filter to convert the markup to HTML before display.
URL
Here, URL means a URL that has been built from user input or from another untrusted source. You might have expected the user to
enter http://example.com, but the user entered javascript:runevilJS() instead. Before displaying the URL in an HTML
page, you must run it through check_url() to make sure it is well formed and does not contain attacks.
Using check_plain() and t() to Sanitize Output
Use check_plain() any time you have text that you don’t trust and in which you do not want any markup. Here is a
naïve way of using user input, assuming the user has just entered a favorite color in a text field. The following code is insecure:
drupal_set_message("Your favorite color is $color!"); // No input checking!
The following is secure but bad coding practice:
drupal_set_message('Your favorite color is ' . check_plain($color));
This is bad code because we have a text string (namely the implicit result of the check_plain() function), but it isn’t inside the t() function, which should always be used for text strings. If you write code like the preceding, be prepared for complaints from angry translators, who will be unable to translate your phrase because it doesn’t pass through t(). You cannot just place variables inside double quotes and give them to t(). The following code is still insecure because no placeholder is being used:
drupal_set_message(t("Your favorite color is $color!")); // No input checking!
The t() function provides a built-in way of making your strings secure by using a placeholding token with a one-character
prefix, as follows. The following is secure and in good form:
drupal_set_message(t('Your favorite color is @color', array('@color' => $color)));
Note that the key in the array (@color) is the same as the replacement token in the string. This results in a message like
the following:
Your favorite color is brown.
The @ prefix tells t() to run the value that is replacing the token through check_plain().
In this case, we probably want to emphasize the user’s choice of color by changing the style of the color value. This is done using the % prefix, which means “execute theme('placeholder', $value) on the value.” This passes the value through
check_plain() indirectly, as shown in Figure 21-1. The % prefix is the most commonly used prefix.
The following is secure and good form:
drupal_set_message(t('Your favorite color is %color', array('%color' => $color)));
This results in a message like the following. In addition to escaping the value, theme_placeholder() has wrapped the value
in <em></em> tags.
Your favorite color is brown.
If you have text that has been previously sanitized, you can disable checks in t() by using the ! prefix. For example, the l() function builds a link, and for convenience, it runs the text of the link through check_plain() while building the link. So in the following example, the ! prefix can be safely used:
// The l() function runs text through check_plain() and returns sanitized text
// so no need for us to do check_plain($link) or to have t() do it for us.
$link = l($user_supplied_text, $path);
drupal_set_message(t('Go to the website !website', array('!website' => $link)));
The effect of the @, %, and ! placeholders on string replacement in t() is shown in Figure 21-1. Although for simplicity’s
sake it isn’t shown in the figure, remember that you may use multiple placeholders by defining them in the string and adding
members to the array, for example:
drupal_set_message(t('Your favorite color is %color and you like %food', array('%color'=>$color, '%food' => $food)));
Be especially cautious with the use of the ! prefix, since that means the string will not be run through check_plain().
Figure 21-1. Effect of the placeholder prefixes on string replacement
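The substitution step can be imitated in plain PHP to see what each prefix does to a hostile value. This is a rough sketch under stated assumptions, not Drupal’s implementation: sketch_replace_placeholders() is a hypothetical name, htmlspecialchars() stands in for check_plain(), a bare <em> wrapper stands in for the themed placeholder, and no translation is performed.

```php
<?php
// Hypothetical helper that mimics how t() treats the @, % and ! prefixes.
// It performs no translation; it only shows the substitution step.
function sketch_replace_placeholders($string, array $args = array()) {
  foreach ($args as $key => $value) {
    switch ($key[0]) {
      case '@':
        // Escaped as plain text.
        $args[$key] = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        break;

      case '%':
        // Escaped, then wrapped in <em> tags for emphasis.
        $args[$key] = '<em>' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '</em>';
        break;

      case '!':
        // Inserted as is; the caller must already have sanitized the value.
        break;
    }
  }
  // Replace every token in the string with its prepared value.
  return strtr($string, $args);
}

// A "color" that tries to smuggle in markup.
$color = '<b>brown</b>';
echo sketch_replace_placeholders('Your favorite color is @color', array('@color' => $color)), "\n";
echo sketch_replace_placeholders('Your favorite color is %color', array('%color' => $color)), "\n";
echo sketch_replace_placeholders('Your favorite color is !color', array('!color' => $color)), "\n";
```

Only the last call lets the <b> tag through untouched, which is why the ! prefix deserves the extra caution.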
Using filter_xss() to Prevent Cross-Site Scripting Attacks
Cross-site scripting (XSS) is a common form of attack on a web site where the attacker is able to insert his or her own code into
a web page, which can then be used for all sorts of mischief. Suppose that you allow users to enter HTML on your web site,
expecting them to enter
<em>Hi!</em> My name is Sally, and I...
But instead they enter markup that runs a script of their choosing, for example a <script> tag.
Whoops! Again, the lesson is to never trust user input. Here is the function signature of filter_xss():
filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'blockquote', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'))
The filter_xss() function performs the following operations on the text string it is given:
1. It checks to make sure that the text being filtered is valid UTF-8 to avoid a bug with Internet Explorer 6.
2. It removes odd characters such as NULL and Netscape 4 JavaScript entities.
3. It ensures that HTML entities such as &amp; are well formed.
4. It ensures that HTML tags and tag attributes are well formed. During this process, tags that are not on the whitelist (that is, the second parameter for filter_xss()) are removed. The style attribute is removed, too, because it can interfere with the layout of a page by overriding CSS, or hide content by setting a spammer’s link color to the background color of the page. Any attributes that begin with on are removed (e.g., onclick or onfocus) because they represent JavaScript event-handler definitions. If you write regular expressions for fun and can name character codes for HTML entities from memory, you’ll enjoy stepping through filter_xss() (found in modules/filter/filter.module) and its associated functions with a debugger.
5. It ensures that no HTML tags contain disallowed protocols. Allowed protocols are http, https, ftp, news, nntp, telnet, mailto, irc, ssh, sftp, and webcal. You can modify this list by setting the filter_allowed_protocols variable. For example, you could restrict the protocols to http and https by adding the following line to your settings.php file (see the comment about variable overrides in the settings.php file):
$conf['filter_allowed_protocols'] = array('http', 'https'); // Sets one key; other $conf overrides are left intact.
Here’s an example of the use of filter_xss() from modules/aggregator/aggregator.pages.inc. The aggregator module deals with potentially dangerous RSS or Atom feeds. Here the module is preparing variables for use:
/**
 * Safely render HTML content, as allowed.
 *
 * @param $value
 *   The content to be filtered.
 * @return
 *   The filtered content.
 */
function aggregator_filter_xss($value) {
  // Split the configured tag string on whitespace and angle brackets so that
  // filter_xss() receives bare tag names such as 'a' and 'em'.
  return filter_xss($value, preg_split('/\s+|<|>/', variable_get('aggregator_allowed_html_tags', '<a> <b> <br> <dd> <dl> <dt> <em> <i> <li> <ol> <p> <strong> <u> <ul>'), -1, PREG_SPLIT_NO_EMPTY));
}
Note that aggregator_filter_xss() is a wrapper for filter_xss() that provides an array of acceptable HTML tags.
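The tag list is stored as a single string, so it has to be turned into an array of bare tag names before it is handed to filter_xss(). The following sketch shows that step in isolation, assuming a pattern that splits on whitespace and on each angle bracket; the variable names are made up for the example.

```php
<?php
// The default value of the allowed-tags setting, as a single string.
$allowed_html = '<a> <b> <br> <dd> <dl> <dt> <em> <i> <li> <ol> <p> <strong> <u> <ul>';

// Split on runs of whitespace and on < and >, discarding empty pieces,
// which leaves only the tag names.
$tags = preg_split('/\s+|<|>/', $allowed_html, -1, PREG_SPLIT_NO_EMPTY);

print_r($tags);
```

The resulting array has the same shape as the default $allowed_tags parameter in the filter_xss() signature shown earlier.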
Using filter_xss_admin()
Sometimes you want your module to produce HTML for administrative pages. Because administrative pages should be protected by access controls, it’s assumed that users given access to administrative screens can be trusted more than regular users. You could
set up a special filter for administrative pages and use the filter system, but that would be cumbersome. For these reasons, the
function filter_xss_admin() is provided. It is simply a wrapper for filter_xss() with a liberal list of allowed tags, including everything
except the <script>, <object>, and <style> tags. An example of its use is in the display of the site mission in a theme:
if (drupal_is_front_page()) {
  $mission = filter_xss_admin(theme_get_setting('mission'));
}
The site’s mission can be set only from the Configuration -> “Site information” page, to which only the superuser and users with
the “administer site configuration” permission have access, so this is a situation in which the use of filter_xss_admin() is
appropriate.
Handling URLs Securely
Often modules take user-submitted URLs and display them. Some mechanism is needed to make sure that the value the user has
given is indeed a legitimate URL. Drupal provides the check_url() function, which is really just a wrapper for filter_xss_bad_protocol(). It checks to make sure that the protocol in the URL is among the allowed protocols on the Drupal site (see step 5 in the earlier section “Using filter_xss() to Prevent Cross-Site Scripting Attacks”) and runs the URL through check_plain().
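To make the idea concrete, here is a simplified sketch of a protocol check followed by escaping. It is an approximation written for illustration, not Drupal’s code: the function names are hypothetical, the allowed list is copied from step 5 of the earlier section, and htmlspecialchars() stands in for check_plain().

```php
<?php
// Hypothetical sketch: strip any leading protocol that is not on the allowed list.
function sketch_strip_bad_protocol($uri, array $allowed) {
  // Loop so that nested attempts such as "javascript:javascript:..." are caught.
  do {
    $before = $uri;
    $colon = strpos($uri, ':');
    if ($colon > 0) {
      $protocol = substr($uri, 0, $colon);
      // A colon that comes after /, ? or # is part of the path or query,
      // not a protocol separator, so stop checking.
      if (preg_match('![/?#]!', $protocol)) {
        break;
      }
      if (!in_array(strtolower($protocol), $allowed, TRUE)) {
        $uri = substr($uri, $colon + 1);
      }
    }
  } while ($before !== $uri);
  return $uri;
}

// Hypothetical stand-in for check_url(): protocol check, then HTML escaping.
function sketch_check_url($uri) {
  $allowed = array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal');
  return htmlspecialchars(sketch_strip_bad_protocol($uri, $allowed), ENT_QUOTES, 'UTF-8');
}

echo sketch_check_url('http://example.com/?a=1&b=2'), "\n";
echo sketch_check_url('javascript:runevilJS()'), "\n";
```

The second call loses its javascript: prefix, so whatever remains can no longer be executed as a script when placed in an href attribute.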
If you want to determine whether a URL is in valid form, you can call valid_url(). It will check the syntax for http, https, and
ftp URLs and check for illegal characters; it returns TRUE if the URL passes the test. This is a quick way to make sure that
users aren’t submitting URLs with the javascript protocol.
If you’re passing on some information via a URL—for example, in a query string—you can use drupal_encode_path() to
pass along escaped characters. Calling drupal_encode_path() does some encoding of slashes for compatibility with Drupal’s clean
URLs and then calls PHP’s rawurlencode() function. The drupal_encode_path() function is not more secure than calling
rawurlencode() directly, but it is handy for making encoded strings that will work well with Apache’s mod_rewrite module.
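One way to get that effect in plain PHP is to percent-encode the whole path and then restore the slashes. This is a hedged sketch of the idea, with a hypothetical function name; it is not presented as the body of drupal_encode_path().

```php
<?php
// Hypothetical stand-in for drupal_encode_path(): percent-encode everything,
// then turn encoded slashes back into literal slashes so the path structure
// survives URL rewriting.
function sketch_encode_path($path) {
  return str_replace('%2F', '/', rawurlencode($path));
}

echo sketch_encode_path('files/my document.html'), "\n";
echo sketch_encode_path('a&b/c?d'), "\n";
```

Characters such as &, ?, and the space are encoded, while the slash that separates path segments is kept readable.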
Making Queries Secure with db_query()
A common way of exploiting web sites is called SQL injection. Let’s examine a module written by someone not thinking about
security. This person just wants a simple way to list titles of all nodes of a certain type:
/**
 * Implements hook_menu().
 */
function insecure_menu() {
  $items['insecure'] = array(
    'title' => 'Insecure Test',
    'page callback' => 'insecure_code',
    'access arguments' => array('access content'),
  );
  return $items;
}

/**
 * Menu callback, called when user goes to http://example.com/?q=insecure
 */
function insecure_code($type = 'story') {
  $output = "Searching for nodes of type: $type <br/>";
  $query = db_select('node', 'n');
  $query->fields('n', array('title'));
  $query->condition("n.type", $type);
  $result = $query->execute();
  $items = array();
  foreach ($result as $row) {
    $items[] = $row->title;
  }
  if (sizeof($items) > 0) {
    $output .= theme('item_list', array('items' => $items));
  }
  else {
    $output .= "No nodes were found of type $type";
  }
  return $output;
}
Going to http://example.com/insecure works as expected. We get the SQL and then a list of stories, as shown in Figure 21-2.
Figure 21-2. Simple listing of story node titles
Note how the programmer cleverly gave the insecure_code() function a $type parameter that defaults to 'story'. This programmer is taking advantage of the fact that Drupal’s menu system forwards additional path arguments automatically as parameters to callbacks, so http://example.com/insecure/page will get us all titles of nodes of type 'page', as shown in Figure 21-3.
Figure 21-3. Simple listing of page node titles
The situation can still be improved, however. In this case, the URL should contain only members of a finite set; that is, the node
types on our site. We know what those are, so we should always confirm that the user supplied value is in our list of
known values. For example, if we have only the page and article node types enabled, we should attempt to proceed only if we have been given those types in the URL. Let’s add some code to check for that:
function insecure_code($type = 'article') {
  // Proceed only if $type is one of the node types defined on this site.
  $types = node_type_get_types();
  if (!isset($types[$type])) {
    watchdog('security', 'Possible SQL injection attempt!', array(), WATCHDOG_ALERT);
    return t('Unable to process request.');
  }
  $output = "Searching for nodes of type: $type <br/>";
  $query = db_select('node', 'n');
  $query->fields('n', array('title'));
  $query->condition("n.type", $type);
  $result = $query->execute();
  $items = array();
  foreach ($result as $row) {
    $items[] = $row->title;
  }
  if (sizeof($items) > 0) {
    $output .= theme('item_list', array('items' => $items));
  }
  else {
    $output .= "No nodes were found of type $type";
  }
  return $output;
}
Here we’ve added a check to make sure that $type is one of our existing node types, and if the check fails, a handy
warning will be recorded for system administrators. There are more problems, though. The SQL does not distinguish between published and unpublished nodes, so even titles of unpublished nodes will show up. Plus, node titles are user-submitted
data, so they need to be sanitized before output. But as the code currently stands, it just gets the titles from the database and displays them. Let’s fix these problems.
function insecure_code($type = 'article') {
  // Proceed only if $type is one of the node types defined on this site.
  $types = node_type_get_types();
  if (!isset($types[$type])) {
    watchdog('security', 'Possible SQL injection attempt!', array(), WATCHDOG_ALERT);
    return t('Unable to process request.');
  }
  $output = "Searching for nodes of type: $type <br/>";
  $query = db_select('node', 'n');
  $query->fields('n', array('title'));
  $query->condition("n.type", $type);
  // Only published nodes.
  $query->condition("n.status", 1);
  $result = $query->execute();
  $items = array();
  foreach ($result as $row) {
    // Titles are user-submitted, so escape them before output.
    $items[] = check_plain($row->title);
  }
  if (sizeof($items) > 0) {
    $output .= theme('item_list', array('items' => $items));
  }
  else {
    $output .= "No nodes were found of type $type";
  }
  return $output;
}
Now only published nodes will show up, and all the titles are run through check_plain() before being displayed. We’ve
also removed the debugging code. This module has come a long way! But there’s still a security flaw. Can you see it?
If not, read on.