Pro_Drupal7_Development: Handling User Input [Writing Secure Code]

When users interact with Drupal, it is typically through a series of forms, such as the node submission form or the comment submission form. Users might also post remotely to a Drupal-based blog via XML-RPC using the blogapi module (http://drupal.org/project/blogapi). Drupal’s approach to user input can be summarized as: store the original; filter on output. The database should always contain an accurate representation of what the user entered. As user input is being prepared to be incorporated into a web page, it is sanitized (i.e., potentially executable code is neutralized).

Security breaches can occur when text entered by a user is not sanitized and is executed inside your program. This can happen when you don’t consider the full range of possible input as you write your program. You might expect users to enter only standard characters, when in fact they could enter nonstandard strings or encoded characters, such as control characters. You might have seen URLs with the string %20 in them; for example, http://example.com/my%20document.html. This is a space character that has been encoded in compliance with the URL specification (see www.w3.org/Addressing/URL/url-spec.html). When someone saves a file named my document.html and it’s served by a web server, the space is encoded. The % denotes an encoded character, and the 20 shows that this is ASCII character 32 (20 is the hexadecimal representation of 32). Tricky use of encoded characters by nefarious users can be problematic, as you’ll see later in this chapter.
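The %20 encoding described above can be checked with PHP's built-in functions. This is only a small sketch in plain PHP (no Drupal required); the file name is the one used in the text.

```php
<?php
// Percent-encode a file name the way it would appear in a URL.
$encoded = rawurlencode('my document.html');
echo $encoded, "\n"; // my%20document.html

// The two hex digits after % are the character code: 0x20 is decimal 32.
$code = hexdec('20');
echo $code, "\n"; // 32

// Character 32 in ASCII is the space character.
var_dump(chr($code) === ' '); // bool(true)

// Decoding reverses the process.
echo rawurldecode('my%20document.html'), "\n"; // my document.html
```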

Thinking About Data Types

When dealing with text in a system such as Drupal where user input is displayed as part of a web site, it’s helpful to think of the user input as a typed variable. If you’ve programmed in a strongly typed language such as Java, you’ll be familiar with typed variables. For example, an integer in Java is really an integer, and will not be treated as a string unless the programmer explicitly makes the conversion. In PHP (a weakly typed language), you’re usually fine treating an integer as a string or an integer, depending on the context, due to PHP’s automatic type conversion. But good PHP programmers think carefully about types and use automatic type conversion to their advantage.

In the same way, even though user input from, say, the Body field of a node submission form can be treated as text, it’s much better to think of it as a certain type of text. Is the user entering plain text? Or is the user entering HTML tags and expecting that they’ll be rendered? If so, could these tags include harmful tags, such as JavaScript that replaces your page with an advertisement for cell phone ringtones? A page that will be displayed to a user is in HTML format; user input is in a variety of “types” of textual formats and must be securely converted to HTML before being displayed. Thinking about user input in this way helps you to understand how Drupal’s text conversion functions work. Common types of textual input, along with functions to convert the text to another format, are shown in Table 21-1.

Table 21-1. Secure Conversions from One Text Type to Another
Source Format	Target Format	Drupal Function	What It Does
Plain text	HTML	check_plain()	Encodes special characters into HTML entities and validates strings as UTF-8 to prevent cross-site scripting attacks on Internet Explorer 6
HTML text	HTML	filter_xss()	Removes characters and constructs that can trick browsers. Makes sure that all HTML entities are well formed. Makes sure that all HTML tags and attributes are well formed, and makes sure that no HTML tags contain URLs with a disallowed protocol (e.g., javascript:)
Rich text	HTML	check_markup()	Runs text through all enabled filters
Plain text	URL	drupal_encode_path()	Encodes a Drupal path for use in a URL
URL	HTML	check_url()	Strips out harmful protocols, such as javascript:
Plain text	MIME	mime_header_encode()	Encodes non-ASCII, UTF-8 encoded characters

Plain Text

Plain text is text that is supposed to contain only, well, plain text. For example, if you ask a user to type in his or her favorite color in a form, you expect the user to answer “green” or “purple,” without markup of any kind. Including this input in another web page without checking to make sure that it really does contain only plain text is a gaping security hole. For example, the user might enter the following instead of entering a color:

<img src="javascript:window.location ='http://evil.example.com/133/index.php?s=11&ce_cid=38181161'">

Thus, we have the function check_plain(), which neutralizes any characters that are special in HTML by encoding them as HTML entities. The text that is returned from check_plain() will have no HTML tags of any kind, as they’ve all been converted to entities. If a user enters the evil JavaScript in the preceding code, the check_plain() function will turn it into the following text, which will be harmless when rendered in HTML:

&lt;img src=&quot;javascript:window.location =&#039;http://evil.example.com/133/index.php?s=11&amp;ce_cid=38181161&#039;&quot;&gt;
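The encoding step can be tried outside Drupal. In Drupal 7, check_plain() is a thin wrapper around PHP's htmlspecialchars() with the ENT_QUOTES flag and UTF-8 encoding, so the following plain-PHP sketch approximates it. The function name sketch_check_plain() and the sample input are made up for illustration and are not part of Drupal.

```php
<?php
// Hypothetical stand-in for check_plain(); not a Drupal function.
// Encodes &, <, >, " and ' as HTML entities.
function sketch_check_plain($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Sample hostile input (assumed for this sketch).
$input = '<img src="javascript:alert(\'x\')">';

// The markup is now inert text rather than an HTML tag.
echo sketch_check_plain($input), "\n";
// &lt;img src=&quot;javascript:alert(&#039;x&#039;)&quot;&gt;
```

A browser rendering the encoded string displays the original characters literally instead of interpreting them as a tag.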

HTML Text

HTML text can contain HTML markup. However, you can never blindly trust that the user has entered only “safe” HTML; generally you want to restrict users to using a subset of the available HTML tags. For example, the <script> tag is not one that you generally want to allow because it permits users to run scripts of their choice on your site. Likewise, you don’t want users using the <form> tag to set up forms on your site.

Rich Text

Rich text is text that contains more information than plain text but is not necessarily in HTML. It may contain wiki markup, or Bulletin Board Code (BBCode), or some other markup language. Such text must be run through a filter to convert the markup to HTML before display.

URL

A URL, in this sense, is a URL that has been built from user input or from another untrusted source. You might have expected the user to enter http://example.com, but the user entered javascript:runevilJS() instead. Before displaying the URL in an HTML page, you must run it through check_url() to make sure it is well formed and does not contain attacks.

Using check_plain() and t() to Sanitize Output

Use check_plain() any time you have text that you don’t trust and in which you do not want any markup. Here is a naïve way of using user input, assuming the user has just entered a favorite color in a text field. The following code is insecure:

drupal_set_message("Your favorite color is $color!"); // No input checking!

The following is secure but bad coding practice:

drupal_set_message('Your favorite color is ' . check_plain($color));

This is bad code because we have a text string (the literal text concatenated with the result of the check_plain() function), but it isn’t inside the t() function, which should always be used for text strings. If you write code like the preceding, be prepared for complaints from angry translators, who will be unable to translate your phrase because it doesn’t pass through t(). You cannot just place variables inside double quotes and give them to t(). The following code is still insecure because no placeholder is being used:

drupal_set_message(t("Your favorite color is $color!")); // No input checking!

The t() function provides a built-in way of making your strings secure by using a placeholder token with a one-character prefix, as follows. The following is secure and in good form:

drupal_set_message(t('Your favorite color is @color', array('@color' => $color)));

Note that the key in the array (@color) is the same as the replacement token in the string. This results in a message like the following:

Your favorite color is brown.

The @ prefix tells t() to run the value that is replacing the token through check_plain().

In this case, we probably want to emphasize the user’s choice of color by changing the style of the color value. This is done using the % prefix, which means “escape the value and wrap it in emphasis markup.” In Drupal 7 this is done by drupal_placeholder(), the successor to Drupal 6’s theme('placeholder', $value). This passes the value through check_plain() indirectly, as shown in Figure 21-1. The % prefix is the most commonly used prefix.

The following is secure and good form:

drupal_set_message(t('Your favorite color is %color', array('%color' => $color)));

This results in a message like the following. In addition to escaping the value, the placeholder function (drupal_placeholder() in Drupal 7, theme_placeholder() in Drupal 6) has wrapped the value in <em></em> tags.

Your favorite color is brown.

If you have text that has been previously sanitized, you can disable checks in t() by using the ! prefix. For example, the l() function builds a link, and for convenience, it runs the text of the link through check_plain() while building the link. So in the following example, the ! prefix can be safely used:

// The l() function runs text through check_plain() and returns sanitized text
// so no need for us to do check_plain($link) or to have t() do it for us.
$link = l($user_supplied_text, $path);
drupal_set_message(t('Go to the website !website', array('!website' => $link)));

The effect of the @, %, and ! placeholders on string replacement in t() is shown in Figure 21-1. Although for simplicity’s sake it isn’t shown in the figure, remember that you may use multiple placeholders by defining them in the string and adding members to the array, for example:

drupal_set_message(t('Your favorite color is %color and you like %food', array('%color'=>$color, '%food' => $food)));

Be especially cautious with the use of the ! prefix, since that means the string will not be run through check_plain().

Figure 21-1. Effect of the placeholder prefixes on string replacement
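The effect of the three prefixes can also be approximated in plain PHP. The sketch below is not Drupal's implementation: sketch_format_string() is a hypothetical helper that skips translation entirely and only mimics the escaping rules described above, using htmlspecialchars() in place of check_plain().

```php
<?php
// Hypothetical helper, not part of Drupal: mimics how t() treats the
// @, % and ! placeholder prefixes, without any translation lookup.
function sketch_format_string($string, array $args = array()) {
  foreach ($args as $key => $value) {
    switch ($key[0]) {
      case '@':
        // Escaped only.
        $args[$key] = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        break;

      case '!':
        // Inserted as-is; the caller must have sanitized it already.
        break;

      case '%':
      default:
        // Escaped and wrapped in emphasis markup.
        $args[$key] = '<em class="placeholder">'
          . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '</em>';
        break;
    }
  }
  // Replace every token in the string with its prepared value.
  return strtr($string, $args);
}

$color = '<b>brown</b>'; // Sample hostile input for the sketch.
echo sketch_format_string('Color: @color', array('@color' => $color)), "\n";
echo sketch_format_string('Color: %color', array('%color' => $color)), "\n";
echo sketch_format_string('Color: !color', array('!color' => $color)), "\n";
```

Only the last call lets the <b> tag through unescaped, which is why the ! prefix deserves the caution mentioned above.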

Using filter_xss() to Prevent Cross-Site Scripting Attacks

Cross-site scripting (XSS) is a common form of attack on a web site where the attacker is able to insert his or her own code into a web page, which can then be used for all sorts of mischief. Suppose that you allow users to enter HTML on your web site, expecting them to enter

<em>Hi!</em> My name is Sally, and I...

But instead they enter a <script> tag that loads JavaScript from a site the attacker controls.

Whoops! Again, the lesson is to never trust user input. Here is the function signature of filter_xss():

filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'blockquote', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'))

The filter_xss() function performs the following operations on the text string it is given:

1. It checks to make sure that the text being filtered is valid UTF-8 to avoid a bug with Internet Explorer 6.

2. It removes odd characters such as NULL and Netscape 4 JavaScript entities.

3. It ensures that HTML entities such as &amp; are well formed.

4. It ensures that HTML tags and tag attributes are well formed. During this process, tags that are not on the white list (that is, the second parameter for filter_xss()) are removed. The style attribute is removed, too, because that can interfere with the layout of a page by overriding CSS or hiding content by setting a spammer’s link color to the background color of the page. Any attributes that begin with on are removed (e.g., onclick or onfocus) because they represent JavaScript event-handler definitions. If you write regular expressions for fun and can name character codes for HTML entities from memory, you’ll enjoy stepping through filter_xss() (found in includes/common.inc in Drupal 7) and its associated functions with a debugger.

5. It ensures that no HTML tags contain disallowed protocols. Allowed protocols are http, https, ftp, news, nntp, telnet, mailto, irc, ssh, sftp, and webcal. You can modify this list by setting the filter_allowed_protocols variable. For example, you could restrict the protocols to http and https by adding the following line to your settings.php file (see the comment about variable overrides in the settings.php file):

$conf['filter_allowed_protocols'] = array('http', 'https');

Here’s an example of the use of filter_xss() from modules/aggregator/aggregator.pages.inc. The aggregator module deals with potentially dangerous RSS or Atom feeds. Here the module is preparing variables for use:

/**
 * Safely render HTML content, as allowed.
 *
 * @param $value
 *   The content to be filtered.
 * @return
 *   The filtered content.
 */
function aggregator_filter_xss($value) {
  return filter_xss($value, preg_split('/\s+|<|>/', variable_get('aggregator_allowed_html_tags', '<a> <b> <br> <dd> <dl> <dt> <em> <i> <li> <ol> <p> <strong> <u> <ul>'), -1, PREG_SPLIT_NO_EMPTY));
}

Note that aggregator_filter_xss() is a wrapper for filter_xss() that supplies an array of acceptable HTML tags, built by splitting a configurable string of tags.
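To see how a space-separated string of tags becomes the array that filter_xss() expects, the splitting step can be run on its own in plain PHP. This is a sketch: the pattern shown, which splits on whitespace and on angle brackets, is an assumption about a suitable pattern, and the tag string is shortened for the example.

```php
<?php
// A shortened tag list in the "<a> <b>" style used by the aggregator setting.
$setting = '<a> <b> <br> <em> <strong>';

// Split on runs of whitespace and on each angle bracket, dropping the
// empty pieces that are left between adjacent delimiters.
$allowed_tags = preg_split('/\s+|<|>/', $setting, -1, PREG_SPLIT_NO_EMPTY);

print_r($allowed_tags);
// The array now holds bare tag names: a, b, br, em, strong.
```

The resulting array has the same shape as the default $allowed_tags parameter in the filter_xss() signature shown earlier.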

Using filter_xss_admin()

Sometimes you want your module to produce HTML for administrative pages. Because administrative pages should be protected by access controls, it’s assumed that users given access to administrative screens can be trusted more than regular users. You could set up a special filter for administrative pages and use the filter system, but that would be cumbersome. For these reasons, the function filter_xss_admin() is provided. It is simply a wrapper for filter_xss() with a liberal list of allowed tags, including everything except the <script>, <object>, and <style> tags. An example of its use is in the display of the site mission in a theme:

if (drupal_is_front_page()) {
  $mission = filter_xss_admin(theme_get_setting('mission'));
}

The site’s mission can be set only from the Configuration -> “Site information” page, to which only the superuser and users with the “administer site configuration” permission have access, so this is a situation in which the use of filter_xss_admin() is appropriate.

Handling URLs Securely

Often modules take user-submitted URLs and display them. Some mechanism is needed to make sure that the value the user has given is indeed a legitimate URL. Drupal provides the check_url() function, which is really just a wrapper for filter_xss_bad_protocol(). It checks to make sure that the protocol in the URL is among the allowed protocols on the Drupal site (see step 5 in the earlier section “Using filter_xss() to Prevent Cross-Site Scripting Attacks”) and runs the URL through check_plain().
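The two steps described above (reject disallowed protocols, then escape for HTML) can be sketched in plain PHP. This is an illustration under stated assumptions, not Drupal's code: sketch_strip_bad_protocol() and sketch_check_url() are hypothetical names, and the allowed list is a shortened example rather than the site's configured list.

```php
<?php
// Hypothetical helper: repeatedly removes a leading "scheme:" prefix
// whenever the scheme is not in the allowed list.
function sketch_strip_bad_protocol($uri, array $allowed = array('http', 'https', 'ftp', 'mailto')) {
  do {
    $before = $uri;
    $colon = strpos($uri, ':');
    if ($colon > 0) {
      $protocol = substr($uri, 0, $colon);
      // A colon that comes after '/', '?' or '#' belongs to a path,
      // query or fragment, so there is no scheme to check.
      if (preg_match('![/?#]!', $protocol)) {
        break;
      }
      if (!in_array(strtolower($protocol), $allowed, true)) {
        $uri = substr($uri, $colon + 1);
      }
    }
  } while ($before !== $uri);
  return $uri;
}

// Hypothetical helper: strip bad protocols, then escape for HTML output.
function sketch_check_url($uri) {
  return htmlspecialchars(sketch_strip_bad_protocol($uri), ENT_QUOTES, 'UTF-8');
}

echo sketch_check_url('http://example.com/?a=1&b=2'), "\n";
echo sketch_check_url('javascript:runevilJS()'), "\n";
```

The loop repeats so that nested prefixes such as javascript:javascript: are stripped as well.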

If you want to determine whether a URL is in valid form, you can call valid_url(). It will check the syntax for http, https, and ftp URLs and check for illegal characters; it returns TRUE if the URL passes the test. This is a quick way to make sure that users aren’t submitting URLs with the javascript protocol.

If you’re passing on some information via a URL (for example, in a query string), you can use drupal_encode_path() to pass along escaped characters. Calling drupal_encode_path() does some encoding of slashes for compatibility with Drupal’s clean URLs and then calls PHP’s rawurlencode() function. The drupal_encode_path() function is not more secure than calling rawurlencode() directly, but it is handy for making encoded strings that will work well with Apache’s mod_rewrite module.

Making Queries Secure with db_query()

A common way of exploiting web sites is called SQL injection. Let’s examine a module written by someone not thinking about security. This person just wants a simple way to list titles of all nodes of a certain type:

/**
 * Implements hook_menu().
 */
function insecure_menu() {
  $items['insecure'] = array(
    'title' => 'Insecure Test',
    'page callback' => 'insecure_code',
    'access arguments' => array('access content'),
  );
  return $items;
}

/**
 * Menu callback, called when user goes to http://example.com/?q=insecure
 */
function insecure_code($type = 'story') {
  $output = "Searching for nodes of type: $type <br/>";
  $query = db_select('node', 'n');
  $query->fields('n', array('title'));
  $query->condition("n.type", $type);
  $result = $query->execute();
  $items = array();
  foreach ($result as $row) {
    $items[] = $row->title;
  }
  if (sizeof($items) > 0) {
    $output .= theme('item_list', array('items' => $items));
  }
  else {
    $output .= "No nodes were found of type $type";
  }
  return $output;
}

Going to http://example.com/insecure works as expected. We get the SQL and then a list of stories, as shown in Figure 21-2.

Figure 21-2. Simple listing of story node titles

Note how the programmer cleverly gave the insecure_code() function a $type parameter that defaults to 'story'. This programmer is taking advantage of the fact that Drupal’s menu system forwards additional path arguments automatically as parameters to callbacks, so http://example.com/insecure/page will get us all titles of nodes of type 'page', as shown in Figure 21-3.

Figure 21-3. Simple listing of page node titles

The situation can still be improved, however. In this case, the URL should contain only members of a finite set; that is, the node types on our site. We know what those are, so we should always confirm that the user-supplied value is in our list of known values. For example, if we have only the page and article node types enabled, we should attempt to proceed only if we have been given those types in the URL. Let’s add some code to check for that:

function insecure_code($type = 'article') {
  // Proceed only if the requested type is a known node type.
  $types = node_type_get_types();
  if (!isset($types[$type])) {
    watchdog('security', 'Possible SQL injection attempt!', array(), WATCHDOG_ALERT);
    return t('Unable to process request.');
  }
  $output = "Searching for nodes of type: $type <br/>";
  $query = db_select('node', 'n');
  $query->fields('n', array('title'));
  $query->condition("n.type", $type);
  $result = $query->execute();
  $items = array();
  foreach ($result as $row) {
    $items[] = $row->title;
  }
  if (sizeof($items) > 0) {
    $output .= theme('item_list', array('items' => $items));
  }
  else {
    $output .= "No nodes were found of type $type";
  }
  return $output;
}

Here we’ve added a check to make sure that $type is one of our existing node types, and if the check fails, a handy warning will be recorded for system administrators. There are more problems, though. The SQL does not distinguish between published and unpublished nodes, so even titles of unpublished nodes will show up. Plus, node titles are user-submitted data, so they need to be sanitized before output. But as the code currently stands, it just gets the titles from the database and displays them. Let’s fix these problems.

function insecure_code($type = 'article') {
  // Proceed only if the requested type is a known node type.
  $types = node_type_get_types();
  if (!isset($types[$type])) {
    watchdog('security', 'Possible SQL injection attempt!', array(), WATCHDOG_ALERT);
    return t('Unable to process request.');
  }
  $output = "Searching for nodes of type: $type <br/>";
  $query = db_select('node', 'n');
  $query->fields('n', array('title'));
  $query->condition("n.type", $type);
  // Restrict the listing to published nodes.
  $query->condition("n.status", 1);
  $result = $query->execute();
  $items = array();
  foreach ($result as $row) {
    // Titles are user-submitted data, so escape them before output.
    $items[] = check_plain($row->title);
  }
  if (sizeof($items) > 0) {
    $output .= theme('item_list', array('items' => $items));
  }
  else {
    $output .= "No nodes were found of type $type";
  }
  return $output;
}

Now only published nodes will show up, and all the titles are run through check_plain() before being displayed. We’ve also removed the debugging code. This module has come a long way! But there’s still a security flaw. Can you see it? If not, read on.

Monday, June 16, 2014
