Pro_Drupal7_Development: Building a Custom Search Page [Searching and Indexing Content]

Drupal has the ability to search nodes and usernames out of the box. Even when you develop your own custom node types, Drupal’s search system indexes the content that’s rendered to the node view. For example, suppose you have a recipe node type with the fields ingredients and instructions, and you create a new recipe node whose node ID is 22. As long as those fields are viewable by the administrator when you visit http://example.com/?q=node/22, the search module will index the recipe node and its additional metadata during the next cron run.

While it would appear at first glance that node searching and user searching would use the same underlying mechanism, they’re actually two separate ways of extending search functionality. Rather than querying the node table directly for every search, node searching uses an indexer to process the content ahead of time into a structured format. When a node search is performed, the structured data is queried, yielding noticeably faster and more accurate results. We’ll get to know the indexer later in this chapter.

Username searches are not nearly as complex, because usernames are a single field in the database that the search query checks. Also, usernames are not allowed to contain HTML, so there’s no need to use the HTML indexer. Instead, you can query the users table directly with just a few lines of code.

In both of the preceding cases, Drupal’s search module delegates the actual search to the appropriate module. The simple username search can be found in the user_search_execute() function of modules/user/user.module, while the more complex node search is performed by node_search_execute() in modules/node/node.module. The important point here is that the search module orchestrates the search but delegates the implementation to the modules that know the searchable content best.
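The delegation described above can be illustrated with a small stand-alone sketch. It is not Drupal's code: demo_search_data() and the two demo_*_search_execute() functions are hypothetical stand-ins, and the data is hard-coded. One callback scans a single field directly, the other consults a prebuilt index, and the dispatcher only decides which one to call.

```php
<?php
// Hypothetical stand-in for a username search: scan one field directly.
function demo_user_search_execute($keys) {
  $names = array('alice', 'bob', 'alicia');
  $hits = array();
  foreach ($names as $name) {
    if ($keys !== '' && strpos($name, $keys) !== FALSE) {
      $hits[] = array('title' => $name);
    }
  }
  return $hits;
}

// Hypothetical stand-in for a node search: consult a prebuilt word index.
function demo_node_search_execute($keys) {
  $index = array(
    'pancake' => array('Pancake recipe'),
    'recipe' => array('Pancake recipe', 'Waffle recipe'),
  );
  $word = strtolower(trim($keys));
  $hits = array();
  if (isset($index[$word])) {
    foreach ($index[$word] as $title) {
      $hits[] = array('title' => $title);
    }
  }
  return $hits;
}

// The orchestrator knows nothing about the content; it only dispatches.
function demo_search_data($keys, $module) {
  $function = $module . '_search_execute';
  return function_exists($function) ? $function($keys) : array();
}

print_r(demo_search_data('ali', 'demo_user'));
print_r(demo_search_data('Recipe', 'demo_node'));
```

The dispatcher stays unchanged when a new searchable content type is added; only a new callback is needed.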

The Default Search Form

You’ll be glad to know the search API has a default search form ready to use (see Figure 13-1). If that interface works for your needs, then all you need to do is write the logic that finds the hits for the search requested. This search logic is usually a query to the database.

Figure 13-1. The default user interface for searching with the search API

While it appears simple, the default content search form is actually wired up to query against all the visible elements of the node content of your site. This means a node’s title, body, additional custom attributes, comments, and taxonomy terms are searched from this interface.

The Advanced Search Form

The advanced search feature, shown in Figure 13-2, is yet another way to filter search results. It expands on the basic search form by providing the ability to select the content types to restrict the search to and an easy-to-use interface for entering words, phrases, and negative search words.

Figure 13-2. The advanced search options provided by the default search form

The default search form can be changed by implementing the search hooks in a module, then using hook_form_alter() on the form ID search_form (see Chapter 11) to provide an interface for the user. In Figure 13-2, both of these are happening. The node module is implementing the search hooks to make nodes searchable (see the node_search functions in modules/node/node.module) and is extending the form to provide an interface (see node_form_search_form_alter() in modules/node/node.module).
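As a rough illustration of the form-altering half, here is a sketch of what such an alter function might look like for a module named pathfinder. The function name, the pathfinder_options fieldset, and the match_start checkbox are all hypothetical. It assumes, as node_form_search_form_alter() does, that the search form carries the active module in $form['module']['#value']. Because Drupal forms are plain PHP arrays, the sketch runs without Drupal; a real module would also pass its strings through t().

```php
<?php
/**
 * Sketch of a hook_form_FORM_ID_alter() implementation for search_form.
 * Hypothetical: adds an options fieldset only on the pathfinder search tab.
 */
function pathfinder_form_search_form_alter(&$form, &$form_state) {
  if (isset($form['module']) && $form['module']['#value'] == 'pathfinder') {
    $form['pathfinder_options'] = array(
      '#type' => 'fieldset',
      '#title' => 'Alias search options',
      '#collapsible' => TRUE,
    );
    $form['pathfinder_options']['match_start'] = array(
      '#type' => 'checkbox',
      '#title' => 'Only match the beginning of the alias',
    );
  }
}

// Simulate the form array that the search module would pass in.
$form = array('module' => array('#type' => 'value', '#value' => 'pathfinder'));
$form_state = array();
pathfinder_form_search_form_alter($form, $form_state);
print_r(array_keys($form));
```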

Adding to the Search Form

Let’s look at an example. Suppose we are using path.module and want to enable searching of URL aliases on our site. We’ll write a short module that will implement Drupal’s search hooks to make the aliases searchable and provide an additional tab in Drupal’s search interface.

Introducing the Search Hooks

There are several hook_search functions that your module may use in Drupal 7.

hook_search_info(): This function allows a module to tell the search module that it wishes to perform searches on content it defines (custom node types, users, or comments, for example) when a site search is performed. The values set in this function define the tab that appears at the top of the search form for the type of content your module searches (e.g., Content, Users, Comments) and the path value appended after /search in the URL (e.g., /search/node).

hook_search_execute($keys = NULL): This function executes a search for a set of keywords that are entered by the user and passed to the function as a string.

hook_search_reset(): This function is called when the search index is going to be rebuilt. This function is used by modules that also implement hook_update_index(). If your module keeps track of how much of its content is indexed, you’ll want to use this function to reset the module’s counters in preparation for reindexing.

hook_search_status(): This function reports the status of reindexing the content in the database. It returns the total number of items to index and the number of items left to index.

hook_search_access(): This function allows a module to define permissions for a search tab. If the user does not have the proper permissions, then the tab will not be displayed on the search form.

hook_search_admin(): This function adds elements to the search settings form.
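To make the status hook more concrete, here is a stand-alone sketch of the bookkeeping it reports. The function name and the in-memory ID list are hypothetical. The only Drupal-specific assumption is the shape of the return value: an array with remaining and total keys, which a real implementation would normally compute with database queries.

```php
<?php
// Sketch of the numbers a hook_search_status() implementation reports.
// $item_ids: IDs of all indexable items; $last_indexed_id: highest ID
// already sent to the indexer (a hypothetical counter kept by the module).
function demo_search_status(array $item_ids, $last_indexed_id) {
  $remaining = 0;
  foreach ($item_ids as $id) {
    if ($id > $last_indexed_id) {
      $remaining++;
    }
  }
  return array('remaining' => $remaining, 'total' => count($item_ids));
}

print_r(demo_search_status(array(1, 2, 3, 4), 2));
```

Resetting the counter to 0, which is the job of hook_search_reset(), makes every item count as remaining again.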

Formatting Search Results with hook_search_page()

If you have written a module that provides search results, you might want to take over the look and feel of the results page by implementing hook_search_page(). If you do not implement this hook, the results will be formatted by a call to theme_search_results($variables), which has its default implementation in modules/search/search-results.tpl.php.

Do not confuse this with theme_search_result($variables), which formats a single search result and has its default implementation in modules/search/search-result.tpl.php.
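A results-page implementation might look roughly like the following sketch. It assumes the hook receives the array of results and returns a renderable array, with each entry handed to the single-result theme via '#theme' => 'search_result'. The function name demo_search_page() and the pathfinder-results CSS class are made up for illustration. Since it only builds arrays, it runs outside Drupal.

```php
<?php
// Sketch of a hook_search_page()-style function: wrap the results in an
// ordered list and delegate each entry to the single-result theme.
function demo_search_page(array $results) {
  $output = array();
  $output['prefix'] = array(
    '#markup' => '<ol class="search-results pathfinder-results">',
  );
  foreach ($results as $entry) {
    $output[] = array(
      '#theme' => 'search_result',
      '#result' => $entry,
      '#module' => 'pathfinder',
    );
  }
  $output['suffix'] = array('#markup' => '</ol>');
  return $output;
}

$page = demo_search_page(array(
  array('title' => 'about-us', 'link' => 'http://example.com/node/1'),
));
print_r($page);
```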

Making Path Aliases Searchable

Let’s begin our example. We’ll be implementing a search option that allows site visitors to search URL aliases by implementing several search hooks. Create a new folder named pathfinder at sites/all/modules/custom, and create the files shown in Listings 13-1 and 13-2 within the new directory.

Listing 13-1. pathfinder.info

name = Pathfinder
description = Gives administrators the ability to search URL aliases.
package = Pro Drupal Development
core = 7.x
dependencies[] = path
files[] = pathfinder.module

Listing 13-2. pathfinder.module

<?php

/**
 * @file
 * Search interface for URL aliases.
 */

Leave pathfinder.module open in your text editor; you’ll continue to work with it. The next function to implement is hook_search_info(). This hook places the tab at the top of the search form for our search of URL aliases.

/**
 * Implements hook_search_info().
 */
function pathfinder_search_info() {
  return array(
    'title' => 'URL Aliases',
  );
}

The next function checks to see if the person has the correct permissions to search URL aliases.

/**
 * Implements hook_search_access().
 */
function pathfinder_search_access() {
  return user_access('administer url aliases');
}

And finally we’ll use the hook_search_execute() function to perform the search and return the results.

/**
 * Implements hook_search_execute().
 */
function pathfinder_search_execute($keys = NULL) {
  $find = array();
  // Match the keywords anywhere in the alias; db_like() escapes wildcards.
  $query = db_select('url_alias')->extend('PagerDefault');
  $query->fields('url_alias', array('source', 'alias'));
  $query->condition('alias', '%' . db_like($keys) . '%', 'LIKE');
  $result = $query->limit(15)->execute();
  foreach ($result as $alias) {
    $find[] = array(
      'title' => $alias->alias,
      'link' => url($alias->source, array('absolute' => TRUE)),
    );
  }
  return $find;
}

When the search API invokes hook_search_info(), it’s looking for the name the menu tab should display on the generic search page (see Figure 13-3). In our case, we’re returning “URL Aliases.” Once the hook has returned the name of the menu tab, the search API wires up the link of the menu tab to a new search form.

Figure 13-3. By returning the name of the menu tab from hook_search_info(), the search form becomes accessible.

hook_search_execute() is the workhorse part of Drupal’s search hooks. It is invoked when the search form is submitted, and its job is to collect and return the search results. In the preceding code, we query the url_alias table, using the search terms submitted from the form. We then collect the results of the query and send them back in an array. The results are formatted by the search module and displayed to the user, as shown in Figure 13-4.

Figure 13-4. Search results are formatted by the search module.
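One detail of the query in pathfinder_search_execute() is worth spelling out: the keywords are passed through db_like() before being wrapped in % wildcards. The sketch below imitates that escaping in plain PHP so it can be run outside Drupal. The demo_* names are hypothetical, and the use of addcslashes() is an assumption about how such escaping is typically done, not a copy of Drupal's implementation.

```php
<?php
// Escape the characters that have special meaning in a LIKE pattern
// (backslash, % and _) so user input is matched literally.
function demo_escape_like($string) {
  return addcslashes($string, '\%_');
}

// Build a "contains" pattern the way the pathfinder query does.
function demo_like_pattern($keys) {
  return '%' . demo_escape_like($keys) . '%';
}

echo demo_like_pattern('50%_off'), "\n";
echo demo_like_pattern('about'), "\n";
```

Without the escaping, a search for an underscore would match any single character, and a search for % would match every alias.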

Using the Search HTML Indexer

So far, we’ve examined how to interact with the default search form by providing a simple implementation of hook_search_execute(). However, when we move from searching a simple VARCHAR database column with LIKE to seriously indexing web site content, it’s time to outsource the task to Drupal’s built-in HTML indexer.

The goal of the indexer is to efficiently search large chunks of HTML. It does this by processing content when cron is called (via http://example.com/cron.php). As such, there is a lag between when new content is created and when it becomes searchable, and the length of that lag depends on how often cron is scheduled to run. The indexer parses data and splits text into words (a process called tokenization), assigning scores to each token based on a rule set, which can be extended with the search API. It then stores this data in the database, and when a search is requested, it uses these indexed tables instead of the node tables directly.

When to Use the Indexer

Indexers are generally used when implementing search engines that evaluate more than the standard “most words matched” approach. Search relevancy refers to content passing through a (usually complex) rule set to determine ranking within an index.

You’ll want to harness the power of the indexer if you need to search a large bulk of HTML content. One of the greatest benefits in Drupal is that blogs, forums, pages, and so forth are all nodes. Their base data structures are identical, and this common bond means they also share basic functionality. One such common feature is that all nodes are automatically indexed if the search module is enabled; no extra programming is needed. Even if you create a custom node type, searching of that content is already built in, provided that the modifications you make show up in the node when it is rendered.

How the Indexer Works

The indexer has a preprocessing mode where text is filtered through a set of rules to assign scores. Such rules include dealing with acronyms, URLs, and numerical data. During the preprocessing phase, other modules have a chance to add logic to this process in order to perform their own data manipulations. This comes in handy during language-specific tweaking, as shown here using the contributed Porter-Stemmer module:

• resumé -> resume (accent removal)

• skipping -> skip (stemming)

• skips -> skip (stemming)

Another such language preprocessing example is word splitting for the Chinese, Japanese, and Korean languages to ensure the character text is correctly indexed.
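The kind of manipulation a preprocessing module performs can be sketched in a few lines of plain PHP. This is a toy, not the Porter stemming algorithm and not the Porter-Stemmer module's code. The function name is hypothetical, the accent map covers only a handful of characters, and the suffix rules are deliberately naive. In Drupal 7 a module would expose logic like this through hook_search_preprocess($text).

```php
<?php
// Toy preprocessing: strip a few accents, then apply naive suffix rules.
function demo_search_preprocess($text) {
  // Accent removal for a small, illustrative set of characters.
  $text = strtr($text, array('é' => 'e', 'è' => 'e', 'á' => 'a', 'ü' => 'u'));
  $stems = array();
  foreach (preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
    // "skipping" -> "skipp": drop a trailing "ing" after at least 3 letters.
    $stripped = preg_replace('/(?<=[a-z]{3})ing$/', '', $word);
    if ($stripped !== $word) {
      // "skipp" -> "skip": undo the doubled consonant left behind.
      $stripped = preg_replace('/([b-df-hj-np-tv-z])\1$/', '$1', $stripped);
    }
    // "skips" -> "skip": drop a trailing plural "s" after at least 3 letters.
    $stems[] = preg_replace('/(?<=[a-z]{3})s$/', '', $stripped);
  }
  return implode(' ', $stems);
}

echo demo_search_preprocess('resumé skipping skips'), "\n";
```

Because the same function runs on both the indexed text and the search keywords, a search for "skips" can find content that only contains "skipping".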

After the preprocessing phase, the indexer uses HTML tags to find more important words (called tokens) and assigns them adjusted scores based on the default score of the HTML tags and the number of occurrences of each token. These scores will be used to determine the ultimate relevancy of the token. Here’s the full list of the default HTML tag scores (they are defined in search_index()):

'h1' => 25,
'h2' => 18,
'h3' => 15,
'h4' => 12,
'h5' => 9,
'h6' => 6,
'u' => 3,
'b' => 3,
'i' => 3,
'strong' => 3,
'em' => 3,
'a' => 10

Let’s grab a chunk of HTML and run it through the indexer to better understand how it works.

Figure 13-5 shows an overview of the HTML indexer parsing content, assigning scores to tokens, and storing that information in the database.

Figure 13-5. Indexing a chunk of HTML and assigning token scores

When the indexer encounters numerical data separated by punctuation, the punctuation is removed and numbers alone are indexed. This makes elements such as dates, version numbers, and IP addresses easier to search for. The middle process in Figure 13-5 shows how a word token is processed when it’s not surrounded by HTML. These tokens have a weight of 1. The last row shows content that is wrapped in an emphasis (<em>) tag. The formula for determining the overall score of a token is as follows:

Number of matches x Weight of the HTML tag
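The formula can be tried out with a small stand-alone sketch. It is a simplification of what the chapter describes, not Drupal's search_index(): the function name is hypothetical, only the innermost scored tag counts, and the tokenizer is a plain split on non-alphanumeric characters. It does reuse the default tag weights listed above and drops punctuation between digits, as described for numerical data.

```php
<?php
// Simplified token scoring: score = matches x weight of the enclosing tag.
function demo_token_scores($html) {
  $weights = array(
    'h1' => 25, 'h2' => 18, 'h3' => 15, 'h4' => 12, 'h5' => 9, 'h6' => 6,
    'u' => 3, 'b' => 3, 'i' => 3, 'strong' => 3, 'em' => 3, 'a' => 10,
  );
  // Even entries are text, odd entries are the tag contents.
  $parts = preg_split('/<([^>]+)>/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
  $open = array();
  $scores = array();
  foreach ($parts as $i => $part) {
    if ($i % 2 == 1) {
      // Track only the tags that carry a weight.
      if (preg_match('/^(\/?)([a-z0-9]+)/i', trim($part), $m) && isset($weights[strtolower($m[2])])) {
        if ($m[1] === '/') {
          array_pop($open);
        }
        else {
          $open[] = $weights[strtolower($m[2])];
        }
      }
      continue;
    }
    // Text outside any scored tag has a weight of 1.
    $weight = $open ? end($open) : 1;
    // Remove punctuation between digits so "7.12" is indexed as "712".
    $text = preg_replace('/(?<=\d)[.,:\/-]+(?=\d)/', '', strtolower($part));
    foreach (preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $token) {
      $scores[$token] = (isset($scores[$token]) ? $scores[$token] : 0) + $weight;
    }
  }
  return $scores;
}

print_r(demo_token_scores('<h1>Drupal 7.12</h1><p>Upgrade to <em>Drupal</em> 7.12 today</p>'));
```

Here "drupal" appears once in an h1 (25) and once in an em (3) for a total of 28, while "712" appears once in the h1 (25) and once as plain text (1) for a total of 26.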

It should also be noted that Drupal indexes the filtered output of nodes, so, for example, if you have an input filter set to automatically convert URLs to hyperlinks, or another filter to convert line breaks to HTML breaks and paragraph tags, the indexer sees this content with all the markup in place and can take the markup into consideration and assign scores accordingly. A greater impact of indexing filtered output is seen with a node that uses the PHP evaluator filter to generate dynamic content. Indexing dynamic content could be a real hassle, but because Drupal’s indexer sees only the output of content generated by the PHP code, dynamic content is automatically fully searchable.

When the indexer encounters internal links, they too are handled in a special way. If a link points to another node, then the link’s words are added to the target node’s content, making answers to common questions and relevant information easier to find.

There are two ways to hook into the indexer:

• hook_node_update_index($node): You can add data to a node that is otherwise invisible in order to tweak search relevancy. You can see this in action with Drupal core comments, which technically aren’t part of the node object but should influence the search results; the Comment module implements this hook for that reason. The Comment module is, however, a little sneaky: it also uses the comment_update_index() function to set a limit on how many comments should be indexed, which is a bit of a hack of the API.

• hook_update_index(): You can use the indexer to index HTML content that is not part of a node using hook_update_index(). For a Drupal core implementation of hook_update_index(), see node_update_index() in modules/node/node.module.

Both of these hooks are called during cron runs in order to index new data. Figure 13-6 shows the order in which these hooks run.

Figure 13-6. Overview of HTML indexing hooks

We’ll look at these hooks in more detail in the sections that follow.

Adding Metadata to Nodes: hook_node_update_index()

When Drupal indexes a node for searching, it first runs the node through node_view(). Modules can decide how the data will be displayed, indicating whether the content should be indexed. For example, assume we have a node with an ID of 26. The parts of the node that are visible when viewing the URL http://example.com/?q=node/26 are what the indexer also sees.

What if we have a custom node type that contains hidden data that needs to influence search results? A good example of where we might want to do this is with book.module. We could index the chapter headings along with each child page to boost the relevancy of those child pages.

/**
 * Implements hook_node_update_index().
 */
function book_boost_node_update_index($node) {
  // Book nodes have a parent link ID attribute.
  // If it's nonzero we can have the menu system retrieve
  // the parent's menu item which gives us the title.
  if ($node->type == 'book' && $node->book['plid']) {
    $item = menu_link_load($node->book['plid']);
    return '<h2>' . $item['title'] . '</h2>';
  }
}

Notice that we wrapped the title in HTML heading tags to inform the indexer of a higher relative score value for this text.
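To see where the returned markup ends up, the following stand-alone sketch imitates the assembly step: the rendered node is combined with whatever extra text each module's hook contributes, and the result is what gets scored. Everything here is hypothetical. Nodes are plain arrays rather than objects, the demo_* functions stand in for real hook implementations, and htmlspecialchars() stands in for check_plain().

```php
<?php
// Hypothetical hook implementation: contribute the parent chapter title.
function demo_book_node_update_index(array $node) {
  if (!empty($node['parent_title'])) {
    return '<h2>' . htmlspecialchars($node['parent_title']) . '</h2>';
  }
  return NULL;
}

// Assemble the text handed to the indexer: title, rendered body, then
// any extra text returned by the modules' hooks.
function demo_build_index_text(array $node, array $modules) {
  $text = '<h1>' . htmlspecialchars($node['title']) . '</h1>' . $node['rendered'];
  foreach ($modules as $module) {
    $function = $module . '_node_update_index';
    if (function_exists($function)) {
      $extra = $function($node);
      if (is_string($extra)) {
        $text .= $extra;
      }
    }
  }
  return $text;
}

$node = array(
  'title' => 'Chapter 1.1',
  'rendered' => '<p>Body</p>',
  'parent_title' => 'Chapter 1',
);
echo demo_build_index_text($node, array('demo_book')), "\n";
```

The extra heading never appears on the page a visitor sees; it exists only in the text that is scored.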

Indexing Content That Isn’t a Node: hook_update_index()

If you need to wrap the search engine around content that isn’t made up of Drupal nodes, you can hook right into the indexer and feed it any textual data you need, thus making it searchable within Drupal. Suppose your group supports a legacy application that has been used for entering and viewing technical notes about products for the last several years. For political reasons, you cannot yet replace it with a Drupal solution, but you’d love to be able to search those technical notes from within Drupal. No problem. Let’s assume the legacy application keeps its data in a database table called technote. We’ll create a short module that will send the information in this database to Drupal’s indexer using hook_update_index() and present search results using the search hooks.

Create a folder named legacysearch inside sites/all/modules/custom. If you want to have a legacy database to play with, create a file named legacysearch.install, and add the following contents:

<?php

/**
 * Implements hook_install().
 */
function legacysearch_install() {
  // Insert two sample tech notes to index and search.
  $fields = array(
    'id' => 1,
    'title' => 'Web 1.0 Emulator',
    'note' => '<p>This handy product lets you emulate the blink tag but in hardware...a perfect gift.</p>',
    'last_modified' => 1172502517,
  );
  db_insert('technote')->fields($fields)->execute();

  $fields = array(
    'id' => 2,
    'title' => 'Squishy Debugger',
    'note' => '<p>Fully functional debugger inside a squishy gel case. The embedded ARM processor heats up...</p>',
    'last_modified' => 1172502517,
  );
  db_insert('technote')->fields($fields)->execute();
}

/**
 * Implements hook_uninstall().
 */
function legacysearch_uninstall() {
  drupal_uninstall_schema('legacysearch');
}

/**
 * Implements hook_schema().
 */
function legacysearch_schema() {
  $schema['technote'] = array(
    'description' => t('A database with some example records.'),
    'fields' => array(
      'id' => array(
        'type' => 'serial',
        'not null' => TRUE,
        'description' => t("The tech note's primary ID."),
      ),
      'title' => array(
        'type' => 'varchar',
        'length' => 255,
        'description' => t("The tech note's title."),
      ),
      'note' => array(
        'type' => 'text',
        'description' => t('Actual text of tech note.'),
      ),
      'last_modified' => array(
        'type' => 'int',
        'unsigned' => TRUE,
        'description' => t('Unix timestamp of last modification.'),
      ),
    ),
    'primary key' => array('id'),
  );
  return $schema;
}

This module typically wouldn’t need this install file, since the legacy database would already exist; we’re just using it to make sure we have a legacy table and data to work with. You would instead adjust the queries within the module to connect to your existing non-Drupal table. The following queries assume the data is in a non-Drupal database with the database connection defined in the $databases array in settings.php. Next, add sites/all/modules/custom/legacysearch/legacysearch.info with the following content:

name = Legacy Search
description = Example of indexing/searching external content with Drupal.
package = Pro Drupal Development
core = 7.x
files[] = legacysearch.install
files[] = legacysearch.module

Finally, add sites/all/modules/custom/legacysearch/legacysearch.module along with the following code:

<?php

/**
 * @file
 * Enables searching of non-Drupal content.
 */

Go ahead and keep legacysearch.module open in your text editor, and we’ll add hook_update_index(), which feeds the legacy data to the HTML indexer. You can now safely enable your module after creating these files. You will also need to go to admin/config/search/settings and enable legacysearch as one of the active search modules. After saving, click the “Re-index site” button to rebuild the indexes, including the legacy search.

/**
 * Implements hook_search_info().
 */
function legacysearch_search_info() {
  return array(
    'title' => 'Tech Notes',
  );
}

/**
 * Implements hook_search_reset().
 */
function legacysearch_search_reset() {
  // Forget how far we got so that everything is reindexed.
  variable_del('legacysearch_cron_last_change');
  variable_del('legacysearch_cron_last_id');
}

/**
 * Shutdown function to make sure we remember the last element processed.
 */
function legacysearch_update_shutdown() {
  global $last_change, $last_id;
  if ($last_change && $last_id) {
    variable_set('legacysearch_cron_last_change', $last_change);
    variable_set('legacysearch_cron_last_id', $last_id);
  }
}

/**
 * Implements hook_update_index().
 */
function legacysearch_update_index() {
  // These are global so that the shutdown function can read them.
  global $last_change, $last_id;
  register_shutdown_function('legacysearch_update_shutdown');

  $last_id = variable_get('legacysearch_cron_last_id', 0);
  $last_change = variable_get('legacysearch_cron_last_change', 0);

  // Query the legacy database, then switch back to Drupal's own connection.
  db_set_active('legacy');
  $result = db_query("SELECT id, title, note, last_modified FROM {technote} WHERE id > :last_id OR last_modified > :last_change", array(':last_id' => $last_id, ':last_change' => $last_change));
  db_set_active('default');

  foreach ($result as $data) {
    $last_change = $data->last_modified;
    $last_id = $data->id;
    // The title goes in an <h1> so that the indexer scores it highly.
    $text = '<h1>' . check_plain($data->title) . '</h1>' . $data->note;
    search_index($data->id, 'technote', $text);
    // Use the same variable names that are read above and reset in
    // legacysearch_search_reset().
    variable_set('legacysearch_cron_last_change', $data->last_modified);
    variable_set('legacysearch_cron_last_id', $data->id);
  }
}

/**
 * Implements hook_search_execute().
 */
function legacysearch_search_execute($keys = NULL) {
  // Set up a mock URL to embed in the link so that when the user clicks,
  // it takes them to the legacy site.
  $legacy_url = 'http://technotes.example.com';
  $results = array();

  // Set up and execute the query.
  $query = db_select('search_index', 'i')->extend('SearchQuery')->extend('PagerDefault');
  $query->join('technote', 't', 't.id = i.sid');
  $query->searchExpression($keys, 'technote');

  // If there weren't any results then return a blank result set.
  if (!$query->executeFirstPass()) {
    return array();
  }

  // If the first pass did return at least one record then execute the search.
  $found = $query->limit(10)->execute();

  // Now create the search results output.
  foreach ($found as $item) {
    // First get the values from the legacy table to display in search results.
    db_set_active('legacy');
    $note = db_query("SELECT * FROM {technote} WHERE id = :sid", array(':sid' => $item->sid))->fetchObject();
    db_set_active('default');

    // Format the search results. The query parameter name (note_id) is an
    // assumption; use whatever the legacy application expects.
    $results[] = array(
      'link' => url($legacy_url . '/note.pl', array('query' => array('note_id' => $item->sid), 'absolute' => TRUE)),
      'type' => t('Note'),
      'title' => $note->title,
      'date' => $note->last_modified,
      'score' => $item->calculated_score,
      'snippet' => search_excerpt($keys, $note->note),
    );
  }
  return $results;
}

After cron has run and the information has been indexed, the technical notes will be available to search, as shown in Figure 13-7. They will be indexed inside Drupal, but legacysearch_search_execute() will return search results that are built from (and point to) the legacy system.

Figure 13-7. Searching an external legacy database
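The incremental bookkeeping that legacysearch_update_index() relies on (pick up rows whose ID is newer than the last one seen, or whose modification time is later than the last one seen) can be exercised in isolation. The sketch below is stand-alone PHP with in-memory rows instead of database queries. All demo_* names and the $state array are hypothetical, and unlike the module code it keeps the maximum ID and timestamp seen, which is a small design variation rather than the book's behavior.

```php
<?php
// Select rows that are new (id > last_id) or changed since the last run.
function demo_select_changed(array $rows, $last_id, $last_change) {
  $changed = array();
  foreach ($rows as $row) {
    if ($row['id'] > $last_id || $row['last_modified'] > $last_change) {
      $changed[] = $row;
    }
  }
  return $changed;
}

// One simulated cron run: "index" the changed rows and advance the markers.
function demo_index_batch(array $rows, array $state) {
  $state['indexed'] = array();
  foreach (demo_select_changed($rows, $state['last_id'], $state['last_change']) as $row) {
    // A real module would call the indexer here.
    $state['indexed'][] = $row['id'];
    $state['last_id'] = max($state['last_id'], $row['id']);
    $state['last_change'] = max($state['last_change'], $row['last_modified']);
  }
  return $state;
}

$rows = array(
  array('id' => 1, 'last_modified' => 100),
  array('id' => 2, 'last_modified' => 100),
);
$state = demo_index_batch($rows, array('last_id' => 0, 'last_change' => 0));
print_r($state['indexed']);

// Later: note 1 is edited and note 3 is added.
$rows[0]['last_modified'] = 200;
$rows[] = array('id' => 3, 'last_modified' => 150);
$state = demo_index_batch($rows, $state);
print_r($state['indexed']);
```

Only the edited note and the new note are picked up on the second run; the untouched note is skipped.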

Summary

After reading this chapter, you should be able to

• Customize the search form.

• Understand how to use the search hook.

• Understand how the HTML indexer works.

• Hook into the indexer for any kind of content.


Monday, June 2, 2014
