Extending searches

There are two ways to add new types of content to Enano's search engine (which we'd like to think is better than most). There's the easy way that works for 90% of cases, and the difficult way for the other 10%. If you decide that you have to use the difficult way, you are designing your application wrong. ;)

Are you doing too much work?

If you design your plugin around adding pages into Enano's pages and page_text tables, you don't need to register a new search handler - your plugin's content will be indexed automatically. Yes, very few plugins actually do this because they want to store additional data beyond what Enano tables are designed for, but it does let you avoid having to write your own editor, history code, and what have you.

The easy way

The simple way to add new content to searches is with the register_search_handler function. It takes one argument, an associative array, that requires several values:

  • table: The database table to be searched
  • titlecolumn: The primary text or char column to be searched
  • uniqueid: A template string that will be used to identify one result page. This is usually in the format of ns={namespace};pid={page_id}. Variables ({variable}) are assigned based on column names, so if you have a column named text, for example, that will be available in this string as {text}.
  • linkformat: An associative array that tells Enano how to link to your results. It requires two values, page_id and namespace, and each should be formatted using a template string.

There are also several optional values you can include:

  • datacolumn: If your table contains a separate column for text, set it here. Matches in the title score higher than matches in the text.
  • additionalcolumns: An array of additional columns you would like the engine to select. These will be available to the template strings as well as your formatting callback.
  • additionalwhere: Extra WHERE clauses to add to the search in case you have a column for "private" or non-searchable content, or some other case where certain potential results should be excluded.
  • formatcallback: This can be either a function (which will be called with one argument, an associative array containing the current result row) or a template string. If it's a template string, use the same format of variable as above to write the HTML for the body of a single result. If it's a callback, it should return a string containing HTML.
  • resultnote: If this is specified, it is prepended to the title in smaller text. This lets you tell the user that the result is a specific type of content. Recommended. Use the format "[Content type]".
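To make the template string format concrete, here is a minimal sketch of how {variable} placeholders could be filled in from a result row. The helper expand_template_string() is hypothetical and written only for illustration; it is not part of Enano, and the engine's own substitution code may behave differently (for example with unknown placeholders).

```php
<?php
// Hypothetical helper, NOT part of Enano: replaces each {column} placeholder
// in a template string with the value of that column from the result row.
function expand_template_string($template, array $row)
{
  return preg_replace_callback('/\{([A-Za-z0-9_]+)\}/', function ($match) use ($row) {
    // Assumption: unknown placeholders are left untouched so mistakes are easy to spot.
    return array_key_exists($match[1], $row) ? (string) $row[$match[1]] : $match[0];
  }, $template);
}

// A row as it might come back for the users table
$row = array('username' => 'alice', 'user_id' => 2);
echo expand_template_string('ns=User;cid={username}', $row), "\n"; // prints ns=User;cid=alice
```

Any column listed in additionalcolumns would be available to the template in the same way as the title column.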

Example

This is a working example of how to add usernames to the search engine, so that searching for a username includes links to user pages in the results.

register_search_handler(array(
  // Which database table to search
  'table' => 'users',
  // Primary column to search
  'titlecolumn' => 'username',
  // Uniquely identify each result
  'uniqueid' => 'ns=User;cid={username}',
  // Extra columns to include - we include user_id to allow fetching the rank
  'additionalcolumns' => array('user_id'),
  // Say what kind of result it is
  'resultnote' => '[Member]',
  // How to format links
  'linkformat' => array(
      'page_id' => '{username}',
      'namespace' => 'User'
    ),
  // Function to call to format each result
  'formatcallback' => 'format_user_search_result',
));
 
function format_user_search_result($row)
{
  global $session, $lang;
  // $row is an array containing the values user_id and username.
  // get user rank
  $rankdata = $session->get_user_rank(intval($row['user_id']));
  $rankspan = '<span style="' . $rankdata['rank_style'] . '">' . $lang->get($rankdata['rank_title']) . '</span>';
  if ( empty($rankdata['user_title']) )
  {
    // show the rank in the result text
    return $rankspan;
  }
  else
  {
    // user has a user title, include that too
    return '"' . htmlspecialchars($rankdata['user_title']) . "\" (<b>$rankspan</b>)";
  }
}
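
The example above uses a callback function. As a complementary sketch, the handler below uses the optional datacolumn value and a template string for formatcallback instead. Everything specific to it is an assumption made for illustration: the recipes table, its columns (recipe_id, title, body, author) and the Recipe namespace are hypothetical, and the namespace would have to be registered with Enano separately. The function_exists() check is only there so the snippet also runs outside of Enano.

```php
<?php
// Hypothetical plugin table "recipes" with columns recipe_id, title, body and author.
$recipe_search_handler = array(
  // Which database table to search
  'table' => 'recipes',
  // Primary column to search; matches here score higher
  'titlecolumn' => 'title',
  // Separate text column, searched in addition to the title
  'datacolumn' => 'body',
  // Uniquely identify each result
  'uniqueid' => 'ns=Recipe;pid={recipe_id}',
  // Extra columns made available to the template strings
  'additionalcolumns' => array('recipe_id', 'author'),
  // Say what kind of result it is
  'resultnote' => '[Recipe]',
  // How to format links
  'linkformat' => array(
      'page_id' => '{recipe_id}',
      'namespace' => 'Recipe'
    ),
  // Template string instead of a callback; assumes author is stored as HTML-safe text
  'formatcallback' => 'Recipe by <b>{author}</b>',
);

// Only register when running inside Enano
if ( function_exists('register_search_handler') )
{
  register_search_handler($recipe_search_handler);
}
```

A template string is enough when the result body is built purely from column values. A callback is the better fit when, as in the user example, extra lookups are needed.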

The difficult way

This way provides some - but not much - more control over how your results work. To use this, you have to hook into search_global_inner and change around a lot of variables. There's no example or documentation here because extending the search engine this way is WAY too much work to be practical. However, this is how the easy way works on the backend. So if you want an example for extending the search engine this way, just have a look at the function inject_custom_search_results() in includes/search.php.

To run a search of the site from your own code, call the function perform_search():

array perform_search(string $query, array &$warnings, bool $match_case, array &$word_list)

perform_search() returns an array of results. Each result is an associative array with the following elements:

  • page_name: The friendly title of the page
  • page_text: A snippet of text from the page containing highlighted search terms.
  • page_length: The size of the page in bytes
  • score: Relevance score for the result; results always arrive from the search system sorted in descending order of relevance
  • page_id: Page ID of the page the result is for
  • namespace: Namespace of the page the result is for
  • page_note: A snippet of additional text about the page, typically what kind of page it is (e.g. special page, gallery image, etc). It's usually displayed in smaller text to the left of page_name.
  • url_highlight: The URL of the result, but highlighted, and thus unsuitable for use in a link. Result URLs also have things like the session ID trimmed out; they're supposed to look pretty, not be exactly correct.
  • zero_length: If this is set (it isn't always) and it's true, don't display the page_text because there's nothing there.
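
As a rough sketch of how the returned structure might be consumed, the helper below turns each result into one line of plain text. The name summarize_search_results() is hypothetical, and the sketch assumes that page_text carries its highlighting as HTML markup, which is stripped here. The call to perform_search() follows the signature given above and is guarded so the snippet also runs outside of Enano.

```php
<?php
// Hypothetical helper, NOT part of Enano: builds one plain-text line per result.
function summarize_search_results(array $results)
{
  $lines = array();
  foreach ( $results as $result )
  {
    $line = sprintf('%s:%s "%s" (score %.1f)',
      $result['namespace'], $result['page_id'], $result['page_name'], $result['score']);
    // zero_length is not always set, so test it with empty() instead of reading it directly
    if ( empty($result['zero_length']) )
    {
      // Assumption: highlighting is HTML markup, so strip it for plain-text output
      $line .= ' - ' . trim(strip_tags($result['page_text']));
    }
    $lines[] = $line;
  }
  return $lines;
}

// Only run the actual search when inside Enano
if ( function_exists('perform_search') )
{
  $warnings = array();
  $word_list = array();
  $results = perform_search('enano', $warnings, false, $word_list);
  echo implode("\n", summarize_search_results($results)), "\n";
}
```

Because results already arrive sorted by descending score, the helper keeps them in the order received.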

How the engine works

Enano's engine scores results based on where search terms appear and how many terms appear in each result. Each search term that appears in the title of a result gets 1.5 points. Each term that appears in the body or content gets 1 point. Thus, if a user searches for two words, and one result has one word in the title (1.5 points) while another has both words in the body (2 points), the second result will score higher and be sorted first.

Internally, this is accomplished by keeping an associative array with an entry for each identified result. Each time a search term is found, the score for that result is incremented. At the end of the search, the array is sorted and then used to pull results and format them accordingly.
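
The scoring rule can be sketched in a few lines. This is an illustration of the rule as described here, not the code from includes/search.php: the function score_results() and the sample pages are hypothetical, matching is done with a simple case-insensitive substring test, and a term found in both title and body is assumed to earn both scores.

```php
<?php
// Hypothetical sketch of the scoring rule, NOT Enano's actual implementation.
// $candidates maps a unique result ID to its title and body text.
function score_results(array $terms, array $candidates)
{
  $scores = array();
  foreach ( $candidates as $id => $candidate )
  {
    foreach ( $terms as $term )
    {
      // A term in the title is worth 1.5 points
      if ( stripos($candidate['title'], $term) !== false )
      {
        $scores[$id] = ( isset($scores[$id]) ? $scores[$id] : 0.0 ) + 1.5;
      }
      // A term in the body is worth 1 point
      if ( stripos($candidate['body'], $term) !== false )
      {
        $scores[$id] = ( isset($scores[$id]) ? $scores[$id] : 0.0 ) + 1.0;
      }
    }
  }
  // Sort by score, highest first, keeping the result IDs as keys
  arsort($scores);
  return $scores;
}

$candidates = array(
  'ns=Article;pid=Apple_pie' => array('title' => 'Apple pie', 'body' => 'A classic dessert.'),
  'ns=Article;pid=Baking' => array('title' => 'Baking', 'body' => 'Try an apple tart or cherry crumble.'),
);
// Baking scores 2 (both words in the body), Apple_pie scores 1.5 (one word in the title)
print_r(score_results(array('apple', 'cherry'), $candidates));
```

Results that match no term never get an entry in the array, so they are left out of the output entirely.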
