Techblog Careesma

16 Feb 2011

HTML Purifier custom attribute validation

HTML Purifier is a library written in PHP that filters malicious code (the kind used in cross-site scripting, or XSS, attacks) out of HTML input. It also makes sure your documents are standards compliant and allows only the HTML tags and attributes on a white-list you define.

At Careesma we have replaced our previous HTML input validation, which was based on DTDs, with HTML Purifier. The main reason, as you can imagine, is that hand-written DTDs are hard to write, read and maintain.

HTML Purifier has a default validation for every type of attribute (e.g. uniqueness for id, URI for href, etc.), but it allows you to change the default behavior with HTMLPurifier_HTMLDefinition::addAttribute(). This method takes three parameters: the tag name, the attribute name, and an object that tells HTML Purifier how to validate the attribute value. HTML Purifier ships with several such classes; HTMLPurifier_AttrDef_Enum, for example, restricts an attribute to an enumeration of allowed values, as shown in the following example.

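$config = HTMLPurifier_Config::createDefault();
// Allow only <a> elements with the href and target attributes.
$config->set('HTML.Allowed', 'a[href|target]');
// Fetch the raw (editable) HTML definition.
$def = $config->getHTMLDefinition(true);
// Restrict target to an enumeration of allowed values.
$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
  array('_blank','_self','_parent','_top')
));
$purifier = new HTMLPurifier($config);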


In this example HTML Purifier allows only links, with an 'href' attribute and a 'target' attribute whose value is '_blank', '_self', '_parent' or '_top'.
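To make the enumeration idea concrete, here is a minimal sketch in plain PHP of the kind of check such a validator performs. It does not use HTML Purifier, and validateEnum() is a hypothetical function written only for illustration; it is an approximation of the idea, not the library's implementation. It follows the same convention as the attribute classes: return the accepted value, or false to reject it.

```php
<?php
// Hypothetical stand-in for an enumeration check; not HTML Purifier's own code.
function validateEnum(string $value, array $allowed, bool $caseSensitive = false)
{
    // Ignore surrounding whitespace before comparing.
    $value = trim($value);
    if (!$caseSensitive) {
        $value = strtolower($value);
    }
    // Strict comparison avoids PHP's loose type juggling.
    return in_array($value, $allowed, true) ? $value : false;
}

$targets = array('_blank', '_self', '_parent', '_top');

var_dump(validateEnum('_blank', $targets)); // accepted as is
var_dump(validateEnum(' _TOP ', $targets)); // accepted after trimming and lowercasing
var_dump(validateEnum('popup', $targets));  // rejected
```

A rejected value means the attribute is dropped from the output, while the element itself is kept.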

This opens the possibility of adding your own attribute validation by writing a custom AttrDef class that inherits from HTMLPurifier_AttrDef. The following snippet of code shows how to require the class attribute to start with a given prefix.

class CustomAttrDef extends HTMLPurifier_AttrDef {
  // Prefix that every accepted value must start with.
  protected $prefix;
  public function __construct($prefix) {
    $this->prefix = $prefix;
  }
  // Return the value if it is valid, or false to have the attribute removed.
  public function validate($string, $config, $context) {
    // Anchor at the start, allow leading whitespace, and escape the prefix.
    $pattern = '/^\s*' . preg_quote($this->prefix, '/') . '/';
    return preg_match($pattern, $string) ? $string : false;
  }
}
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'div[class]');
$def = $config->getHTMLDefinition(true);
$def->addAttribute('div', 'class', new CustomAttrDef('prefix_'));
$purifier = new HTMLPurifier($config);


In this example, HTML such as '<div class="prefix_nav">some text</div>' passes through the purifier unchanged, while a class attribute whose value does not start with the prefix is removed. Following this pattern, you can implement virtually any kind of validation.
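The validator above accepts or rejects the attribute value as a whole. Because a class attribute can list several space-separated names, one possible variation is to filter the value name by name and keep only the names that carry the prefix. The sketch below is plain PHP with no dependency on HTML Purifier, and filterPrefixedClasses() is a hypothetical helper introduced here for illustration. Since validate() may return a cleaned-up string instead of the original one, the same logic could be moved into a custom AttrDef class.

```php
<?php
// Hypothetical helper: keeps only the class names that start with $prefix.
function filterPrefixedClasses(string $value, string $prefix)
{
    // Split on any run of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $names = preg_split('/\s+/', $value, -1, PREG_SPLIT_NO_EMPTY);
    $kept = array();
    foreach ($names as $name) {
        // Compare only the first strlen($prefix) characters of each name.
        if (strncmp($name, $prefix, strlen($prefix)) === 0) {
            $kept[] = $name;
        }
    }
    // Same convention as validate(): a cleaned string, or false to drop the attribute.
    return $kept ? implode(' ', $kept) : false;
}

var_dump(filterPrefixedClasses('prefix_nav other prefix_big', 'prefix_'));
var_dump(filterPrefixedClasses('other', 'prefix_'));
```

Whether to reject the whole value or to filter it is a design choice: rejecting is stricter, while filtering keeps as much of the author's markup as the white-list permits.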

For more information about how to use HTML Purifier, see the project documentation.

About Javier Lopez
