HTML Purifier is a library written in PHP that filters malicious code (the kind used in XSS attacks) out of HTML input. It makes sure your documents are standards compliant, and it only allows the HTML tags and attributes listed in a tag/attribute whitelist that you define.
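As a rough illustration of the whitelist idea, here is a minimal sketch. It assumes HTML Purifier is installed and that `HTMLPurifier.auto.php` is on the include path; the input string and the whitelist are made up for the example, and the definition cache is disabled only to keep the sketch free of filesystem setup.

```php
<?php
// Assumption: the library's autoloader is reachable at this path; adjust as needed.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Assumption: skip the on-disk definition cache for this experiment.
$config->set('Cache.DefinitionImpl', null);
// Whitelist: <p>, <b>, and <a> with only the href attribute.
$config->set('HTML.Allowed', 'p,b,a[href]');

$purifier = new HTMLPurifier($config);

// Made-up input with an event handler and a script element.
$dirty = '<p onclick="evil()">Hello <b>world</b></p><script>alert(1)</script>';
$clean = $purifier->purify($dirty);

echo $clean, PHP_EOL;
```

With this whitelist the `onclick` attribute and the `script` element should be dropped, leaving only the paragraph and the bold text.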
At Careesma we have replaced our previous HTML input validation, which was based on DTDs, with HTML Purifier. The main reason, as you can imagine, is that hand-written DTDs are complex to write, read and maintain.
HTML Purifier has a default validation for every type of attribute (e.g. uniqueness for id, a URI for href, etc.) but allows you to change the default behavior using the method HTMLDefinition::addAttribute(). This method takes three parameters: the tag name, the attribute name and an object of a class that tells HTML Purifier how to validate the attribute value. Several such classes are already defined in HTML Purifier; one of them, for example, restricts an attribute to an enumeration of possible values.
For example, you can allow only links with the attributes 'href' and 'target', and restrict 'target' to the values '_blank', '_self', '_target' and '_top'.
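A minimal sketch of such a configuration could look like the following. It assumes the same install path as any standard HTML Purifier setup, uses the built-in `HTMLPurifier_AttrDef_Enum` class, and copies the list of values from the text above (note that the standard HTML keywords are `_blank`, `_self`, `_parent` and `_top`). The sample links are invented for the example.

```php
<?php
// Assumption: the library's autoloader is reachable at this path; adjust as needed.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Assumption: skip the on-disk definition cache while editing the definition.
$config->set('Cache.DefinitionImpl', null);
// Only allow links, and only with href and target.
$config->set('HTML.Allowed', 'a[href|target]');

// Ask for the raw (editable) HTML definition and override how "target" is validated.
$def = $config->getHTMLDefinition(true);
$def->addAttribute(
    'a',
    'target',
    new HTMLPurifier_AttrDef_Enum(array('_blank', '_self', '_target', '_top'))
);

$purifier = new HTMLPurifier($config);

// Made-up input: the first target is in the enumeration, the second is not.
$dirty = '<a href="http://example.com/" target="_blank">kept</a>'
       . '<a href="http://example.com/" target="frame1">stripped</a>';
$clean = $purifier->purify($dirty);

echo $clean, PHP_EOL;
```

The first link should keep its `target`, while the second should lose it because `frame1` is not in the enumeration. Depending on the library version, other attributes such as `rel` may be added to links that open in a new window.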
This opens the possibility of adding your own attribute validation by writing an AttrDef class that inherits from HTMLPurifier_AttrDef. For instance, such a class can require HTML classes to start with a given prefix.
With a validator like that, HTML such as '<div class="prefix_nav">some text</div>' keeps its class after purification, while any class without the prefix is removed. Following the same approach you can implement virtually any kind of validation.
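The prefix check described above might be sketched as follows. The class name `PrefixedClassAttrDef` and the prefix `prefix_` are hypothetical, chosen only for illustration; the sketch relies on `HTMLPurifier_AttrDef::validate()` returning the cleaned value, or `false` to drop the attribute. It registers the validator for `div` only, so other elements keep the default handling of `class`.

```php
<?php
// Assumption: the library's autoloader is reachable at this path; adjust as needed.
require_once 'HTMLPurifier.auto.php';

// Hypothetical validator: keeps only class names that start with a given prefix.
class PrefixedClassAttrDef extends HTMLPurifier_AttrDef
{
    protected $prefix;

    public function __construct($prefix)
    {
        $this->prefix = $prefix;
    }

    public function validate($string, $config, $context)
    {
        $kept = array();
        // The class attribute is a whitespace-separated list of names.
        $names = preg_split('/\s+/', trim($string), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($names as $name) {
            if (strpos($name, $this->prefix) === 0) {
                $kept[] = $name;
            }
        }
        // Returning false tells HTML Purifier to remove the attribute.
        return $kept ? implode(' ', $kept) : false;
    }
}

$config = HTMLPurifier_Config::createDefault();
// Assumption: skip the on-disk definition cache while editing the definition.
$config->set('Cache.DefinitionImpl', null);

$def = $config->getHTMLDefinition(true);
$def->addAttribute('div', 'class', new PrefixedClassAttrDef('prefix_'));

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<div class="prefix_nav other">some text</div>'), PHP_EOL;
```

Here the `div` should come out with only `prefix_nav` in its class attribute. A `div` whose classes all lack the prefix should lose the attribute altogether.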
For more information about how to use HTML Purifier, see the project documentation.