Geeks and Bloggers Portal

Avoiding XSS holes in sites that allow HTML

Published on 01/24/12 at 14:54:12 EST by GentleGiant

For sites where users are allowed to use HTML, the goal is not to escape the input, but to restrict what HTML features can be used.

The level of restriction depends on the site. A site like MySpace may decide to let users customize the appearance of their pages as much as they want. In contrast, a forum will probably limit users to P, BLOCKQUOTE, lists, simple inline styles, and perhaps images.

The naive approach of "stripping tags" using regular expressions often misses things because the regular expression interprets the input differently than browsers do. For example, the regular expression <.*?> matches nothing in "<script src='http://evil.com/evil.js' </script", because the string never contains a closing ">". Internet Explorer and Firefox still act on it, which leaves visitors using those browsers vulnerable (see bug 226495 for details about "half-tag" parsing). Gerv has an example involving unterminated entities. RSnake maintains an extensive list of XSS vectors that naive filtering may miss.
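The half-tag failure is easy to reproduce. The following is a minimal sketch in Python; the helper name strip_tags is made up for illustration and stands in for any regex-based tag stripper.

```python
import re

# The naive pattern from the text: "<", anything (non-greedy), then ">".
TAG_RE = re.compile(r"<.*?>")


def strip_tags(text: str) -> str:
    """Hypothetical naive filter: delete everything that looks like a tag."""
    return TAG_RE.sub("", text)


complete_tag = "<b>bold</b>"
half_tag = "<script src='http://evil.com/evil.js' </script"

# Complete tags are removed as expected.
print(strip_tags(complete_tag))

# The half-tag contains no ">" at all, so the pattern never matches
# and the input passes through the filter unchanged.
print(strip_tags(half_tag) == half_tag)
```

The filter handles the well-formed case and silently passes the malformed one, which is exactly the input an attacker would choose.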

TODO: go through RSnake's list and make sure my advice covers everything.

The best approach is to parse the input HTML on the server, keeping only the tags, attributes, and attribute values you want to allow. When you serialize the parsed result back to HTML, the output will be "well-formed" HTML that browsers will parse the same way your server did.
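As a rough illustration of the parse, filter, and re-serialize idea, here is a sketch built on Python's standard html.parser module. The class name, the sanitize helper, and the particular whitelists are assumptions chosen for this example, not a recommended policy. The sketch deliberately allows no void elements (such as img or br) and no style attributes, and it has not been checked against RSnake's list.

```python
from html import escape
from html.parser import HTMLParser

# Example whitelists (assumptions for this sketch; adjust per site).
ALLOWED_TAGS = {"p", "blockquote", "ul", "ol", "li", "b", "i", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = ("http:", "https:", "ftp:")
SKIP_CONTENT = {"script", "style"}  # drop these elements and their text


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # serialized output fragments
        self.open_tags = []  # stack of allowed tags still open
        self.skipping = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skipping += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # drops on* handlers, style, and anything unlisted
            if name == "href" and not value.strip().lower().startswith(ALLOWED_SCHEMES):
                continue  # drops javascript:, data:, and relative URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skipping = max(0, self.skipping - 1)
            return
        if tag in self.open_tags:
            # Close any tags opened after this one so the output nests properly.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data))  # text is always re-escaped

    def close(self):
        super().close()  # flush any buffered text first
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")


def sanitize(html_text: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="evil()">Hi <b>there</b></p>'))
# prints: <p>Hi <b>there</b></p>
```

Because the output is rebuilt from the parser's events rather than copied from the input, anything the parser did not recognize as an allowed tag or attribute cannot reach the page as markup. Comments, doctype declarations, and processing instructions are dropped because the sketch does not override their handlers.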

Things you should ensure are never allowed in user-submitted HTML, to protect the accounts of visitors who use Firefox and IE:

javascript:, vbscript:, and data: URLs in links, images, anywhere.
<script> tags, with or without src attributes.
Event attributes (on*), which contain scripts.
-moz-binding: or behavior: CSS properties inside <style> elements or style attributes.
HTML that is not "well-formed" -- you can't be sure how quirky browsers will parse it. (Example: <b <i>Foo)

The above list might not be complete and it is safer to use whitelists than blacklists. For example, only allow http:, https:, and ftp: links rather than allowing all links other than javascript:, vbscript:, and data:.
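To make the whitelist-versus-blacklist point concrete, the sketch below compares two hypothetical URL checks. The function names are invented for this example, and the blacklist is deliberately naive. The test inputs rely on two browser behaviours: URL schemes are case-insensitive, and tab characters inside a URL are removed before it is interpreted.

```python
BLOCKED_SCHEMES = ("javascript:", "vbscript:", "data:")
ALLOWED_SCHEMES = ("http:", "https:", "ftp:")


def blacklist_allows(url: str) -> bool:
    """Naive blacklist: allow anything that does not start with a blocked scheme."""
    return not url.startswith(BLOCKED_SCHEMES)


def whitelist_allows(url: str) -> bool:
    """Whitelist: allow only URLs that start with a known-good scheme."""
    return url.strip().lower().startswith(ALLOWED_SCHEMES)


tricky_urls = [
    "JavaScript:alert(1)",     # mixed case
    "java\tscript:alert(1)",   # embedded tab
]

for url in tricky_urls:
    # The blacklist lets both through; the whitelist rejects both.
    print(repr(url), blacklist_allows(url), whitelist_allows(url))
```

The blacklist could be patched for each of these cases, but every patch depends on knowing the trick in advance. The whitelist rejects them without knowing about them, at the cost of also rejecting relative URLs and any scheme not explicitly listed.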

If you must allow unsanitized, untrusted HTML to be part of your site, ensure that those pages are not on the same hostname as where other users log in. This applies to webmail, web hosts, attachments in a bug-tracking system, and services like Google cache (see Gerv's proposal).

TODO: test various browsers' interpretation of setting "document.domain" to figure out exactly what "not on the same hostname" needs to mean