The PG Test Software Project - Software/Blog

Security Measures Against XSS Attacks
When Using The HTTP Get Method To Pass Values Between HTML Pages

    Hi. This is just a short, introductory article to help you to learn how to code some basic security measures when passing variables between HTML pages using the HTTP Get method, and a query string.

    The technology I am using is HTML5/Canvas, JavaScript, and CSS3, important tools for anyone working on the Web. I assume you have some general programming experience but little, if any, knowledge of the security risks that arise when you use JavaScript.

    Last week I presented an article on how to use the HTTP Get method, with a query string, to pass variables between web pages. My application involved clicking on a small version of an image on one page, and then heading to a new URL showing a larger, formatted version of the same image, with some formatted text if desired. If you missed that article, there is a link to it in the sidebar. To check out the app, just click the Art link in the navigation section above.

    Generally speaking the article was well received with quite a number of views from folks across the globe, which was great! Thanks to all those who took a look! It was a great experience, both writing the article and posting it.

    There was one reader, a security expert, who pointed out that my use of JavaScript carried certain security risks. We had a fairly detailed and quite interesting discussion about cross-site scripting (XSS). XSS, for those who may not know, typically refers to vulnerabilities in code that allow a malicious user to inject hostile code into the web page a client sees, so that it performs unwanted actions.

    Please note, this is not an exhaustive treatise on cross-site scripting, by any means. I will include some links at the end of my article that may give more background. I have close to 30 years of experience in IT, most of it with compiled languages and backend databases, where security issues are of a very different nature. JavaScript is an interpreted language, and the browser parser can treat all input the same, so user input that ends up in the page can be executed as if it were programmer-written code. This gives the environment a great deal of flexibility, but it also opens up the proverbial can of worms.

    As it turns out, I found it very difficult to inject malicious code into my script. I tried the suggestions given by the security expert, but the browsers (IE, Chrome, Firefox) would not execute the phony script. The scripts I used were javascript:alert("XSS") and <script>alert("XSS")</script>. Revealingly, I had to escape the script tags here in my HTML; otherwise the alert "XSS" would appear when this page is opened. Modern browsers have some simple filtering techniques that weed out the obvious attacks injected with user input. However, as we know with crime in general, the bad guys have a habit of trying to stay one step ahead of the detectives. IT is no different.
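
Escaping, as mentioned above for the script tags, means replacing the characters that HTML treats as markup with entities so they are displayed as text instead of being parsed. A minimal sketch, assuming a hypothetical helper named escapeHtml (the name and the exact set of characters are illustrative, not taken from the original page):

```javascript
// Hypothetical helper: turn markup characters into HTML entities.
// The ampersand is replaced first so the entities added afterwards
// are not escaped a second time.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('<script>alert("XSS")</script>'));
// prints: &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
```

Escaping of this kind protects text that is written into HTML markup. It is a complement to the input validation discussed below, not a replacement for it.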

The source of the vulnerability in my project is the GET with the query parameter:

href="artfullsize.htm?image=artpix/Talk Radio Hostess Signed.png&name=Talk Radio Hostess" target="_blank"

HTML Get Method

    Above is a snippet of code from the calling HTML page, the page you get if you click the Art link above. On that page you will see a series of small images, and each image has code like this. When the user clicks an image, the link passes the parameters so the new page can fetch a larger version of the image.

    The vulnerability comes because a malicious user can substitute their own parameters for ?image= and for name=, and form a new url that has script tags and their own code. Essentially this is what is called unsafe user input because the programmer has no control of what the user might enter. And this is where the problem comes from. Anywhere a user can enter their own input, given the nature of the parsing and the interpreter, that input can be code, and that code can be malicious when executed.
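
To make this concrete, here is a small sketch of how the two parameters arrive on the receiving page, first from the intended link and then from a tampered one. The helper name readParams and the host example.com are placeholders for illustration; URL and its searchParams property are standard in browsers and in Node.js.

```javascript
// Hypothetical helper: pull the two query parameters out of a page URL.
// searchParams.get() returns the percent-decoded value, or null if absent.
function readParams(pageUrl) {
  const params = new URL(pageUrl).searchParams;
  return { image: params.get("image"), name: params.get("name") };
}

// The link as intended (example.com is a placeholder host).
const intended = readParams(
  "https://example.com/artfullsize.htm" +
    "?image=artpix/Talk%20Radio%20Hostess%20Signed.png&name=Talk%20Radio%20Hostess"
);

// The same page address with values chosen by an attacker.
const tampered = readParams(
  "https://example.com/artfullsize.htm" +
    "?image=javascript:alert(%22XSS%22)&name=%3Cscript%3Ealert(%22XSS%22)%3C/script%3E"
);

console.log(intended.image); // artpix/Talk Radio Hostess Signed.png
console.log(tampered.image); // javascript:alert("XSS")
console.log(tampered.name);  // <script>alert("XSS")</script>
```

The receiving page cannot tell these two cases apart unless it checks the values before using them.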

    In a case like this, a URL, the malicious coder must somehow get the user to click this link. This is one reason why it is very inadvisable to click links from folks you don't know. This is the first line of defense, user awareness.

    Should the user click a malicious link, modern browsers will weed out simple attacks but may not weed out more sophisticated attacks. That is where this article comes in, to discuss a basic method to provide another line of defense against injected code. The line of defense discussed here will be under the topic of input validation.

    Basically there are two types of validation: blacklisting and whitelisting.

    Blacklisting looks to weed out dangerous words and phrases. This is the type of validation that browsers do: script tags, the javascript: prefix, angle brackets, and other obvious attempts at injection are filtered. The problem with blacklisting is that the set of possible inputs is potentially infinite. You can only weed out so much, and hackers are constantly trying to come up with new schemes to bypass your filters.

    Whitelisting validates based on what type of input you are expecting, rather than what type of input you don't want. In database validation, this might be a mask: a date mask, a phone mask, etc. The data must be shaped in a certain way and must have a certain type of content. This type of validation is generally considered stronger because the set of accepted inputs is finite rather than infinite.
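
The difference between the two approaches can be sketched in a few lines. The blocked fragments, the phone mask, and the function names below are illustrative assumptions, not a recommended list:

```javascript
// Blacklist: reject input that contains a known-bad fragment.
// Any such list is incomplete by nature; this one is only an example.
const blockedFragments = ["<script", "javascript:", "<", ">"];

function passesBlacklist(input) {
  const lowered = input.toLowerCase();
  return !blockedFragments.some((fragment) => lowered.includes(fragment));
}

// Whitelist: accept only input with the expected shape.
// Here the mask is a phone number such as 555-123-4567.
const phoneMask = /^\d{3}-\d{3}-\d{4}$/;

function passesWhitelist(input) {
  return phoneMask.test(input);
}

// An input the blacklist did not anticipate slips through it,
// but fails the whitelist because it does not fit the mask.
const sneaky = '" onerror="alert(1)';
console.log(passesBlacklist(sneaky)); // true
console.log(passesWhitelist(sneaky)); // false
console.log(passesWhitelist("555-123-4567")); // true
```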

    In this article I am going to show an example of whitelisting. I have two places that might be vulnerable. One is the pathname of where to find the larger image. The other is the name I want to use for the alt and title attributes of the larger image. The pathname is a perfect target for really tight input validation. I can create a regular expression that would make it very difficult to bypass and sneak in malicious code. The name parameter is a bit more problematic because it is just text input. However, we can create a good regular expression for whitelisting here also.

    At this point, as in any other type of development, it is time to do a bit of data analysis. In this case, do I really need the name parameter? If security were not an issue, then why not include it? But given security considerations, I should generally be able to parse the alt/title text out of the file name. Also, as a general rule, the fewer parameters you give a hacker, the more secure your code can be. I have seen a few advanced examples where hackers have used a second parameter to figure out a way to get around the blacklist filters provided.

    In this particular example, in order to maintain consistency with my original article, I am going to keep both parameters and will use whitelisting for both. In fact the regular expressions will be very similar. However, if you want the tightest security you might just go with one parameter and use JavaScript parsing to form the name from the file path. I will show you how to do that below.

    Now on to the file name. Right now any filename will do, but why not put some rules in place? Let's say I restrict the filename to three words, made up of letters, separated by dashes.

    Here then is what we have for the entire path:


    Likewise, for the name parameter, which we will use in the title and alt attributes, we have:


    We can easily construct a regular expression as a mask for this input, ensuring very tight input validation.


    Above is a copy of the code used. Here I have placed the script tags inside the HTML just to make it easier to follow, but in production you might want to place this script in a separate js file.

    First we create regular expressions for the image (path) and name (alt, title tags) parameters. Then we parse out the path of the image (parm_image), and the name (parm_name). I have added a step to parse out the file name from the path which we can compare to the name parm. As said previously, you can use this parm_image_name instead of the second passed parameter if you like, and only pass one parameter, the image path.

    If the path is valid, if the name is valid, and the parsed image name is the same as the passed name, we have a match, and can now populate the src, alt, and title attributes. If the path is invalid we display a brief message. Of course what you do if you have a failed validity test depends on your requirements.
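
The steps just described can be sketched as follows. This is one possible way to write them under stated assumptions, not necessarily the exact code running on the site: the artpix/ folder comes from the link shown earlier, while the .png extension, the exact patterns, the function name validateParams, and the choice to pass the name with spaces instead of dashes are assumptions made for illustration. The variable names parm_image, parm_name, and parm_image_name follow the description above.

```javascript
// Assumed whitelist patterns (illustrative):
// path = artpix/ + three letter-only words joined by dashes + .png
const imagePattern = /^artpix\/[A-Za-z]+-[A-Za-z]+-[A-Za-z]+\.png$/;
// name = the same three words, separated by single spaces
const namePattern = /^[A-Za-z]+ [A-Za-z]+ [A-Za-z]+$/;

// Returns the values for src, alt, and title, or null if any check fails.
function validateParams(queryString) {
  const params = new URLSearchParams(queryString);
  const parm_image = params.get("image") ?? "";
  const parm_name = params.get("name") ?? "";

  // Both parameters must match their masks.
  if (!imagePattern.test(parm_image) || !namePattern.test(parm_name)) {
    return null;
  }

  // Parse the file name out of the path: drop the folder and the
  // extension, then turn the dashes into spaces.
  const parm_image_name = parm_image
    .slice("artpix/".length, -".png".length)
    .replace(/-/g, " ");

  // The name parsed from the path must equal the passed name.
  if (parm_image_name !== parm_name) {
    return null;
  }

  return { src: parm_image, alt: parm_name, title: parm_name };
}

console.log(validateParams("?image=artpix/Talk-Radio-Hostess.png&name=Talk%20Radio%20Hostess"));
console.log(validateParams("?image=javascript:alert(1)&name=Talk%20Radio%20Hostess")); // null
```

On the page itself, a non-null result would be assigned to the src, alt, and title attributes of the image element, and a null result would trigger the brief message. If only the image path is passed, parm_image_name can be used for alt and title directly and the name check dropped.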

    If further security is required, you could conceivably encrypt the JavaScript, or the regular expressions. You could also include your own blacklist routines, checking for suspicious words and characters using either regular expressions or JavaScript code. I leave that up to you.


    I hope this short article will help you with your first steps towards protecting your code against injection-type client-side attacks. The field is large, and I have included some links which can give more information. Feel free to use or adapt any code in this article.


Comments and suggestions are always welcome.
Best regards, Phil Gennuso

BTW: If you want to contact me for any reason, including using any material on this site, please email: philgennuso@gmail.com


Some Links (for security reasons these are copy and paste):