CoreLab – MySQL and XML = “oh my” via strange HTML encodings (Unicode)

Posted by admin on Mar 1, 2010 in Uncategorized

How to Break XML & .Net Apps

The title of this blog post is a bit strange; it’s a play on “Lions, tigers and bears, oh my”.

Anyway, back on track. I really like strange HTML encodings and the way some characters (chars) get interpreted by a web app as something else.

It makes my job as a tester much more interesting. Once you know what you are doing and have a grasp of the basics, you will find that you can detect defects where other testers would have passed an application as ready for production.

As far as I can see this issue is not documented anywhere else on the net, and it can easily bring down a lot of websites (major ones included). By “bring down” I mean a DoS on the home page, because the content fails to display.

Feel free to investigate it further if you wish. However, please only test it on sites which you have permission to run tests against.

Anyway, on to the details.

The issue is caused by characters that cannot be represented in XML. As the XML parser is unable to process the character, it will just error and a blank screen is displayed to all users. (So now imagine a site that allows users to post comments which are displayed on the front page.)
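To make that failure concrete, here is a minimal sketch in Python (not .Net, and not the stack this post is about; it only shows how a standards-compliant XML parser reacts). The element name `comment` and the sample text are made up for the illustration. The same text is parsed twice, once clean and once with the control character 0x19 in place of the apostrophe.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(document: str) -> bool:
    """Return True if the XML parser accepts the document, False if it rejects it."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Hypothetical comment feed: identical text, with and without control char 0x19.
clean = "<comment>I've visited the page</comment>"
broken = "<comment>I\x19ve visited the page</comment>"

print(parses_as_xml(clean))   # True
print(parses_as_xml(broken))  # False: 0x19 is not a legal XML 1.0 character
```

One bad character is enough to make the whole document unparseable, which is why a single comment can blank out an entire page.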

The character in question is the control character 0x19 (it has no visible glyph, so you won’t see it on screen), and for this defect to take place a few things are needed. As the title states, the site must have a MySQL back end (millions of those about). It must also be coded in .Net (tested with C#, but it may also affect VB.Net and other .Net languages). Lastly, it must save data from a web form or text box into the DB using the CoreLab data connectors, and then display that data on the web page via XML.

Now, usually you won’t be able to enter that character into the web app directly, but don’t worry, as you can get it in by entering perfectly valid text (more on that coming up).

To see an example of this happening, open Notepad, Microsoft Word and the HTML Encoder page on my site. Now type “I’ve visited the test Managers Page” into Notepad and do the same in Microsoft Word.

Now paste both into the encoder and see the difference.

Notepad will give you %27, whereas Microsoft Word will have changed your apostrophe to a curly apostrophe, %_u2019 (the underscore needs to be removed, but I can’t stop WordPress from encoding it without the underscore). I, and most likely you, may know of this as a simple %19, which is the control character 0x19.
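If you don’t have the encoder page to hand, the difference can be reproduced with a short Python sketch. The `legacy_escape` helper below is my own rough imitation of the old JavaScript `escape()` output format, which is where the %uXXXX style comes from; I am assuming, not asserting, that the encoder page works along those lines. The standard percent-encoding from `urllib.parse.quote` is shown next to it for comparison.

```python
from urllib.parse import quote


def legacy_escape(text: str) -> str:
    """Rough imitation of JavaScript escape(): %XX up to 0xFF, %uXXXX above that."""
    out = []
    data = text.encode("utf-16-be")  # escape() works on UTF-16 code units
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        ch = chr(unit)
        if unit < 0x80 and (ch.isalnum() or ch in "@*_+-./"):
            out.append(ch)               # left unescaped
        elif unit <= 0xFF:
            out.append(f"%{unit:02X}")
        else:
            out.append(f"%u{unit:04X}")
    return "".join(out)


straight = "'"      # what Notepad types
curly = "\u2019"    # what Word substitutes

print(legacy_escape(straight))   # %27
print(legacy_escape(curly))      # %u2019
print(quote(curly))              # %E2%80%99 (standard UTF-8 percent-encoding)
print(hex(ord(curly) & 0xFF))    # 0x19, the low byte of U+2019
```

The last line is the interesting one: drop the high byte of U+2019 and what is left is 0x19.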

Now CoreLab, .Net, XML and MySQL can all handle curly apostrophes. However, if you press return and type some text on the next line after the curly apostrophe, CoreLab will add in a “\r\n” (carriage return plus line feed). It seems that the default installation of CoreLab doesn’t encode chars as UTF-8 but as something else. In the DB you then get the encoded control character 0x19, which XML cannot cope with because it is an invalid character in XML. So when that text, which now contains an invalid char, is rendered back as XML, the XML stream fails and the page fails to display.
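I have not confirmed what CoreLab actually does internally, so treat the following Python sketch as a guess rather than a description of the connector. It ASSUMES a lossy conversion that keeps only the low byte of any character above 0xFF, which is one way U+2019 could end up as 0x19. It then shows one possible defence: stripping everything outside the XML 1.0 `Char` range before the text goes into the XML. The function names and the `comment` element are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production is not allowed in a document.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def lossy_store(text: str) -> str:
    """ASSUMPTION: imitate a conversion that keeps only the low byte of wide chars."""
    return "".join(chr(ord(c) & 0xFF) if ord(c) > 0xFF else c for c in text)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that an XML 1.0 parser would reject."""
    return INVALID_XML_CHARS.sub("", text)


typed = "I\u2019ve visited\r\nthe test Managers Page"  # curly apostrophe, then a new line
stored = lossy_store(typed)

print("\x19" in stored)  # True: the apostrophe has become control char 0x19

safe = strip_invalid_xml_chars(stored)
ET.fromstring(f"<comment>{safe}</comment>")  # parses without raising
```

Stripping loses the apostrophe altogether, so fixing the connection encoding is the better cure; the filter is only a safety net for data that is already in the DB.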


Copyright © 2012 The Test Manager Blog. All rights reserved.