tagsToLowerCase()
Mike Davidson came to me yesterday with a request for a JavaScript function. Given an HTML string with uppercase tags and attributes and mixed case attribute values and tag contents, return that string with lowercased tags and attributes but leave the attribute values and tag contents alone. (Certain browsers return the innerHTML of elements in this malformed way—which can be a problem if the source is headed to a textarea and ultimately saved to a database.)
He did some serious googling to no avail before calling in The Wolf. Satisfied with the result, he suggested I post my solution.
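function tagsToLowerCase(html)
{
    // Cinch up any whitespace around an equals sign that precedes a quote
    html = html.replace(/([a-z])\s*(=)\s*("|')/gi, '$1$2$3');
    // Collect the start of every opening/closing tag and every attribute name
    var parts = html.match(/(<\/?[a-z][a-z0-9]*| [a-z]+=)/gi);
    if (parts)
    {
        // Longest first, so replacing <B cannot leave <BR half-converted as <bR
        parts.sort(function(a, b) { return b.length - a.length; });
        for (var i = 0; i < parts.length; i++)
        {
            var part = parts[i];
            html = html.replace(new RegExp(part, 'g'), part.toLowerCase());
        }
    }
    return html;
}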
The first thing the function does is cinch up any space around attribute equals signs. It then matches the beginning of opening/closing tags and attribute names, and loops through the matches, replacing each in the original html with its lowercased equivalent. The resulting function is quote-agnostic.
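To make the three steps concrete, here is a minimal walk-through on a made-up string. The sample markup and the variable names (`cinched`, `parts`, `result`) are invented for illustration and are not part of the original post.

```javascript
// Hypothetical sample input, invented for illustration.
var html = '<DIV CLASS = "Intro"><A HREF=\'Page.html\'>Read More</A></DIV>';

// Step 1: cinch up whitespace around an equals sign that precedes a quote.
var cinched = html.replace(/([a-z])\s*(=)\s*("|')/gi, '$1$2$3');

// Step 2: collect the start of each tag and each attribute name.
// For this input: '<DIV', ' CLASS=', '<A', ' HREF=', '</A', '</DIV'
var parts = cinched.match(/(<\/?[a-z][a-z0-9]*| [a-z]+=)/gi);

// Step 3: replace every occurrence of each match with its lowercased form.
var result = cinched;
for (var i = 0; i < parts.length; i++) {
    result = result.replace(new RegExp(parts[i], 'g'), parts[i].toLowerCase());
}

// result: <div class="Intro"><a href='Page.html'>Read More</a></div>
```

Note that the attribute values ("Intro", 'Page.html') and the tag contents ("Read More") keep their case, and both quote styles pass through untouched.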
019 Comments
Forgive my ignorance, but when would this be useful? When working on other people’s bad code?
It would be useful when working with an inline HTML editor that receives its content from an existing element’s innerHTML property. In certain browsers, Safari for example, even if the physical source code is properly formed using lowercase tags and attribute names, accessing the innerHTML property of any object will return malformed code with uppercase tags and attribute names.
By the way, it should be known that I keep my Wolf function requests to a minimum these days. For any youngsters in the audience, Googling is a much better way to generally solve problems like these. Maybe my Googling skills are in decline, but I looked for more than an hour and found nothing. I even tried The Dutch Wolf first but he was unsuccessful (something about being drunk and not coherent enough to grapple regexes at the time).
The final loop can be reduced/simplified to:
return html.replace(/(<\/?[a-z][a-z0-9]*| [a-z]+=)/gi, function($0, $1) { return $1.toLowerCase(); })
Not if you want it to work in Safari (which is really what this is all about). Safari replaces the matches with the string value of your anonymous function. Not very helpful.
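One possible way to reconcile the two approaches is to test whether the engine accepts a function as the replacement and fall back to the loop when it does not. This is only a sketch: `supportsReplaceCallback` and `lowerTagsAndAttributes` are hypothetical names, the space-cinching step is left out, and the feature test assumes a non-supporting engine substitutes the function's source text, as described for Safari above.

```javascript
// Hypothetical helper: true when String.prototype.replace accepts a function.
// An engine without support would insert the function's source text instead.
function supportsReplaceCallback() {
    return 'a'.replace(/a/, function () { return 'b'; }) === 'b';
}

// Hypothetical wrapper: use the one-pass callback when available,
// otherwise fall back to a match-and-replace loop.
function lowerTagsAndAttributes(html) {
    var pattern = /(<\/?[a-z][a-z0-9]*| [a-z]+=)/gi;

    if (supportsReplaceCallback()) {
        return html.replace(pattern, function (match) {
            return match.toLowerCase();
        });
    }

    var parts = html.match(pattern);
    if (parts) {
        // Longest first, so a short tag such as <B cannot partially rewrite <BR.
        parts.sort(function (a, b) { return b.length - a.length; });
        for (var i = 0; i < parts.length; i++) {
            html = html.replace(new RegExp(parts[i], 'g'), parts[i].toLowerCase());
        }
    }
    return html;
}
```

The callback branch touches each match exactly once, so it avoids building a new RegExp per match; the fallback keeps the behaviour available where the callback form is not.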
Hmm.. I was working around the exact same problem (in Safari) and used a php function instead. But I took a completely different route—walking through the tag character by character, keeping track of whether I was in an attribute or not. I also applied
The performance improvement in mine is probably negligible with modern computers. Yours is certainly prettier.
Now I have to think about where that action makes more sense, in the browser or on the server.
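For comparison, the character-by-character idea might look roughly like this. It is a JavaScript sketch of the general technique, not the commenter's PHP, and `lowerTagsByScanning` is a hypothetical name. It assumes a tag starts at a `<` followed by a letter or a slash, and it does not special-case comments or script contents.

```javascript
// Hypothetical scanner: lowercases tag and attribute names, leaves
// attribute values (quoted or bare) and text content untouched.
function lowerTagsByScanning(html) {
    var out = '';
    var inTag = false;        // between a tag's < and >
    var quote = '';           // the open quote character, or '' when none
    var afterEquals = false;  // saw =, attribute value not started yet
    var inBareValue = false;  // inside an unquoted attribute value

    for (var i = 0; i < html.length; i++) {
        var ch = html.charAt(i);

        if (!inTag) {
            // Only treat < as a tag start when a letter or / follows.
            if (ch === '<' && /[a-z\/]/i.test(html.charAt(i + 1))) {
                inTag = true;
            }
            out += ch;
        } else if (quote) {
            if (ch === quote) { quote = ''; }
            out += ch;
        } else if (inBareValue) {
            if (ch === '>') { inBareValue = false; inTag = false; }
            else if (/\s/.test(ch)) { inBareValue = false; }
            out += ch;
        } else if (afterEquals) {
            if (ch === '"' || ch === "'") { quote = ch; afterEquals = false; }
            else if (ch === '>') { afterEquals = false; inTag = false; }
            else if (!/\s/.test(ch)) { inBareValue = true; afterEquals = false; }
            out += ch;
        } else {
            // Tag name, attribute name, or punctuation inside the tag.
            if (ch === '>') { inTag = false; }
            else if (ch === '=') { afterEquals = true; }
            out += ch.toLowerCase();
        }
    }
    return out;
}
```

A single pass like this never re-scans the string, which is where any performance edge over repeated regex replacement would come from.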
Oh come on, I wasn’t drunk. Just very tired, and with somewhat higher levels of caffeine and alcohol running through my veins. And I had a good excuse.
I remember doing something really similar for the purposes of getting the innerHTML to look right. Ultimately, I ended up adapting Wubben’s “importNode” function from his site (takes a reference to an html node):
The results of which were then kicked over into flashvars. Its purpose was to pass an entire UL navigation structure into Flash as proper XML (which Flash could easily deconstruct to build its menu dynamically).
Well, my anonymous function works in Safari 2.0.3 and I do not have access to 1.x :]
I don’t know about 1.x but it doesn’t work in 2.0.2 even.
It would seem more practical to do this on the server rather than with JavaScript… but hey, if you say it’s useful…arighty then.
Thanks, that’s a handy function to have.
Actually, the first line of this routine cinches up the space around any equals sign followed by a double or single quote, not just those within attributes. So if you run this against a code example like:
it’ll cinch the space around that equals sign, too.
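To illustrate the point with a made-up input (the commenter's own example is not reproduced here), the snippet below shows the first replace altering text content. It also sketches one possible tag-scoped variant; `cinchInsideTags` is a hypothetical name, and it relies on function replacement, which the thread above reports as unreliable in Safari.

```javascript
// Hypothetical input: a code sample sitting in the markup's text content.
var sample = '<PRE>var name = "Mike";</PRE>';
var cinched = sample.replace(/([a-z])\s*(=)\s*("|')/gi, '$1$2$3');
// cinched is now '<PRE>var name="Mike";</PRE>': the text content changed.

// Hypothetical variant: only cinch inside opening tags.
// Assumes no > appears inside an attribute value.
function cinchInsideTags(html) {
    return html.replace(/<[a-z][^>]*>/gi, function (tag) {
        return tag.replace(/([a-z])\s*(=)\s*("|')/gi, '$1$2$3');
    });
}
```

Even the scoped variant would still cinch an equals sign that sits inside a quoted attribute value, so it narrows the problem rather than removing it.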
Well, Reg Ex can be expensive, and innerHTML is evil. So, how about a pure DOM solution:
Seems a bit snappier than your original for large nodes. Only tested on Firefox (fine) and Safari because that’s what I have handy. Unfortunately, Safari doesn’t understand comment nodes (node.nodeType == 8 or node.nodeName == “#comment”) so on Safari this will strip comments, but who comments their markup ;)
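A DOM-walking approach along these lines could be sketched as follows. This is not the commenter's original code; `serializeLowercase`, `escapeText`, and the short `VOID_TAGS` list are hypothetical and deliberately incomplete. It relies only on the standard `nodeType`, `nodeName`, `nodeValue`, `attributes`, and `childNodes` properties.

```javascript
// Hypothetical, partial list of elements that take no closing tag.
var VOID_TAGS = { br: true, hr: true, img: true, input: true, link: true, meta: true };

// Re-escape characters that the DOM hands back unescaped.
function escapeText(text) {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical serializer: rebuilds markup from a node with
// lowercased tag and attribute names.
function serializeLowercase(node) {
    if (node.nodeType === 3) { return escapeText(node.nodeValue); }        // text
    if (node.nodeType === 8) { return '<!--' + node.nodeValue + '-->'; }   // comment
    if (node.nodeType !== 1) { return ''; }                                // anything else

    var name = node.nodeName.toLowerCase();
    var out = '<' + name;
    for (var i = 0; i < node.attributes.length; i++) {
        var attr = node.attributes[i];
        out += ' ' + attr.name.toLowerCase() + '="' +
               escapeText(attr.value).replace(/"/g, '&quot;') + '"';
    }
    if (VOID_TAGS[name]) { return out + '>'; }

    out += '>';
    for (var j = 0; j < node.childNodes.length; j++) {
        out += serializeLowercase(node.childNodes[j]);
    }
    return out + '</' + name + '>';
}
```

In a browser the argument would be a real element, for example one returned by `document.getElementById`. Because the function only reads a handful of properties, it can also be exercised with plain objects shaped like nodes.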
Whoops.. just looked at Jakob’s comment, which is doing pretty much the same thing as mine. Sorry ‘bout that!
I wrote something slightly related in HTML to XHTML.
While that is a great function to use for the task, I agree with Jakob and Jeff. It’s just another reason why innerHTML should not be used. Sure it’s easy and convenient, but isn’t it lazy? And how long will it continue to be around? Mike, is this just another one of those things that you’ll change when it stops working and not do it right the first time around? And please don’t take this as rude, I’m genuinely curious as to why you’re using innerHTML.
Zack:
Ummm, I’m sorry, do I have a reputation for not doing things right that I don’t know about? Or are you just speaking in general terms? I don’t think I’ve ever coded anything that has “stopped working” because of browser evolution.
Whether innerHTML or DOM doesn’t matter so much to me, really. InnerHTML is not going away any time soon… like any time in the next five years at least. It may be “going away” from an “uncool to use” standpoint, but not from an “actually works” standpoint. Part of the reason I needed this code to begin with was to deal with the inadequacies of a spellchecking component we just launched. I guarantee you that the spellchecker will be updated/replaced long before innerHTML stops being a viable fix. That said, I do recognize the long-term benefits of DOM methods vs. RegEx methods.
Mike,
Cool. That answers it. Thanks.
And what I was referring to was from your article, March to Your Own Standard where you said:
And no, I didn’t mean that you weren’t doing things “right”. I was just wondering if using innerHTML was another one of those tests (referred to in your article). You do amazing work (which is why I read your blog regularly) and I am just curious as to your reasoning behind it.