Saturday, July 31, 2010    
Thoughts, ideas, tips, musings, and pontifications (not necessarily in that order) by Ben Forta ...
NOTE: This is my personal blog, and the opinions and statements voiced here are my own.

November 25, 2003

RegEx to Match a URL

Earlier today I needed URL input validation: my form prompted for a URL, and I needed to check that what was provided was syntactically valid. URL validation is actually rather complicated, but the following regular expression worked for me:

https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?

This one allows both http and https, supports an optional port, and allows paths and query strings too. Oh, it also works with both JavaScript (and can thus be used for client-side form validation) and CFML.
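
As a rough illustration of the JavaScript side, here is a minimal sketch of using the pattern for validation. The `^` and `$` anchors are an addition (they are not part of the pattern as posted) so that the whole input has to match rather than just a substring, and `isValidUrl` is a hypothetical helper name.

```javascript
// The pattern from the post, with "/" escaped for a JavaScript regex literal.
// ASSUMPTION: the ^ and $ anchors are added here; the post's pattern has none.
const urlPattern = /^https?:\/\/([-\w\.]+)+(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?$/;

// Hypothetical helper: returns true when the entire string matches the pattern.
function isValidUrl(input) {
  return urlPattern.test(input);
}

console.log(isValidUrl("http://www.example.com"));                      // true
console.log(isValidUrl("https://example.com:8080/path/page.cfm?id=1")); // true
console.log(isValidUrl("ftp://example.com"));                           // false
console.log(isValidUrl("not a url"));                                   // false
```

Without the anchors, `test` succeeds as soon as any part of the input looks like a URL, which is usually not what a form validator wants.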

Comments
Great! I only just discovered the RegEx support in CF5 and above and this will come in very handy!

Thanks Ben!
# Posted By Peter Tilbrook | 11/25/03 9:18 PM
A minor nit-pick: it doesn't allow an optional username / password in the URL:

http://uname:passw@domain.com/
# Posted By seancorfield | 11/26/03 1:28 AM
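
One possible way to cover the username/password case raised above is an optional group in front of the host. This is only a sketch: the user-info group and the `^`/`$` anchors are additions for illustration, not part of the posted pattern, and the group does not try to handle percent-encoded characters.

```javascript
// Posted pattern, anchored (the anchors are an addition).
const originalPattern = /^https?:\/\/([-\w\.]+)+(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?$/;

// ASSUMPTION: a simplified optional "user[:password]@" group placed before the host.
const withUserInfo = /^https?:\/\/([\w.-]+(:[^@\s\/]*)?@)?([-\w\.]+)+(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?$/;

const sample = "http://uname:passw@domain.com/";

console.log(originalPattern.test(sample));             // false
console.log(withUserInfo.test(sample));                // true
console.log(withUserInfo.test("http://domain.com/"));  // true
```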
How about using it in the comments text to change URLs into actual <a> links that open in a new window?
# Posted By Sam Farmer | 11/26/03 1:42 PM
How to parse a URL:

(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?

Captures end up being the scheme, authority, path, query and fragment.
# Posted By Iain | 11/28/03 9:55 PM
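
To make the five captures in the parsing expression above concrete, here is a small JavaScript sketch. The `^` and `$` anchors and the `parseUrl` helper name are additions for illustration; groups that do not take part in the match come back as `undefined`.

```javascript
// The parsing expression from the comment, with "/" escaped for a regex literal.
// ASSUMPTION: ^ and $ are added so the captures always cover the whole input.
const urlParts = /^(?:([^:\/?#]+):)?(?:\/\/([^\/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$/;

// Hypothetical helper: splits a URL into its five components.
function parseUrl(url) {
  const match = urlParts.exec(url);
  if (match === null) {
    return null; // e.g. input containing a line break
  }
  const [, scheme, authority, path, query, fragment] = match;
  return { scheme, authority, path, query, fragment };
}

const parts = parseUrl("http://www.example.com:8080/a/b.cfm?x=1&y=2#top");
console.log(parts.scheme);    // http
console.log(parts.authority); // www.example.com:8080
console.log(parts.path);      // /a/b.cfm
console.log(parts.query);     // x=1&y=2
console.log(parts.fragment);  // top
```

Note that this expression splits almost any string into parts; it does not validate that the result is a sensible URL.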
I think this one will work better:

[-\w\.]+://([-\w\.]+)+(:\d+)?(:\w+)?(@\d+)?(@\w+)?([-\w\.]+)(/([\w/_\.]*(\?\S+)?)?)?
# Posted By Mohamed ElZahaby | 8/20/06 1:44 PM
Mohamed, your Regex is excellent, but I've found a little hole in the thing: it matches even if you're doubling the protocol, e.g.:

http://http://somesite.com.br/somewhere

The worst part is that in CFMX7, when you try to cfhttp it, the Java engine throws this exception:

500 HTTPClient Internal Error: unexpected exception 'HTTPClient.ParseException: somesite.com.br is an invalid port number' HTTPClient Internal Error: unexpected exception 'HTTPClient.ParseException: somesite.com.br is an invalid port number'

It's a Java exception, not even a CFMX exception. I'll try to fix this, but I'm not as skilled with regexes as you guys here are...
# Posted By Alexei Ramone | 10/19/06 1:52 PM
MY MISTAKE: the bug is triggered when parsing

http://http//:some.damn.site.com.br

The problem is that Java thinks some.damn.site.com.br is the protocol (because it's after the second :) and the regex still validates it.
# Posted By Alexei Ramone | 10/19/06 2:49 PM
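
One possible explanation for the doubled-protocol hole described above, assuming the pattern is applied without anchors, is that the regex only has to match some part of the input. The sketch below compares the pattern as posted with the same pattern wrapped in `^(?:...)$`; the wrapper and the variable names are additions for illustration.

```javascript
// Pattern from the earlier comment, unanchored as posted ("/" escaped for a literal).
const loosePattern = /[-\w\.]+:\/\/([-\w\.]+)+(:\d+)?(:\w+)?(@\d+)?(@\w+)?([-\w\.]+)(\/([\w\/_\.]*(\?\S+)?)?)?/;

// ASSUMPTION: the same pattern, forced to match the entire input.
const strictPattern = new RegExp("^(?:" + loosePattern.source + ")$");

const samples = [
  "http://http://somesite.com.br/somewhere",
  "http://http//:some.damn.site.com.br",
  "http://somesite.com.br/somewhere",
];

for (const s of samples) {
  // Prints: the input, the unanchored result, the anchored result.
  console.log(s, loosePattern.test(s), strictPattern.test(s));
}
// http://http://somesite.com.br/somewhere true false
// http://http//:some.damn.site.com.br true false
// http://somesite.com.br/somewhere true true
```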
Very good regex. I have been looking for something like this to replace the built-in Internet URL validation provided with .NET version 2. This is a very good replacement because it can support a port number.
# Posted By Steve | 11/1/07 9:33 AM
The regular expression validates URLs like the following:
htp://www.mysite.com, or xyz://www.mysite.com
The protocol prefix is wrong.

You can fix the problem this way:
((ht|f)tp(s?)\:\/\/|~/|/)?([-\w\.]+)+(:\d+)?(:\w+)?(@\d+)?(@\w+)?([-\w\.]+)(/([\w/_\.]*(\?\S+)?)?)?

I have another question: it accepts URLs with a double '.'. Is it possible to have a URL with a double '.'?
Thx.

Alessandro.
# Posted By Alessandro | 5/12/08 4:25 AM
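
On the double-dot question above: consecutive dots are not valid inside a host name (they would mean an empty label), although `..` can legitimately appear in a path. A minimal sketch of a stricter host part, built on the original http/https pattern, might look like this; the label-by-label host group and the anchors are assumptions added for illustration.

```javascript
// ASSUMPTION: the host is rewritten as dot-separated labels so ".." cannot match there.
// The port, path and query parts are kept from the original post's pattern.
const noDoubleDotHost = /^https?:\/\/[\w-]+(\.[\w-]+)*(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?$/;

console.log(noDoubleDotHost.test("http://www.example.com"));        // true
console.log(noDoubleDotHost.test("http://www..example.com"));       // false
console.log(noDoubleDotHost.test("http://www.example.com/a/../b")); // true
```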
Thank you very much!

I'd been beating my head against the wall for the better part of a night trying to get a RegEx recipe for URLs from O'Reilly's pocket reference to work in REFindNoCase (but it had pound signs and other chars to which CF objected). Wish I'd found your RegEx first! :)
# Posted By Jonah Chanticleer | 1/31/09 7:19 AM

  © Copyright 1997-2009 Ben Forta, All Rights Reserved