25 Nov 2003
RegEx to Match a URL

Earlier today I needed URL input validation: my form prompted for a URL, and I needed to check that what was provided was syntactically valid. URL validation is actually rather complicated, but the following regular expression worked for me:

https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?
This one allows both http and https, supports an optional port, and allows paths and query strings too. Oh, it also works with both JavaScript (and can thus be used with ) and CFML.
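
As a rough sketch of how the pattern might be used from JavaScript: the helper name `isValidUrl` and the sample URLs are made up for illustration, and two details are assumptions rather than part of the expression above. The pattern is wrapped in `^` and `$` so the whole input has to match, and the nested `([-\w\.]+)+` is collapsed to a single `[-\w.]+`, which accepts the same strings but avoids heavy backtracking on long inputs that fail to match.

```javascript
// Anchored variant of the pattern (the anchors are an assumption:
// without them, test() succeeds if any substring looks like a URL).
const urlPattern = /^https?:\/\/[-\w.]+(:\d+)?(\/([\w\/_.]*(\?\S+)?)?)?$/;

// Hypothetical helper: true only if the whole string matches the pattern.
function isValidUrl(input) {
  return urlPattern.test(input);
}

console.log(isValidUrl("http://example.com"));
console.log(isValidUrl("https://example.com:8080/docs/index.html?id=1"));
console.log(isValidUrl("ftp://example.com"));
```

With these inputs the first two calls should report a match and the `ftp` URL should not, since only `http` and `https` are allowed as the scheme.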

Comments (14)



  • Peter Tilbrook

    Great! I only just discovered the RegEx support in CF5 and above and this will come in very handy!

    Thanks Ben!

    #1Posted by Peter Tilbrook | Nov 25, 2003, 09:18 PM
  • seancorfield

    A minor nit-pick: it doesn't allow an optional username / password in the URL:

    http://uname:passw@domain.com/

    #2Posted by seancorfield | Nov 26, 2003, 01:28 AM
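
One possible way to allow for the username / password case, as a sketch only: an optional `user` or `user:password` group followed by `@` is placed before the host. The name `urlWithAuth` and the character classes used for the userinfo part are assumptions for illustration, not something from the post, and the pattern is anchored with the host group simplified to a single `[-\w.]+`.

```javascript
// Optional "user" or "user:password" followed by "@", placed before the host.
// The userinfo character classes are an assumption, not taken from the post.
const urlWithAuth =
  /^https?:\/\/([^\s:@\/]+(:[^\s@\/]*)?@)?[-\w.]+(:\d+)?(\/([\w\/_.]*(\?\S+)?)?)?$/;

console.log(urlWithAuth.test("http://uname:passw@domain.com/"));
console.log(urlWithAuth.test("http://domain.com:8080/"));
```

Both sample URLs should match: the userinfo group is optional, so URLs without credentials are still accepted.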
  • Sam Farmer

    How about using it in the comments text to change urls into actual <a> links that open in a new window?

    #3Posted by Sam Farmer | Nov 26, 2003, 01:42 PM
  • Iain

    How to parse a URL:

    (?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?

    Captures end up being the scheme, authority, path, query and fragment.

    #4Posted by Iain | Nov 28, 2003, 09:55 PM
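
A minimal sketch of the parsing expression in use from JavaScript. It closely resembles the one given in Appendix B of RFC 3986, with non-capturing groups around the delimiters. The `parseUrl` helper, the added `^`/`$` anchors, and the sample URL are assumptions for illustration. Note that this expression splits a string into parts rather than validating it, so almost any input produces a match.

```javascript
// Five capture groups: scheme, authority, path, query, fragment.
const urlParts =
  /^(?:([^:\/?#]+):)?(?:\/\/([^\/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$/;

// Hypothetical helper that names the captures.
function parseUrl(input) {
  const m = urlParts.exec(input);
  if (!m) return null; // defensive: the pattern accepts nearly any string
  const [, scheme, authority, path, query, fragment] = m;
  // Parts that are absent from the input come back as undefined.
  return { scheme, authority, path, query, fragment };
}

console.log(parseUrl("http://uname:passw@domain.com:8080/a/b.cfm?x=1&y=2#top"));
```

For the sample URL the authority capture holds the credentials, host, and port together (`uname:passw@domain.com:8080`); splitting those apart would need a further step.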
  • daa

    adsda

    #5Posted by daa | May 7, 2004, 02:19 AM
  • Mohamed ElZahaby

    I think this one will work better

    [-\w\.]+://([-\w\.]+)+(:\d+)?(:\w+)?(@\d+)?(@\w+)?([-\w\.]+)(/([\w/_\.]*(\?\S+)?)?)?

  • Alexei Ramone

    Mohamed, your Regex is excellent, but I've found a little hole in the thing: it matches even if you're doubling the protocol, e.g.:

    http://http://somesite.com.br/somewhere

    The worst part is that in CFMX7, when you try to cfhttp it, the Java engine throws this exception:

    500 HTTPClient Internal Error: unexpected exception 'HTTPClient.ParseException: somesite.com.br is an invalid port number' HTTPClient Internal Error: unexpected exception 'HTTPClient.ParseException: somesite.com.br is an invalid port number'

    It's a Java exception, not even a CFMX exception. I'll try to track this down, but I'm not as skilled with regexes as you guys here are....

  • Alexei Ramone

    MY MISTAKE: the bug is triggered when parsing

    http://http//:some.damn.site.com.br

    The problem is that Java thinks some.damn.site.com.br is the protocol (because it's after the second :), and the regex still validates it.

  • Steve

    Very good regex. I had been looking for something to replace the built-in Internet URL validation provided with .NET 2.0. This is a very good replacement because it supports a port number.

    #9Posted by Steve | Nov 1, 2007, 09:33 AM
  • Alessandro

    The regular expression validates URLs like the following:
    htp://www.mysite.com, or xyz://www.mysite.com
    The protocol prefix is wrong in both.

    In this way you can fix the problem:
    ((ht|f)tp(s?)\:\/\/|~/|/)?([-\w\.]+)+(:\d+)?(:\w+)?(@\d+)?(@\w+)?([-\w\.]+)(/([\w/_\.]*(\?\S+)?)?)?

    I have another question: it accepts URLs with a double '.'. Is it possible to have a URL with a double '.'?
    Thx.

    Alessandro.

    #10Posted by Alessandro | May 12, 2008, 04:25 AM
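
A sketch of one way to handle both points, under stated assumptions: the scheme is limited to `http`, `https`, and `ftp`, and the host is written as dot-separated labels so that two dots in a row (an empty label, which is not valid in a DNS name) are rejected. The name `strictUrl` is hypothetical, the pattern is anchored, and it is a simplification for illustration rather than the expression posted above. A double dot in the path, as in `/a/../b`, is still accepted.

```javascript
// Scheme must be http, https or ftp; host is label(.label)* with no empty labels.
const strictUrl =
  /^(https?|ftp):\/\/[-\w]+(\.[-\w]+)*(:\d+)?(\/([\w\/_.]*(\?\S+)?)?)?$/;

for (const candidate of [
  "http://www.mysite.com",
  "htp://www.mysite.com",
  "xyz://www.mysite.com",
  "http://www..mysite.com",
]) {
  console.log(candidate, strictUrl.test(candidate));
}
```

Of the four candidates, only the first should be reported as a match.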
  • Jonah Chanticleer

    Thank you very much!

    I'd been beating my head against the wall for the better part of a night trying to get a RegEx recipe for URLs from O'Reilly's pocket reference to work in REFindNoCase (but it had pound signs and other chars to which CF objected). Wish I'd found your RegEx first! :)

    #11Posted by Jonah Chanticleer | Jan 31, 2009, 07:19 AM
  • CoosCoos

    This regex doesn't work for URLs which have hyphens in the filename, such as http://www.mysite.com/this-is-my-page.cfm

    Any ideas on how to enhance it to handle this?

    #12Posted by CoosCoos | Aug 18, 2010, 08:46 AM
  • Giambattista

    Confirmed. Doesn't work with hyphens.
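
A possible fix, sketched in JavaScript: the path character class in the post's pattern, `[\w/_\.]`, has no hyphen in it, so a full-string match stops at the first `-` in the filename. Adding `-` to that class is one way around it. The names `withoutHyphens` and `withHyphens` are hypothetical, and both patterns are anchored with the host group simplified to `[-\w.]+`, which is an assumption on top of the original expression.

```javascript
// Anchored form of the post's pattern: no hyphen allowed in the path.
const withoutHyphens = /^https?:\/\/[-\w.]+(:\d+)?(\/([\w\/_.]*(\?\S+)?)?)?$/;

// Same pattern with "-" added to the path character class.
const withHyphens = /^https?:\/\/[-\w.]+(:\d+)?(\/([-\w\/_.]*(\?\S+)?)?)?$/;

const pageUrl = "http://www.mysite.com/this-is-my-page.cfm";
console.log(withoutHyphens.test(pageUrl));
console.log(withHyphens.test(pageUrl));
```

The first pattern should reject the sample URL and the second should accept it.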

  • Neil Smith

    Just what I was looking for to validate URL input in a CMS that I've recently built. Thanks again.