Earlier today I needed URL input validation (my form prompted for a URL, and I needed to check that what was entered was syntactically valid). URL validation is actually rather complicated, but the following regular expression worked for me:
https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?
This one allows both http and https, supports an optional port, and allows paths and query strings too. Oh, it also works in both JavaScript (so it can be used for client-side validation) and CFML.
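For anyone who wants to try the pattern on the JavaScript side, here is a minimal sketch. The `^` and `$` anchors are an addition (without them the pattern only has to match somewhere inside the string), the slashes are escaped because a JavaScript regex literal requires it, and `isValidUrl` is a hypothetical helper name, not something from the post.

```javascript
// The post's pattern, unchanged apart from the added ^ and $ anchors
// and the escaped slashes needed inside a JavaScript regex literal.
const URL_PATTERN = /^https?:\/\/([-\w\.]+)+(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?$/;

// Hypothetical helper: returns true when the whole string matches.
function isValidUrl(value) {
  return URL_PATTERN.test(value);
}

console.log(isValidUrl("http://www.example.com"));               // true
console.log(isValidUrl("https://example.com:8080/a/b.cfm?x=1")); // true
console.log(isValidUrl("ftp://example.com"));                    // false, only http/https allowed
console.log(isValidUrl("not a url"));                            // false
```

This only checks syntax; it says nothing about whether the host exists or responds.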
Thanks Ben!
http://uname:passw@domain.com/
(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?
Captures end up being the scheme, authority, path, query and fragment.
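A small JavaScript sketch of how those five captures come out, using the URL quoted above. The anchors and escaped slashes are additions for the JavaScript literal, and `splitUri` is a hypothetical name used only for illustration. Note that every part of this pattern is optional, so it is better suited to splitting a URL into pieces than to rejecting bad input.

```javascript
// The pattern from the comment, anchored and with slashes escaped for a JS literal.
const URI_PARTS = /^(?:([^:\/?#]+):)?(?:\/\/([^\/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$/;

// Hypothetical helper: maps the five captures to named fields.
// Optional parts that are absent come back as undefined.
function splitUri(uri) {
  const m = URI_PARTS.exec(uri);
  if (!m) return null;
  return {
    scheme: m[1],
    authority: m[2],
    path: m[3],
    query: m[4],
    fragment: m[5],
  };
}

// scheme "http", authority "uname:passw@domain.com", path "/";
// query and fragment are undefined.
console.log(splitUri("http://uname:passw@domain.com/"));

// scheme "http", authority "example.com:8080", path "/a/b.cfm",
// query "x=1&y=2", fragment "top".
console.log(splitUri("http://example.com:8080/a/b.cfm?x=1&y=2#top"));
```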
[-\w\.]+://([-\w\.]+)+(:\d+)?(:\w+)?(@\d+)?(@\w+)?([-\w\.]+)(/([\w/_\.]*(\?\S+)?)?)?
http://http://somesite.com.br/somewhere
The worst part is that in CFMX7, when you try to cfhttp it, the Java engine throws this exception:
500 HTTPClient Internal Error: unexpected exception 'HTTPClient.ParseException: somesite.com.br is an invalid port number' HTTPClient Internal Error: unexpected exception 'HTTPClient.ParseException: somesite.com.br is an invalid port number'
It's a Java exception, not even a CFMX exception. I'll try to track this down, but I'm not as skilled with regexes as you guys here are...
http://http//:some.damn.site.com.br
The problem is that Java thinks some.damn.site.com.br is the protocol (because it's after the second :), and the regex still validates it.
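One possible explanation, assuming the pattern was being tested without anchors: an unanchored regex only needs to match part of the string, and "http://http" at the start of both URLs is enough. The JavaScript sketch below compares the original post's pattern with and without `^...$`. The constant names are made up for the example, and this is a guess at the cause rather than a confirmed diagnosis.

```javascript
// Pattern from the original post, slashes escaped for a JS literal.
const UNANCHORED = /https?:\/\/([-\w\.]+)+(:\d+)?(\/([\w\/_\.]*(\?\S+)?)?)?/;

// Same pattern wrapped in ^...$ so the whole string has to match.
const ANCHORED = new RegExp("^(?:" + UNANCHORED.source + ")$");

const samples = [
  "http://http://somesite.com.br/somewhere", // malformed
  "http://http//:some.damn.site.com.br",     // malformed
  "http://somesite.com.br/somewhere",        // well formed
];

for (const s of samples) {
  // Columns: the URL, the unanchored result, the anchored result.
  console.log(s, UNANCHORED.test(s), ANCHORED.test(s));
}
// The two malformed URLs give true for the unanchored test and false for
// the anchored one; the well-formed URL gives true for both.
```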
htp://www.mysite.com, or xyz://www.mysite.com
The prefix protocol is wrong.
In this way you can fix the problem:
((ht|f)tp(s?)\:\/\/|~/|/)?([-\w\.]+)+(:\d+)?(:\w+)?(@\d+)?(@\w+)?([-\w\.]+)(/([\w/_\.]*(\?\S+)?)?)?
I have another question: it accepts URLs with a double '.'. Is it possible to have a URL with a double '.'?
Thx.
Alessandro.
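A quick way to check what the pattern above accepts, sketched in JavaScript. The `^` and `$` anchors and the escaped slashes are additions, and `FIXED_PATTERN` is a made-up name. The protocol group is optional in this pattern, so the anchors matter: without them a prefix such as "htp" would still count as a match.

```javascript
// Alessandro's pattern, anchored and with slashes escaped for a JS literal.
const FIXED_PATTERN = /^((ht|f)tp(s?)\:\/\/|~\/|\/)?([-\w\.]+)+(:\d+)?(:\w+)?(@\d+)?(@\w+)?([-\w\.]+)(\/([\w\/_\.]*(\?\S+)?)?)?$/;

console.log(FIXED_PATTERN.test("http://www.mysite.com"));          // true
console.log(FIXED_PATTERN.test("ftp://ftp.mysite.com"));           // true
console.log(FIXED_PATTERN.test("http://uname:passw@domain.com/")); // true
console.log(FIXED_PATTERN.test("htp://www.mysite.com"));           // false
console.log(FIXED_PATTERN.test("xyz://www.mysite.com"));           // false
console.log(FIXED_PATTERN.test("www.mysite.com"));                 // true, the protocol is optional
console.log(FIXED_PATTERN.test("http://www..mysite.com"));         // true, double '.' is accepted
```

The double '.' gets through because the host part is matched by `[-\w\.]+`, which allows any run of dots. Rejecting it would need a stricter host pattern.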
I'd been beating my head against the wall for the better part of a night trying to get a RegEx recipe for URLs from O'Reilly's pocket reference to work in REFindNoCase (but it had pound signs and other chars to which CF objected). Wish I'd found your RegEx first! :)