Warning! This version is now obsolete!
Check out the new and improved version (using only Bash built-ins) here!
Here is a command-line (bash) script that uses sed to split the segments of a URI into usable variables. It also validates the given URI: malformed strings produce the text “ERROR”, which can be handled accordingly:
# Assembling a sample URI (including an injection attack)
uri_1='http://user:pass@www.example.com:19741/dir1/dir2/file.php'
uri_2='?param=some_value&array[0]=123&param2=\`cat /etc/passwd\`'
uri_3='#bottom-left'
uri="$uri_1$uri_2$uri_3"

# Parse URI
op=`echo "$uri" | sed -nrf "uri.sed"`

# Handle invalid URI
[[ $op == 'ERROR' ]] && { echo "Invalid URI!"; exit 1; }

# Execute assignments
eval "$op"

# ...work with URI components...
Notice the "uri.sed" file given to sed?
It is responsible for the actual URI parsing. It contains the regular expression rules that turn the given URI into bash code; when that code is executed, it creates the final variables to play with:
# initialize
s/[\r\n]+//g; s/`/%60/g; s/"/%22/g; T begin; :begin
# scheme, address, path, query, fragment
s/^(([a-z]+):\/\/)?(([^:\/]+(:[^@\/]*)?@)?[^:\/?]+(:[0-9]+)?)(\/[^?]*)?(\?[^#]*)?(#.*)?$/\
uri_scheme="\2"; uri_address="\3"; uri_path="\7"; uri_query="\8"; uri_fragment="\9"/i
T error
# user, pass, host, port
s/uri_address="(([a-z0-9_.+=-]+)(:([^@]*))?@)?([a-z0-9.-]*)(:([0-9]*))?"/\0; \
uri_user="\2"; uri_pass="\4"; uri_host="\5"; uri_port="\7"/i; T error
# path parts
h; s/.*uri_path="([^"]+)".*/uri_parts=(); \1/
s/\/+([^/]+)/uri_parts[$[${#uri_parts[*]}]]="\1"; /ig; x; G
# query args
h; s/.*uri_query="([^"]+)".*/uri_args=(); \1/
s/[?&]+([^= ]+)(=([^&]*))?/uri_args[$[${#uri_args[*]}]]="\1"; uri_arg_\1="\3"; /ig
x; G
# print
s/\n\ +//g; s/\n//g; p; q
# failure
:error; c ERROR
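The "initialize" rule at the top of the script percent-encodes backticks and double quotes before any value is wrapped in a double-quoted assignment. A rough pure-bash equivalent of that one step might look like the sketch below. The function name escape_uri is hypothetical, and the extra `$` to `%24` substitution is not part of uri.sed; it is added here only as an assumption, since `$(...)` is also expanded inside double quotes when the generated code is passed to eval.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of uri.sed): percent-encode characters that
# could break out of, or be expanded inside, a double-quoted assignment.
escape_uri() {
    local s=$1
    local backtick='`' dquote='"' dollar='$'

    s=${s//"$backtick"/%60}   # same as s/`/%60/g in uri.sed
    s=${s//"$dquote"/%22}     # same as s/"/%22/g in uri.sed
    s=${s//"$dollar"/%24}     # assumption: extra precaution, not in uri.sed

    printf '%s\n' "$s"
}

# Example: the injection attempt from the sample URI
escape_uri '?param2=`cat /etc/passwd`'
# prints: ?param2=%60cat /etc/passwd%60
```

Quoting the pattern variables inside the expansion makes bash treat them as literal characters rather than glob patterns.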
After uri.sed runs successfully and its output is passed to eval, the following variables exist in the running environment:
uri_scheme="http"
uri_address="user:pass@www.example.com:19741"
uri_user="user"
uri_pass="pass"
uri_host="www.example.com"
uri_port="19741"
uri_path="/dir1/dir2/file.php"
uri_parts[0]="dir1"
uri_parts[1]="dir2"
uri_parts[2]="file.php"
uri_query="?param=some_value&array[0]=123&param2=`cat /etc/passwd`"
uri_args[0]="param"
uri_args[1]="array[0]"
uri_args[2]="param2"
uri_arg_param="some_value"
uri_arg_array[0]="123"
uri_arg_param2="`cat /etc/passwd`"
uri_fragment="#bottom-left"
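The "path parts" rule amounts to splitting uri_path on slashes and collecting the non-empty segments into uri_parts. For comparison, here is a minimal sketch of the same result in plain bash, with no generated code and no eval. The names split_path and segments are hypothetical and used only for illustration; this is not the sed script's own mechanism.

```bash
#!/usr/bin/env bash

# Hypothetical helper: fill the global array uri_parts from a path string.
split_path() {
    local path=$1 seg
    local -a segments

    uri_parts=()
    # Split on "/" only; a leading slash yields an empty first field.
    IFS='/' read -r -a segments <<< "$path"

    for seg in "${segments[@]}"; do
        # Skip the empty fields produced by leading or doubled slashes.
        if [[ -n $seg ]]; then
            uri_parts+=("$seg")
        fi
    done
}

split_path '/dir1/dir2/file.php'
printf '%s\n' "${uri_parts[@]}"
```

For the sample path this should fill uri_parts with dir1, dir2 and file.php, matching the listing above.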
You could play around with it a bit and tell me if you find any problems. Right now it is only a first effort but it could be improved. Cheers!
Wow, thanks for the fast response.
I checked out your newer version and it looks like a much better solution. So I think I’ll just get to work implementing it into my project
(http://thefekete.net/gitweb/?p=gitWebTools.git;a=blob;f=publish;h=b5c4f6dedda2ebf2180a889a937e22878c833449).
I’ll leave any other comments on the new post.
@all: Dan’s comment reminded me that I made a big improvement on this parser a while back so I hurried and posted a new article about it. Check it out!
Hi Dan, thank you very much for your feedback, that’s just what I need to improve on this!
However, I was not able to replicate the error! Please give me the exact URI string that produces the error so I can do some debugging! 🙂
Here’s my uritest script:
Hey, thanks so much for posting this. It’s perfect for my application.
I’m having a problem though with the path parts and query strings sections of uri.sed. If I comment them out, `eval $op` works fine, but if I run it normally I get:
./uritest: array assign: line 1: unexpected EOF while looking for matching `"'
./uritest: array assign: line 15: syntax error: unexpected end of file
It’s no problem for me, as all I’m looking to do is parse git, ssh and file URIs (without args) and I don’t need the path parts. But I thought I’d let you know, and I was curious whether this is a problem with my setup or something else.
My setup:
Ubuntu Server 9.10 amd64
Linux helpcomputer 2.6.28-17-generic #58-Ubuntu SMP Tue Dec 1 21:27:25 UTC 2009 x86_64 GNU/Linux
GNU bash, version 3.2.48(1)-release (x86_64-pc-linux-gnu)
GNU sed version 4.1.5
Thanks again for this post, totally awesome.
[Update] Moved the sed instructions into a separate file for modularization.
[Edit] Changed parsing of the query args to permit parsing of arguments that have no value assigned to them (e.g. …?arg_with_no_value&…)
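To illustrate what "arguments that have no value" means in practice, here is a small plain-bash sketch that walks a query string and records every argument name, storing an empty value when no "=" is present. It is only an illustration under stated assumptions: parse_query and the parallel array uri_arg_values are hypothetical names, and the sed script itself emits uri_arg_NAME variables instead.

```bash
#!/usr/bin/env bash

# Hypothetical helper: fill uri_args (names) and uri_arg_values (values),
# two parallel global arrays, from a query string such as "?a=1&b&c=3".
parse_query() {
    local query=${1#\?} pair   # drop the leading "?" if there is one
    local -a pairs

    uri_args=()
    uri_arg_values=()
    IFS='&' read -r -a pairs <<< "$query"

    for pair in "${pairs[@]}"; do
        if [[ -z $pair ]]; then
            continue                          # ignore empty pairs ("&&")
        fi
        uri_args+=("${pair%%=*}")             # text before the first "="
        if [[ $pair == *=* ]]; then
            uri_arg_values+=("${pair#*=}")    # text after the first "="
        else
            uri_arg_values+=("")              # argument with no value
        fi
    done
}

parse_query '?param=some_value&arg_with_no_value&array[0]=123'
for i in "${!uri_args[@]}"; do
    printf '%s => "%s"\n' "${uri_args[$i]}" "${uri_arg_values[$i]}"
done
```

Parallel arrays are used here because names such as array[0] are not valid variable names on their own, so they cannot be turned into separate variables without eval.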