Bash URI parser using SED

Warning! This version is now obsolete!
Check out the new and improved version (using only Bash built-ins) here!

Here is a command-line (Bash) script that uses sed to split a URI into usable variables. It also validates the given URI: a malformed string produces the text “ERROR”, which can be handled accordingly:

# Assembling a sample URI (including an injection attack)
uri_1='http://user:pass@www.example.com:19741/dir1/dir2/file.php'
uri_2='?param=some_value&array[0]=123&param2=`cat /etc/passwd`'
uri_3='#bottom-left'
uri="$uri_1$uri_2$uri_3"

# Parse URI
op=`echo "$uri" | sed -nrf "uri.sed"`

# Handle invalid URI
[[ $op == 'ERROR' ]] && { echo "Invalid URI!"; exit 1; }

# Execute assignments
eval "$op"

# ...work with URI components...

Notice the "uri.sed" file given to sed?
It does the actual parsing: its regular-expression rules turn the given URI into Bash code which, when executed with eval, creates our final variables to play with:

# initialize
s/[\r\n]+//g; s/`/%60/g; s/"/%22/g; T begin; :begin

# scheme, address, path, query, fragment
s/^(([a-z]+):\/\/)?(([^:\/]+(:[^@\/]*)?@)?[^:\/?]+(:[0-9]+)?)(\/[^?]*)?(\?[^#]*)?(#.*)?$/\
uri_scheme="\2"; uri_address="\3"; uri_path="\7"; uri_query="\8"; uri_fragment="\9"/i
T error

# user, pass, host, port
s/uri_address="(([a-z0-9_.+=-]+)(:([^@]*))?@)?([a-z0-9.-]*)(:([0-9]*))?"/\0; \
uri_user="\2"; uri_pass="\4"; uri_host="\5"; uri_port="\7"/i; T error

# path parts
h; s/.*uri_path="([^"]+)".*/uri_parts=(); \1/
s/\/+([^/]+)/uri_parts[$[${#uri_parts[*]}]]="\1"; /ig; x; G

# query args
h; s/.*uri_query="([^"]+)".*/uri_args=(); \1/
s/[?&]+([^= ]+)(=([^&]*))?/uri_args[$[${#uri_args[*]}]]="\1"; uri_arg_\1="\3"; /ig
x; G

# print
s/\n\ +//g; s/\n//g; p; q

# failure
:error; c ERROR
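
The h, x and G commands in the "path parts" and "query args" steps are the least obvious part of the script, so here is a small stand-alone sketch of the same pattern. The input line, the `part` variable name and the `demo_hold_space` function are made up for illustration, `|` is used as the delimiter only to avoid escaping slashes, and it assumes a sed that accepts -E (GNU or BSD):

```shell
#!/usr/bin/env bash
# h : copy the pattern space to the hold space (keeps the original line)
# s : rewrite the pattern space into generated assignments
# x : swap, so the original line is back in the pattern space
# G : append the hold space (the generated code) after a newline
demo_hold_space() {
  printf '%s\n' "$1" |
    sed -E 'h; s|.*path="([^"]+)".*|\1|; s|/+([^/]+)|part="\1"; |g; x; G'
}

demo_hold_space 'path="/dir1/dir2"'
```

The call prints the original line first and the generated assignments on a second line, which is why the real script strips the newlines in its "print" step.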

After this code runs successfully, the following variables exist in the running environment:

uri_scheme="http"
uri_address="user:pass@www.example.com:19741"
uri_user="user"
uri_pass="pass"
uri_host="www.example.com"
uri_port="19741"

uri_path="/dir1/dir2/file.php"

uri_parts[0]="dir1"
uri_parts[1]="dir2"
uri_parts[2]="file.php"

uri_query="?param=some_value&array[0]=123&param2=%60cat /etc/passwd%60"

uri_args[0]="param"
uri_args[1]="array[0]"
uri_args[2]="param2"

uri_arg_param="some_value"
uri_arg_array[0]="123"
uri_arg_param2="%60cat /etc/passwd%60"

uri_fragment="#bottom-left"
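
Once the assignments have been evaluated, the variables behave like any other Bash variables and arrays. Here is a minimal sketch of working with them, assuming the values from the listing above; the `uri_arg` and `uri_basename` helpers are hypothetical names, not part of the script:

```shell
#!/usr/bin/env bash
# Sample values copied from the listing (normally created by eval "$op").
uri_parts=("dir1" "dir2" "file.php")
uri_args=("param")
uri_arg_param="some_value"

# Hypothetical helper: look up a query argument by name (indirect expansion).
uri_arg() {
  local var="uri_arg_$1"
  printf '%s\n' "${!var}"
}

# Hypothetical helper: print the last path segment, if there is one.
uri_basename() {
  local n=${#uri_parts[@]}
  if (( n > 0 )); then
    printf '%s\n' "${uri_parts[n-1]}"
  fi
}

# Walk the path segments in order.
for i in "${!uri_parts[@]}"; do
  printf '%s: %s\n' "$i" "${uri_parts[i]}"
done

uri_arg param    # prints: some_value
uri_basename     # prints: file.php
```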

Play around with it a bit and tell me if you find any problems. Right now it is only a first effort, so there is room for improvement. Cheers!
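
One thing worth checking: the "initialize" step of uri.sed encodes backticks and double quotes, but as far as I can tell it leaves `$` and backslashes alone, and both are still special inside the double-quoted values that get passed to eval. A possible pre-filter is sketched below. The `uri_sanitize` name and the choice to percent-encode are my assumptions, not something the script above does:

```shell
#!/usr/bin/env bash
# Hypothetical pre-filter: percent-encode every character that is special
# inside double quotes, so a later eval cannot expand it.
uri_sanitize() {
  printf '%s\n' "$1" |
    sed -e 's/\\/%5C/g' -e 's/`/%60/g' -e 's/"/%22/g' -e 's/\$/%24/g'
}

safe=$(uri_sanitize 'http://example.com/$(echo INJECTED)')

# The value is now inert inside double quotes.
eval "demo_uri=\"$safe\""
printf '%s\n' "$demo_uri"    # prints: http://example.com/%24(echo INJECTED)
```

The filtered string could then be piped into sed in place of the raw "$uri".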

8 comments

  1. Hi Dan, thank you very much for your feedback, that’s just what I need to improve on this!

    However, I was not able to replicate the error! Please give me the exact URI string that produces the error so I can do some debugging! 🙂

  2. Here’s my uritest script:

    #!/bin/bash
    
    uri=$1
    
    echo $uri
    
    # Parse URI
    op=`echo "$uri" | sed -nrf "uri.sed"`
    
    # Handle invalid URI
    [[ $op == 'ERROR' ]] && { echo "Invalid URI!"; exit 1; }
    
    # Execute assignments
    eval "$op"
    
    echo $uri_scheme
    echo $uri_address
    echo $uri_user
    echo $uri_pass
    echo $uri_host
    echo $uri_port
    echo $uri_path
    
  3. Hey, thanks so much for posting this. It’s perfect for my application.

    I’m having a problem though with the path parts and query args sections of uri.sed. If I comment them out, `eval $op` works fine, but if I run it normally I get:

    ./uritest: array assign: line 1: unexpected EOF while looking for matching `"'
    ./uritest: array assign: line 15: syntax error: unexpected end of file

    It’s no problem for me, as all I’m looking to do is parse git, ssh and file URIs (without args), and I don’t need the path parts. But I thought I’d let you know, and I was curious whether this is a problem with my setup or something else.

    My setup:
    Ubuntu Server 9.10 amd64
    Linux helpcomputer 2.6.28-17-generic #58-Ubuntu SMP Tue Dec 1 21:27:25 UTC 2009 x86_64 GNU/Linux
    GNU bash, version 3.2.48(1)-release (x86_64-pc-linux-gnu)
    GNU sed version 4.1.5

    Thanks again for this post, totally awesome.
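
For anyone hitting an eval error like the one reported above, one way to narrow it down is to look at the generated code and syntax-check it without executing it. This is only a sketch: `check_generated` is a hypothetical helper, and the two sample strings are made up to show a passing and a failing case rather than taken from a real run:

```shell
#!/usr/bin/env bash
# Hypothetical helper: syntax-check generated code without running it.
check_generated() {
  # bash -n parses its input but executes nothing.
  printf '%s\n' "$1" | bash -n 2>/dev/null
}

good='uri_host="www.example.com"; uri_port="19741"'
bad='uri_parts[0]="dir1'   # unterminated quote, made up for the demo

check_generated "$good" && echo "good: syntax OK"
check_generated "$bad"  || echo "bad: syntax error"
```

In the real script the same check could be run on "$op" right before the eval, printing "$op" when it fails.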
