Saturday, February 02, 2008

Parsing HTTP headers in Factor with multi-assocs

The implementation of setting and parsing HTTP headers in Factor previously used a hashtable that stored a single value per key. However, this is broken because certain fields can be sent more than once, e.g. set-cookie. The new implementation is a hashtable that maps each key to a vector, so it can store multiple values for the same key.

I originally tried to make this obey the assoc protocol so that you could convert from a hashtable of vectors back to any type of assoc (hashtable/alist/AVL tree/etc.). This turned out to be a really bad idea: not only was it not useful, it also breaks the semantics of the assoc protocol if set-at inserts an element instead of setting it.

So the implementation is in assocs.lib as a few helper words:

: insert-at ( value key assoc -- )
    [ ?push ] change-at ;

: peek-at* ( key assoc -- obj ? )
    at* dup [ >r peek r> ] when ;

: peek-at ( key assoc -- obj )
    peek-at* drop ;

: >multi-assoc ( assoc -- new-assoc )
    [ 1vector ] assoc-map ;

: multi-assoc-each ( assoc quot -- )
    [ with each ] curry assoc-each ; inline

: insert ( value variable -- ) namespace insert-at ;


Of course, set-at and at still set and access the stored values (which are now vectors), but there are a couple of new utility words. The insert-at word has the same stack effect as set-at but pushes the value onto the key's vector instead of replacing what is there. peek-at gives you the last value inserted for a given key, and this is the standard way of accessing values when you only care about the last one.
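
As a small illustration of insert-at and peek-at, here is a listener sketch. It assumes the helper words above are loaded from assocs.lib, and the cookie strings are made up for the example.

```factor
USE: assocs.lib

! Start from an empty, mutable hashtable.
H{ } clone

! insert-at pushes onto the vector stored at the key,
! creating the vector on the first insert.
"a=1" "set-cookie" pick insert-at
"b=2" "set-cookie" pick insert-at

! peek-at returns the most recently inserted value.
"set-cookie" swap peek-at .    ! prints "b=2"
```

Calling at instead of peek-at on the same key would hand back the whole vector holding both strings.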

To turn an assoc into a multi-assoc, call >multi-assoc. To iterate over all the key/value pairs, use multi-assoc-each.
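
A rough sketch of both words together, again assuming assocs.lib is loaded. The header values here are invented, and the order in which the pairs are printed depends on the hashtable, so no output is shown.

```factor
USING: assocs.lib io ;

! A plain assoc with one value per key...
H{ { "server" "Server" } { "connection" "close" } }

! ...becomes a multi-assoc: every value is wrapped in a
! one-element vector, so insert-at can append to it later.
>multi-assoc

! Add a second value under an existing key.
"keep-alive" "connection" pick insert-at

! The quotation is called with ( key value ) once for
! every element of every vector.
[ swap write ": " write print ] multi-assoc-each
```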

The insert word is for use with the make-assoc word, which executes inside a new namespace and outputs the variables you set as a hashtable.
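
Something like the following shows how insert might be used. This is only a sketch: it assumes make-assoc takes a quotation and an exemplar assoc, as it does in the Factor of this era, and the header values are invented.

```factor
USING: assocs.lib namespaces ;

! Inside make-assoc, namespace is the fresh hashtable, so each
! insert pushes onto a vector stored in that hashtable.
[
    "text/html" "content-type" insert
    "a=1" "set-cookie" insert
    "b=2" "set-cookie" insert
] H{ } make-assoc .
```

Using set in place of insert here would overwrite the first cookie with the second, which is exactly the bug the multi-assoc avoids.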

Here's an example of what the headers look like for a website:

( scratchpad ) USE: http.client "amazon.com" http-get drop .
H{
    { "connection" V{ "close" } }
    { "content-type" V{ "text/html; charset=ISO-8859-1" } }
    { "server" V{ "Server" } }
    { "x-amz-id-2" V{ "L0oid1yo1Z6cuq+VgwWCv0G/UdPov/0v" } }
    { "x-amz-id-1" V{ "15CPXN68HXB35FXE62CX" } }
    {
        "set-cookie"
        V{
            "skin=noskin; path=/; domain=.amazon.com; expires=Sun, 03-Feb-2008 04:57:59 GMT"
            "session-id-time=1202544000l; path=/; domain=.amazon.com; expires=Sat Feb 09 08:00:00 2008 GMT"
            "session-id=002-3595241-4867224; path=/; domain=.amazon.com; expires=Sat Feb 09 08:00:00 2008 GMT"
        }
    }
    { "vary" V{ "Accept-Encoding,User-Agent" } }
    { "date" V{ "Sun, 03 Feb 2008 04:57:59 GMT" } }
}


I have normalized the keys by converting them to all lower case. For some reason, Amazon sends two headers as Set-Cookie and the last one as Set-cookie, which is pretty weird.

Since the prettyprinter outputs valid Factor code, you can copy/paste the above headers into a Factor listener and run some of the multi-assoc words on them.
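
For instance, with a cut-down copy of those headers (the cookie strings are shortened here for readability, and assocs.lib is assumed to be loaded), a session could look like this:

```factor
USE: assocs.lib

H{
    { "connection" V{ "close" } }
    { "set-cookie" V{ "skin=noskin; path=/" "session-id=002-3595241-4867224; path=/" } }
}

! at returns the whole vector, so its length is the number
! of set-cookie headers that were received.
"set-cookie" over at length .    ! prints 2

! peek-at returns only the last value for the key.
"connection" swap peek-at .      ! prints "close"
```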

1 comment:

Anonymous said...

Actually the RFC says that headers that can be set twice can also be concatenated with the "," separator[1] so in theory you could still use a plain old hashtable :)

[1]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2