January 5, 2011
I tinkered with PostgreSQL’s full-text search (FTS) capabilities and I’m pretty impressed. On a table with 1.2 million rows of user profile information, I can run a token-based FTS query for usernames in under 90 milliseconds on a small-ish AWS instance. Unfortunately, the FTS tokenizer doesn’t treat the case changes in MixedCaseUsernames, or numbers between words, as word separators, so it can’t match on the individual parts of such usernames. I did, however, quickly fall in love with the marker tag system, which tells Postgres to wrap matching portions of text in HTML 4 bold tags.
After a little help from the #postgres IRC channel on freenode, I also had an all-SQL approach to finding username portions based on 3-character tokens generated from the entire username. Yes, this index took a while to build, but that’s not the point.
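To make the 3-character token idea concrete, here is a minimal sketch in Python rather than SQL. It is not the query I got from IRC; the function name `three_char_tokens` is made up for illustration, and I'm assuming the tokens are simply every 3-character window of the lowercased username, with no padding.

```python
from typing import List


def three_char_tokens(username: str) -> List[str]:
    """Return each distinct 3-character window of the lowercased username.

    Hypothetical helper: the real tokens were generated in SQL, and the
    exact rules (padding, case folding) are assumptions here.
    """
    name = username.lower()
    seen = set()
    tokens = []
    # Slide a 3-character window across the name, left to right.
    for i in range(len(name) - 2):
        token = name[i:i + 3]
        if token not in seen:  # keep first occurrence only
            seen.add(token)
            tokens.append(token)
    return tokens
```

Because every window is indexed, a fragment like "doe" can be found inside "JohnDoe42" even though the FTS tokenizer sees the whole username as one word. Names shorter than three characters produce no tokens under this scheme.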
The point is that I now have a stored procedure that does a two-part lookup: one part uses PostgreSQL’s token search, the other does an ILIKE comparison based on the 3-character tokens. It then finds a unique list of matches via DISTINCT, limits the count, loops through the result set adding my own bold tags, and returns the entire set in under 100ms.
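The flow of that procedure can be sketched in plain Python over an in-memory list. This is only an illustration of the steps, not the stored procedure itself: `lookup` and `highlight` are hypothetical names, the whole-word match is a crude stand-in for the FTS token search, and the substring test stands in for the ILIKE comparison.

```python
import re


def highlight(text, query):
    """Wrap each case-insensitive occurrence of query in <b>...</b>."""
    if not query:
        return text
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", text)


def lookup(usernames, query, limit=10):
    """Two-part lookup, de-duplicated, limited, then bold-tagged."""
    q = query.lower()
    # Part 1: stand-in for the token search. A username matches only if
    # the query equals one of its words (split on non-alphanumerics).
    token_hits = [u for u in usernames
                  if q in re.split(r"[^a-z0-9]+", u.lower())]
    # Part 2: stand-in for ILIKE '%query%': case-insensitive substring.
    substring_hits = [u for u in usernames if q in u.lower()]
    # DISTINCT: drop duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(token_hits + substring_hits))
    # LIMIT, then add the bold tags to each surviving row.
    return [highlight(u, query) for u in unique[:limit]]
```

Putting the token hits first means whole-word matches come out ahead of the looser substring matches once duplicates are dropped, which seemed like the natural ordering for a username search.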
I was able to get Nginx up and running with the ngx_postgres module, and it works well for our application, but some caching around the results would be nice. I’m waiting to hear back from the author of ngx_postgres and the author of the srcache-nginx-module project to see if they have any insight into why using srcache tries to force an SSL connection to Postgres. Once we can add that caching layer, this setup should scream.