Nginx and PostgreSQL full text search

The blog is currently being ported from WordPress to over 12 years of static pages of content. If there's an article missing that you're hoping to see, please contact me and let me know and I'll prioritize getting it online.

January 5, 2011

I tinkered with PostgreSQL’s full-text-search (FTS) capabilities and I’m pretty impressed. On a table with 1.2 million rows of user profile information, I can do a token-based FTS search for usernames in under 90 milliseconds on a small-ish AWS instance. Unfortunately, the FTS token system doesn’t recognize MixedCaseUsernames, or numbers between words, as word separators. I did, however, fall quickly in love with the marker tag system which tells Postgres to prepend and append HTML 4 bold tags around matching portions of text.

After a little help from the #postgres IRC channel on freenode, I also had an all-SQL approach to finding username portions based on 3-character tokens generated from the entire username. Yes, this index took a while to build, but that’s not the point.

The point is that now I have a stored procedure that does a 2-part lookup, one using PostgreSQL’s token search, the other doing an ILIKE comparison based on the 3-character token bits, finds a unique list of matches via DISTINCT, limits the count, and then loops through the result set adding my own bold tags to the results, and returns the entire set in under 100ms.

I was able to get Nginx up and running with the ngx_postgres module, and it works well for our application, but some caching around the results would be nice. I’m waiting to hear back from the author of ngx_postgres and the author of the srcache-nginx-module project to see if they have additional insights as to why using srcache is trying to force an SSL connection to Postgres. Once we can add that caching layer, this setup should scream.