March 7, 2016
(I’ve been asked to write this up several times over the years, so here goes.)
Background
Almost a decade ago, I thought it’d be neat to order a document scanner. Because, organization.
I found out that the IRS would accept printed copies of scanned original documents and realized that I was lugging around a filing cabinet for no reason. A few years later, something magical came to be, and it’s turned into one of my favorite life hacks. Every time I describe it to people they remind me to blog about it. Since I just spent the last few hours scanning almost a year’s worth of piled up paper, I figured I might as well get around to it.
The Scanner
While this particular model isn’t sold any more, I own an Ambir DocketPORT 465, and whenever it quits working, I’m very likely to cry. They’ve made newer models since, but I haven’t considered ordering another one yet. The closest modern equivalent is the Ambir DocketPORT Duplex DP488, which does double-sided scanning. I admit that’s a pretty strong selling point for me as well.
I stayed on Windows 7 much longer than I wanted simply because the device wasn’t supported under Windows 8 or 8.1 on my wife’s computer. Then one night I upgraded my workstation to Windows 10 without remembering that my favorite toy might no longer be supported. Happily, it still works just fine in Windows 10. Perhaps that’s because I did a Windows upgrade and my wife’s PC was a fresh install; I don’t know.
The DocketPORT is a USB document scanner, about 11″ long, maybe 1.5″ tall and 2″ wide. It’s powered over USB and comes with a piece of white cardstock with black patterns on it for calibrating the scanner. It comes with software called DocketSCAN II, which can scan documents in black and white, grayscale, or color, and in various sizes (typical US sizes and a few non-US sizes like A4, A5, A6, B5) into JPEG, TIFF, or PDF format. It only scans a single side, so for multi-sided bills and such there’s a lot of scan/flip/scan motions going on.
Performance and Folder/File Management on the Computer
Scanning something in plain black and white is super fast and the resulting files are quite small, but contrast on documents printed in color doesn’t always fare too well, and text printed in color can sometimes get lost completely. Grayscale handles papers with color nicely but takes a few seconds per page and produces moderately sized files. Full color scans take many seconds to work through and produce pretty large files. Of course, compression settings, DPI settings, etc. can be customized in the DocketSCAN II application.
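For a rough sense of why the three modes differ so much, here is a back-of-the-envelope sketch in Python. The 300 DPI figure and the US Letter page size are assumptions picked for illustration (the actual DPI setting isn't stated here), and the numbers are raw, uncompressed sizes, so real JPEG or PDF files will come out much smaller.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw (uncompressed) size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# US Letter page at an assumed 300 DPI (illustrative value only).
# Bit depths: 1 bit for black and white, 8 for grayscale, 24 for color.
for mode, bits in [("black and white", 1), ("grayscale", 8), ("color", 24)]:
    size = uncompressed_scan_bytes(8.5, 11, 300, bits)
    print(f"{mode}: {size / 1_000_000:.1f} MB")
# prints:
# black and white: 1.1 MB
# grayscale: 8.4 MB
# color: 25.2 MB
```

The ratio is the interesting part: before compression, grayscale carries 8 times the data of black and white, and color carries 24 times.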
At first, I scanned everything as a JPEG image, because TIFF images weren’t always well supported (especially natively in a browser), and the PDF version was just a JPEG image inside a PDF. I meticulously set up several folders like “bills” with subfolders for “utilities” or “car”, and other folders for “insurance” where I had subfolders like “car”, “renters”, “life” and so on. Just the folder setup was a chore in itself, but since every file was called something like “DocketSCAN-1234.pdf” there was also lots of manual renaming of files so I could find things quickly. At one point I started writing software to scan the files for fragments of text to do tagging and file renaming, but that was a pain and I gave up.
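To give a flavor of what that kind of tagging and renaming tool might look like, here is a minimal Python sketch. It is not the software I abandoned. It assumes the text has already been extracted from the scan somehow, and the keyword rules and the `suggested_name` helper are hypothetical names made up for this example.

```python
# Hypothetical keyword-to-tag rules; adjust to your own paperwork.
TAG_RULES = {
    "insurance": ["state farm", "policy"],
    "car": ["toyota", "camry"],
    "utilities": ["electric", "water"],
}

def tags_for(text):
    """Return the sorted tags whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        tag for tag, words in TAG_RULES.items()
        if any(word in lowered for word in words)
    )

def suggested_name(original_name, text):
    """Prefix the original file name with any matching tags."""
    tags = tags_for(text)
    return "-".join(tags + [original_name]) if tags else original_name

print(suggested_name("DocketSCAN-1234.pdf", "State Farm policy for Toyota Camry"))
# prints: car-insurance-DocketSCAN-1234.pdf
```

Plain substring matching like this is crude (it will happily match “water” inside another word), which hints at why the real thing turned into a chore.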
Online Backups
To preserve my sanity in case my PC’s hard drive ever failed, I started syncing these scanned files to Dropbox, but managing those files on Dropbox wasn’t as elegant as I’d have liked, and sharing the documents so my wife could manage them wasn’t easy. And whether I wanted it or not, every PC where I had my Dropbox folder sync’d received a copy of every file, which meant tons of bandwidth usage whenever a device came online. Yuck. Still, online backups are awesome, and I recommend you use one! (Especially since we did have a critical hard disk failure in the summer of 2014.)
But then Google did something awesome, and turned my document scanning into the envy of those who hear me sing its praises.
Fanboy Alert
I installed the Google Drive desktop application on my PC and told DocketSCAN II to save my scanned files in my Google Drive sync folder (instead of Dropbox). Now, the moment DocketSCAN II saves a file to my local disk, the sync application sends a copy to Google Drive, where it’s accessible immediately on all of my devices as I need it. I just tap or click a file and it downloads. Perfect on-demand file sync.
That’s not the awesome part. Here’s the magic that changed my organization skills forever:
If you upload an image file, or a PDF file containing an image, to Google Drive, Google will attempt to scan it with Optical Character Recognition (OCR) to pick out anything that looks like text. And then it will let you search your Google Drive account for documents containing that text.
It bears repeating
Google Drive scans your images for text, and lets you SEARCH on that text
I no longer need to maintain my elaborate folder setup, or rename files. Instead I have a single folder with hundreds and hundreds of files called “DocketSCAN-1234.pdf” in a numerical sequence. And now I have Google’s search engine at my disposal within Google Drive to find the text of my scanned files.
So now I can search for text like “toyota camry state farm” and see a list of all of our scanned documents with those words.
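I do all of this in the Drive web interface, but if you wanted to script the same kind of search, the Drive API has a `fullText contains` search operator. Here is a hedged sketch that only builds the query string. The `drive_fulltext_query` helper is a made-up name, and I'm assuming (not promising) that the API searches the same OCR text the web interface does and treats multi-word phrases the same way.

```python
def drive_fulltext_query(phrase, pdf_only=True):
    """Build a Drive API `q` search string for files containing phrase."""
    # Backslashes and single quotes must be escaped inside the quoted value.
    escaped = phrase.replace("\\", "\\\\").replace("'", "\\'")
    query = f"fullText contains '{escaped}'"
    if pdf_only:
        # Restrict results to PDFs, the format the scans are saved in.
        query += " and mimeType = 'application/pdf'"
    return query

print(drive_fulltext_query("toyota camry state farm"))
# prints: fullText contains 'toyota camry state farm' and mimeType = 'application/pdf'
```

The resulting string would be passed as the `q` parameter of a file listing request; authentication and the request itself are left out of this sketch.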
Okay, so maybe I do need to revisit the file renaming so we don’t have to click on all of those files to see what’s in there, and to clean up duplicate copies.
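For the duplicate cleanup, one possible starting point is to group files by a hash of their contents. This is only a sketch under a big assumption: it catches byte-for-byte identical copies (say, the same file saved twice), not two separate scans of the same sheet of paper, which will almost never be identical. The `find_duplicates` name and the `scans` folder in the usage comment are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(folder):
    """Group files in folder by SHA-256 of their contents.

    Returns a list of groups (lists of file names) that share identical bytes.
    """
    by_hash = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path.name)
    return [names for names in by_hash.values() if len(names) > 1]

# Example usage (assumes a local folder named "scans" exists):
# for group in find_duplicates("scans"):
#     print("identical files:", ", ".join(group))
```

It only reports the groups and deletes nothing, which seems like the safer default for tax paperwork.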
Yup, Automatic OCR and Search
This was pretty life-changing for me. Now I spend a few hours periodically scanning a bunch of documents in DocketSCAN II. They get saved to my local drive, Google Drive’s sync software pushes them to Google Drive automatically, Google does its magic, and within about 5-10 minutes those documents are searchable. And then I spend several more hours shredding that paper (most of that time is spent unjamming the shredder).
Granted, I’ve gotten way too lazy at sitting down and actually scanning all those bills, medical forms, mortgage paperwork, etc. on a regular basis. Typically the paper stacks up for several months at a time, but boy is it handy at tax time when you can open Google Drive and search for things like “CIGNA 2015” and see all of our medical statements instead of keeping 3″ of paper in a filing cabinet.