Paperless-ngx vs ???
I'm stuck in a bit of a rut for work and trying to help out a department. We're going through a massive cash flow problem that we will see the other side of. There is zero budget for this, but I have spare kit.
The problem is that Legal have pulled all paper archives for all of the projects from the past 20 years of operating and need them stored digitally. Right now, we have a company runner scanning each document to email, then saving it off to a folder structure and renaming it. The structure is on SharePoint, but each scan is just an image, not even OCR'd.
I have discovered Paperless-ngx and wonder if this would be a better option for fast ingest. I can host it on prem easily enough, either on bare metal or in a VM. It looks like it can do SSO via many different options.
Any input would be great before I just go ahead and do this. I'm after pros and cons, alternatives, etc.
Thanks.
The awesome-selfhosted list has a document management section if you're looking for alternatives, but honestly I think you should just try Paperless-ngx. It's very good and the UI is excellent.
It takes a little bit of setup and manual work in the beginning before it starts recognizing the metadata and which documents belong to which correspondent/document type, but once that's all done it is really nice. I highly recommend starting off by uploading just one document at a time and manually entering the metadata. After a few, if you can see it getting things right automatically, you're probably fine hooking it up to your scanner (for which there are a lot of options).
The documentation is excellent by the way, everything you need is in there.
I'm planning on having it pick up from an SMB share, then mass scanning to it from a Sharp MFD.
I'll take the advice on board about ingesting slowly at the beginning. Thanks.
Set up Paperless-ngx for my own use a while ago. It works pretty well, but the autocategorization/tagging definitely needs to be trained/monitored for the first several documents of each type you put into it. Granted, my docs tend to be all over the place since it's legal, medical, everything. Your collection presumably has a lot of similar documents, which should make it easier.
I'm not sure about SSO integration, but I don't think any of your other configuration would be difficult. It picks up PDFs/images from a folder, runs OCR on them, and puts them in a storage location. Whether either of those is a shared folder or a network location, or what scanner you use, shouldn't matter. Paperless is running in a Docker container on my home server and monitors a folder on my desktop for new documents; that folder is the default save location for my scanner. Scanning and saving a new document automagically gets it dumped into Paperless within a few seconds for a one-page PDF.
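If the scanner writes to a network share, one thing worth guarding against is a half-written file being picked up mid-transfer. Here's a rough Python sketch of a "wait until the file settles, then hand it off" step. This is not how Paperless itself does it; the function names, the extension list, and the idea of a separate inbox folder in front of the consume folder are all my own assumptions for illustration.

```python
import shutil
import time
from pathlib import Path

# Extensions treated as scans; an assumption for this sketch.
SCAN_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def stable_scans(inbox: Path, settle_seconds: float = 2.0) -> list[Path]:
    """Return non-empty scans whose size did not change over settle_seconds."""
    candidates = [
        p for p in sorted(inbox.iterdir())
        if p.is_file() and p.suffix.lower() in SCAN_SUFFIXES
    ]
    sizes_before = {p: p.stat().st_size for p in candidates}
    time.sleep(settle_seconds)
    return [
        p for p in candidates
        if p.exists()
        and sizes_before[p] > 0
        and p.stat().st_size == sizes_before[p]
    ]


def hand_off(inbox: Path, consume_dir: Path, settle_seconds: float = 2.0) -> list[Path]:
    """Move settled scans from inbox into consume_dir; return the new paths."""
    consume_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for scan in stable_scans(inbox, settle_seconds):
        target = consume_dir / scan.name
        if target.exists():
            # Do not overwrite a file that has not been consumed yet.
            target = consume_dir / f"{scan.stem}-{int(time.time())}{scan.suffix}"
        shutil.move(str(scan), str(target))
        moved.append(target)
    return moved
```

You'd call something like hand_off(Path("/srv/scans/inbox"), Path("/srv/paperless/consume")) from a cron job or a loop. Both paths are made up, so point them at wherever your scanner saves and wherever your install consumes from.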
I have a bunch of paperwork I've been avoiding actually organizing for probably about a decade at this point. Would you say it's worth it for a regular person to set up? And if so, is there a particular scanner you can recommend - especially one that can capture both sides of a page?
Depending on how comfortable you are with Linux and/or Docker, I highly recommend it. I started using it back when it was just Paperless, then through Paperless-ng, and now Paperless-ngx (just different developers retiring/continuing development). The features added have only made it easier and easier to use. Particularly with some e-mail filters set up to automatically forward things to the email address my daemon monitors, I spend very little time dealing with it in exchange for having digital copies of paperwork that are easy to search and organize.
I run mine on a headless home server that lives in the basement, so my ideal scanning solution was a standalone network scanner to an FTP folder which the daemon monitors. I originally started out using a Brother ADS-2500W, which I really enjoyed and would still be using -- but when my printer died, I opted to replace both my printer and my scanner with an all-in-one device. The two features I would recommend you look for when selecting a scanner are:
I've already got a Linux server running at home. I've never used Docker in this capacity, but I don't expect it to be too difficult given my existing experience with it. Thanks for the info!
I feel like it was. It's super nice for things to have tags and their text be searchable. It did take some setup and tweaking though.
For a scanner I grabbed an Epson ES-50 because it was ~$50 at BestBuy. It's super barebones but I didn't have much to scan. I'd second Brother for bigger jobs.
That's also my setup, and I've had no issues with it.
I have my deceased grandmother's genealogy collection, meaning hundreds of pages of documents including emails, charts, spreadsheets, blogs and news clippings. And many photos/images. Right now it’s basically filling a 5-foot-tall office file cabinet.
Do you think p-ngx would be a good tool for digitizing that kind of thing, with the OCR so that it’s searchable etc? I’ve made a good effort to get a lot of info entered into genealogy software like GRAMPS, but the sheer volume of paper is out of line with the available space in my house (and my overall current interest in the topic).
I think so, but for something that big you would definitely want a good duplex (double-sided) scanner that you can just feed documents. Scan for a few minutes a day while watching a show or something until it's done.
One of the positives of Paperless I think is that it's basically a folder of PDFs / jpegs / whatever file format you give it. The system is doing OCR on top of that and maintaining a database of what is where, but that part is separated from the files. If you find a different solution later that you think might be better or want to hand it off to a library or whatever, there's just a folder you can copy onto a thumbdrive or whatever.
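If you do end up copying that folder somewhere, a checksum manifest is a cheap way to confirm nothing got lost or corrupted on the way. A minimal Python sketch using only the standard library follows. The function names are mine, and it assumes you just point it at whatever directory holds the files, nothing Paperless-specific.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def missing_or_changed(source: dict[str, str], copy: dict[str, str]) -> list[str]:
    """List relative paths that are absent from the copy or hash differently."""
    return sorted(
        name for name, digest in source.items() if copy.get(name) != digest
    )
```

Build a manifest of the original folder and one of the copy, then pass both to missing_or_changed. An empty list means every file in the original made it across with identical contents.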
Main thing would be double checking the scans before saving/forgetting about them to make sure they're decent quality.