  • Showing only topics with the tag "pdf".
    1. Best solution to extract PDF data?

      Hi folks--

      To those more knowledgeable than I am:

      What would be the best local solution for extracting numerical data from a batch of PDF reports? The values I want are interspersed among word-processor-formatted tables and irrelevant text. The text and table formatting are (nearly) identical across reports; only the data vary. The PDFs are not scanned images: I can select and copy text without OCR. I have thousands to process, and the data themselves are confidential (I have clearance) and cannot be shared. I can use Windows or Linux, but not macOS.

      I am technically inclined, so I bashed my head against regular expressions just enough to use Notepad++ to find and delete most of the irrelevant stuff and make a CSV, but it's a hacky, imprecise method and not nearly automated enough for batches. For reference, I don't code for a living or even as a hobby, but I use R and bash, am familiar with IDEs, and can follow pseudocode well enough to edit and use scripts.

      Any thoughts? Thanks in advance!

      24 votes
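For what it's worth, the Notepad++ regex approach described in this post can be scripted so it runs over a whole batch locally. The sketch below is only a starting point under stated assumptions: it assumes each PDF has already been converted to plain text (for example with Poppler's `pdftotext -layout report.pdf report.txt`, which runs offline on Windows and Linux), and the labels `Total mass` and `Sample count` are made-up placeholders, since the real report fields are confidential and not given here.

```python
import csv
import re
from pathlib import Path

# Hypothetical patterns: "Total mass" and "Sample count" are placeholder
# labels, not fields from the original reports. Replace them with the
# text that actually precedes each number you want.
PATTERNS = {
    "total_mass": re.compile(r"Total mass\s*[:=]?\s*(-?\d+(?:\.\d+)?)"),
    "sample_count": re.compile(r"Sample count\s*[:=]?\s*(\d+)"),
}


def extract_values(text):
    """Return {field: matched number as a string, or ''} for one report."""
    row = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[field] = match.group(1) if match else ""
    return row


def reports_to_csv(reports, out):
    """Write one CSV row per (name, text) pair to the open file `out`."""
    writer = csv.DictWriter(out, fieldnames=["report", *PATTERNS])
    writer.writeheader()
    for name, text in reports:
        writer.writerow({"report": name, **extract_values(text)})


def convert_directory(text_dir, csv_path):
    """Process every .txt file in `text_dir` into a single CSV file."""
    reports = (
        (path.name, path.read_text(encoding="utf-8", errors="replace"))
        for path in sorted(Path(text_dir).glob("*.txt"))
    )
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        reports_to_csv(reports, out)
```

Calling `convert_directory("reports_txt", "values.csv")` (both names are illustrative) would produce one row per report. A field that is not found is left blank rather than guessed, which makes reports with unexpected formatting easy to spot in the CSV.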
    2. Help with converting PDF to Excel and back to PDF?

      I may be asking a dumb question or going about this wrong but I'm not sure what to do here.

      So right now, I receive an estimate from one company as a PDF. It has a bunch of fields such as customer name, product, address, etc. I then type that data into Excel, where I add additional data that I have. From there I have a second PDF with form fields that I fill in with the data from the Excel spreadsheet.

      My problem is with the first PDF that I get from this other company. Unless I am doing something wrong, I am unable to get that first PDF to show the data as fields. If I convert the first PDF to Excel, the table data comes out very messy.

      The typing or copying and pasting isn't hard, but it is time-consuming. What is the best way for me to go about doing this? I've been Googling things, but I'm not sure of the right search terms for what I'm looking for.

      I hope this all makes sense, but if not please ask questions and I'll do my best to try and clarify further.

      12 votes
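For the second topic, one possible way to skip the retyping step is to pull the labelled values out of the estimate's text and append them to a CSV file that Excel can open. This is a rough sketch, not a tested workflow: it assumes the estimate's text has already been extracted from the PDF and that each value sits on the same line as its label in a `Label: value` layout, which the post does not confirm. The labels come from the fields the post mentions, and `parse_estimate` and `append_row` are illustrative names.

```python
import csv
import re
from pathlib import Path

# Labels taken from the fields mentioned in the post; the "Label: value"
# layout is an assumption about how the estimate is laid out.
FIELDS = ["Customer name", "Product", "Address"]


def parse_estimate(text):
    """Map each label in FIELDS to the text after it on the same line."""
    row = {}
    for label in FIELDS:
        pattern = rf"^[ \t]*{re.escape(label)}[ \t]*:[ \t]*(.+?)[ \t]*$"
        match = re.search(pattern, text, re.MULTILINE | re.IGNORECASE)
        row[label] = match.group(1) if match else ""
    return row


def append_row(csv_path, row):
    """Append one estimate to the CSV, writing the header if the file is new."""
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Filling the form fields of the second PDF from that spreadsheet is a separate step that this sketch does not cover; it would need a PDF library or a PDF editor with form import support.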