-
9 votes
-
Paged Out! #5 - hacker zine release
7 votes -
Best solution to extract PDF data?
Hi folks--
To those more knowledgeable than I am:
What would be the best local solution to extract numerical data from a batch of PDF file reports? The values I want are interspersed among word processor formatted tables and irrelevant text. The text and table formatting are (nearly) identical across reports. The data I want vary across reports. The PDFs are not images; I can select and copy text without OCR. I have thousands to process, and the data themselves are confidential (I have clearance) and cannot be shared. I can use Windows or Linux but not macOS.
I am technically inclined, so I bashed my head against regular expressions just enough to use Notepad++ to find and delete most of the irrelevant stuff and make a CSV, but it's a hacky, imprecise method and not nearly automated enough for batches. For reference, I don't code for a living or even as a hobby, but I use R and bash, am familiar with IDEs, and can follow pseudocode well enough to edit and use scripts.
Any thoughts? Thanks in advance!
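For anyone landing here with the same problem, here is a minimal sketch of the regex-to-CSV step done as a script instead of by hand. It assumes the text has already been pulled out of each PDF locally (for example with Poppler's `pdftotext -layout`, which runs on both Windows and Linux) and that each wanted value sits after a fixed label. The labels and field names below are made up for illustration; swap in the ones from the real reports.

```python
import csv
import io
import re

# Hypothetical labels and field names; replace with the ones in the real reports.
PATTERNS = {
    "total_mass_g": re.compile(r"Total mass \(g\):\s*(-?\d+(?:\.\d+)?)"),
    "sample_count": re.compile(r"Samples:\s*(\d+)"),
}


def extract_values(text):
    """Return {field: matched number as a string, or "" if not found}."""
    row = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[field] = match.group(1) if match else ""
    return row


def reports_to_csv(reports):
    """Take {report name: extracted text} and return one CSV row per report."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["report", *PATTERNS], lineterminator="\n"
    )
    writer.writeheader()
    for name, text in reports.items():
        writer.writerow({"report": name, **extract_values(text)})
    return out.getvalue()
```

Because the layout is (nearly) identical across reports, one pattern per value should cover most files. Leaving a cell empty when a pattern does not match makes the odd report easy to spot in the CSV afterwards.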
24 votes -
New experimental evidence shows lack of employment effects of guaranteed income
20 votes -
Internet Archive loses appeal in Hachette v. Internet Archive
69 votes -
Grokking KOReader
25 votes -
Valve handbook for new employees — first edition
38 votes -
Evaluating the significance of San Lorenzo Village, a mid-20th century suburban community
4 votes -
Impacts Project
8 votes -
Climate disinformation researcher Geoffrey Supran's presentation to the European Parliament about Exxon's propaganda campaign
44 votes -
“Upload moderation” undermines end-to-end encryption: A statement from Meredith Whittaker, Signal president
28 votes -
Paper showcasing a simulation of gravitational waves produced by a warp drive
6 votes -
PayPal USD (PYUSD) on Solana
3 votes -
The Last Messiah by Peter W. Zapffe: An important pessimist and antinatalist essay
2 votes -
Zilog discontinues production of original Z80 processor after forty-eight years
28 votes -
webtoon-dl: a cli for downloading webtoons as pdfs
17 votes -
Yuzu, popular Nintendo Switch emulator, settles with Nintendo for $2.4m and halts development and distribution indefinitely
76 votes -
Sampling: What Nyquist didn’t say, and what to do about it
10 votes -
White House urges use of type safe and memory safe programming languages and hardware
38 votes -
Nobody ever gets credit for fixing problems that never happened
32 votes -
The Great Automatic Grammatizator by Roald Dahl (1954)
21 votes -
Feminism versus multiculturalism [an argument against feminism and multiculturalism being 'incompatible']
5 votes -
Designing a Framework for Measuring Inference Generation and Communicative Efficacy During Board-game Play
16 votes -
Consider the lobster
32 votes -
Fighting climate change via personal banking
12 votes -
2023 GCHQ Christmas Challenge
15 votes -
Reddit moderators of r/law and r/scotus filed an amicus brief in US Supreme Court first amendment case Moody v NetChoice LLC
62 votes -
The Columbian Exchange: A History of Disease, Food, and Ideas
10 votes -
Denmark leads the Women Peace and Security Index 2023/24, scoring more than three times higher than Afghanistan at the bottom of the scale
14 votes -
The Techno Optimist Manifesto by Marc Andreessen, redacted by Grosser
48 votes -
Biking the goods: How North American cities can prepare for and promote large-scale adoption of cargo e-bikes
8 votes -
The use of statistics in legal proceedings – a primer for courts
7 votes -
Intelligent traffic control with smart speed bumps
7 votes -
Douglas B. Lenat - The Ubiquity of Discovery
4 votes -
US study: Law abiding immigrants: the incarceration gap between immigrants and the US born 1850-2020
9 votes -
Towards understanding the design of games that aim to unify a player’s physical body and the virtual world
2 votes -
On being a c̵o̵m̵p̵u̵t̵e̵r̵ ̵s̵c̵i̵e̵n̵t̵i̵s̵t̵ human being in the time of collapse
12 votes -
Global Peace Index 2023 saw Iceland remain the most peaceful country in the world, a position it has held since 2008
18 votes -
Help with converting PDF to Excel and back to PDF?
I may be asking a dumb question or going about this wrong but I'm not sure what to do here.
So right now, I receive an estimate from one company in a PDF. It has a bunch of fields such as customer name, product, address, etc. I then type that data into Excel, where I add additional data that I have. From there I have a second PDF which has form fields that I fill with the data from the Excel spreadsheet.
My problem is with the first PDF that I get from this other company: unless I am doing something wrong, I am unable to get that first PDF to show the data as fields. If I convert the first PDF into an Excel file, the table data is very messy.
The typing or copying and pasting isn't hard, but it is time-consuming. What is the best way for me to go about doing this? I've been Googling things, but I'm not sure of the right words for what I'm looking for.
I hope this all makes sense, but if not please ask questions and I'll do my best to try and clarify further.
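One possible route for the first step, sketched under assumptions: if the text of the first PDF can be selected and copied, and each field shows up as a "Label: value" line, a short script can pull the fields into a CSV that Excel opens directly. The labels below are taken from the examples in the post; the one-field-per-line layout is an assumption and may not match the actual estimate.

```python
import csv
import io
import re

# Labels mentioned in the post; the "Label: value" one-per-line layout is assumed.
FIELDS = ["Customer name", "Product", "Address"]


def parse_fields(text):
    """Return {label: value} for each known label, or "" when it is missing."""
    found = {}
    for label in FIELDS:
        pattern = rf"^{re.escape(label)}:[ \t]*(.*)$"
        match = re.search(pattern, text, flags=re.MULTILINE)
        found[label] = match.group(1).strip() if match else ""
    return found


def to_csv(text):
    """Return a header line plus one data row as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerow(parse_fields(text))
    return out.getvalue()
```

Saving the result of `to_csv(...)` to a `.csv` file gives one clean row per estimate, which avoids the messy table that a whole-PDF-to-Excel conversion tends to produce.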
12 votes -
Control of Computer Pointer Using Hand Gesture Recognition in Motion Pictures
3 votes -
US Department of Justice announces charges against Donald Trump
118 votes -
Social media and youth mental health - The US Surgeon General’s Advisory
5 votes -
CFTC is suing Binance, CEO Changpeng Zhao
4 votes -
My channels were hacked, streamed crypto scams, then deleted last night
12 votes -
The Full January 6th Committee Report
11 votes -
Discovering Language Model Behaviors with Model-Written Evaluations
4 votes -
That time when my skin changed color: A Spaniard in the USA
10 votes -
Mental health challenges related to neoliberal capitalism in the United States
8 votes -
Analysis by computer science professor shows that "Google Phone" and "Google Messages" send data to Google servers without being asked and without the user's knowledge, continuously
11 votes -
The plot to destroy Ukraine
22 votes