(Python) Downloading automatically, converting automatically, and OCR

April 17, 2014

I need help, and I don't have the brainpower to research any more (for today at least; that typically happens when I'm focusing all my mental energy on technicalities). I need to know two things: 1. how to download a PDF automatically with Python from a link (e.g. http://www.xxx.gov/18746/app=get%20inmates.pdf), and 2. how to convert the PDF to a file that can be processed by OCR (such as pytesser).

April 18, 2014

Did Google go bankrupt?

Query: "python download file"

http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
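For anyone who wants the short version of that link, here is a minimal sketch of the download step using only the standard library. It assumes Python 3 (the thread predates it being the default), and the function names `download_file` and `looks_like_pdf` are made up for illustration. The `%PDF-` check is just a cheap sanity test that the server returned a PDF rather than an HTML error page.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest, timeout=30):
    """Stream the resource at `url` into the local file `dest` and return its path."""
    dest = Path(dest)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        # Copy in chunks so large PDFs are never held in memory all at once.
        shutil.copyfileobj(response, out)
    return dest


def looks_like_pdf(path):
    """Return True if the file starts with the PDF magic bytes."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"
```

Usage would look something like `download_file(url, "inmates.pdf")` followed by `looks_like_pdf("inmates.pdf")`, where `url` is the already percent-encoded link (the `%20` in the example URL should be left as it is).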

Query: "pytesser"

https://code.google.com/p/pytesser/

Answer: it needs an image, such as a PNG or JPG.

Why on earth do you want to OCR a PDF when you can get the raw text without OCR?
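Since the OCR step wants images, the PDF pages have to be rendered first. One possible way (not something suggested in this thread) is to call the `pdftoppm` command-line tool from poppler-utils. The sketch below assumes Python 3 and that `pdftoppm` is installed and on the PATH; the function names and the `page` file prefix are hypothetical.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_command(pdf_path, output_prefix, dpi=300):
    """Return the argument list that renders every PDF page to a PNG file."""
    # -png selects PNG output, -r sets the resolution in DPI.
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(output_prefix)]


def pdf_to_png_pages(pdf_path, output_dir, dpi=300):
    """Render the PDF into output_dir and return the PNG paths in page order."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    prefix = output_dir / "page"  # assumed prefix; pdftoppm appends -<page number>.png
    subprocess.run(build_pdftoppm_command(pdf_path, prefix, dpi), check=True)
    return sorted(output_dir.glob("page-*.png"))
```

Each resulting PNG can then be handed to whatever OCR library is in use. A resolution around 300 DPI is a common starting point for OCR, but that is a rule of thumb rather than a requirement.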

Query: "python pdf reader"

http://stackoverflow.com/questions/25665/python-module-for-converting-pdf-to-text
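One caveat on skipping OCR: a PDF made from scanned pages has no text layer, so text extraction comes back empty or nearly empty. A rough sketch of how a script might decide between the two routes is below. The function name `needs_ocr` and the 20-character threshold are arbitrary assumptions, not anything from the linked answers.

```python
def needs_ocr(extracted_text, min_chars=20):
    """Return True when text extraction produced too little to be useful.

    Whitespace (including the form feeds some extractors emit between
    pages) is ignored, so a document of blank pages counts as empty.
    """
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars
```

The idea would be to try plain text extraction first and only fall back to rendering images and running OCR when `needs_ocr(text)` is true.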

April 18, 2014

Author

I've seen all of those resources by now 😥

I might have to try alternative search engines. Beaheaheahhhhh

That was a joke... unless it isn't

April 21, 2014

Author

After three days of deliberation on this one, I think that I may have found a solution. I really hate knowing that it took me literally 15 hours to find it, but I'm winning.

If anyone stumbles across this thread and needs the same help I did, here is the simplest answer I found (PDFMiner and PyPDF are extremely complex, with a very steep learning curve): https://pypi.python.org/pypi/slate
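For reference, a minimal sketch of what using slate might look like. It assumes the slate package is installed and behaves as its README describes, where `slate.PDF(f)` gives back a list-like object with one string per page. The helper names `join_pages` and `pdf_to_text` are made up for this example, and slate is imported inside the function so the rest of the script still loads if it is missing.

```python
def join_pages(pages):
    """Join per-page strings into one text, separated by blank lines."""
    return "\n\n".join(page.strip() for page in pages)


def pdf_to_text(path):
    """Extract the text of every page of the PDF at `path` using slate."""
    import slate  # third-party; assumed installed (e.g. via pip)

    with open(path, "rb") as f:
        pages = slate.PDF(f)  # per the README: one string per page
    return join_pages(pages)
```

Calling `pdf_to_text("inmates.pdf")` on a text-based PDF should return its contents as a single string. Note that slate was written for Python 2, so on a newer interpreter it may not install or import cleanly.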

Archived

This topic is now archived and is closed to further replies.
