Posted

I need help, and I don't have the brainpower to research anymore (for today at least; that typically happens when I'm focusing all my mental energy on technicalities). I need to know two things: 1. how to automatically download a PDF from a link using Python (for example, http://www.xxx.gov/18746/app=get%20inmates.pdf), and 2. how to convert the PDF to a file that can be processed by an OCR library (such as pytesser).

Posted

Has Google gone bankrupt?

 

Query: "python download file"

http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
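
For the download part, a minimal Python 3 sketch using only the standard library could look like the following. The function name `download_file` and the `timeout` default are made up for illustration, and the URL in the comment is just the placeholder from the question, so treat this as a starting point rather than a finished script.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest_path, timeout=30):
    """Stream the resource at `url` into `dest_path` and return the path written.

    `download_file` is a hypothetical helper name, not part of any library.
    """
    dest = Path(dest_path)
    # urlopen returns a file-like response; copyfileobj streams it in chunks,
    # so a large PDF never has to fit in memory all at once.
    with urllib.request.urlopen(url, timeout=timeout) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Example call (placeholder URL taken from the question):
# download_file("http://www.xxx.gov/18746/app=get%20inmates.pdf", "inmates.pdf")
```

The URL is passed exactly as written, with the space already encoded as `%20`. The file is opened in binary mode because a PDF is not text.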

 

Query: "pytesser"

https://code.google.com/p/pytesser/


Answer: it needs an image as input, such as a PNG or JPG.
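
If OCR really is needed, one way to get those images is to render each PDF page to a PNG first. The sketch below assumes the third-party `pdf2image` package (which itself needs the poppler utilities installed). That package is not mentioned anywhere in this thread, and the helper name `pdf_to_png_pages` and the file naming scheme are invented for the example.

```python
from pathlib import Path


def pdf_to_png_pages(pdf_path, out_dir, dpi=300):
    """Render every page of a PDF to its own PNG file and return the paths.

    Hypothetical helper; assumes `pip install pdf2image` plus poppler.
    """
    # Imported here so this file still loads when the optional package is absent.
    from pdf2image import convert_from_path

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(pdf_path).stem

    paths = []
    # convert_from_path returns one PIL image per page.
    for number, page in enumerate(convert_from_path(str(pdf_path), dpi=dpi), start=1):
        target = out / f"{stem}_page{number:03d}.png"
        page.save(target, "PNG")
        paths.append(target)
    return paths
```

Each PNG it writes can then be handed to an OCR tool such as pytesser. A higher `dpi` usually helps recognition at the cost of larger files.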


Why on earth do you want to OCR a PDF when you can get the raw text without OCR?

Query: "python pdf reader"

http://stackoverflow.com/questions/25665/python-module-for-converting-pdf-to-text

Posted

After three days of deliberation on this one, I think that I may have found a solution. I really hate knowing that it took me literally 15 hours to find it, but I'm winning.


If anyone happens to stumble across this thread and needs the same help I needed, here is the simplest answer (because PDFMiner and PyPDF are extremely complex, with a very steep learning curve): https://pypi.python.org/pypi/slate
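
For anyone who wants a starting point, here is a rough sketch of pulling the text out with slate. It assumes Python 3 and the `slate3k` fork (`pip install slate3k`), on the assumption that it keeps the original `slate.PDF` interface. The helper name `pdf_to_text` is made up for the example.

```python
def pdf_to_text(pdf_path):
    """Return the text of every page in the PDF, one page per line break.

    Hypothetical helper; assumes the slate3k fork is installed.
    """
    # Imported here so this file still loads when the optional package is absent.
    import slate3k as slate

    with open(pdf_path, "rb") as handle:
        # slate.PDF behaves like a list holding one string per page.
        pages = slate.PDF(handle)
    return "\n".join(pages)
```

This only works when the PDF has an embedded text layer. A scanned document with no text layer would still need OCR.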
