Popcorn Sutton Posted April 17, 2014 I need help and I don't have the brainpower to research anymore (for today at least... that typically happens when I'm focusing all my mental energy on technicalities). I need to know:
1. how to download a PDF automatically (using Python) from a link (e.g. http://www.xxx.gov/18746/app=get%20inmates.pdf), and
2. how to convert the PDF to a file that can be processed by OCR (such as pytesser).
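A minimal sketch of question 1 using only the standard library (`urllib.request` and `shutil`). The function name `download_pdf` is hypothetical, and the `%PDF-` check assumes the server returns the file itself rather than an HTML error or login page.

```python
import shutil
import urllib.request
from pathlib import Path


def download_pdf(url: str, dest: str, timeout: float = 30.0) -> Path:
    """Download the resource at `url`, save it to `dest`, and return the saved path."""
    dest_path = Path(dest)
    # urlopen handles http(s):// (and file://) URLs; the response is file-like.
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest_path, "wb") as out:
        # Stream in chunks instead of reading the whole PDF into memory.
        shutil.copyfileobj(response, out)
    # Sanity check: real PDF files start with the "%PDF-" signature.
    with open(dest_path, "rb") as f:
        if f.read(5) != b"%PDF-":
            raise ValueError(f"{url} did not return a PDF")
    return dest_path
```

Usage would be something like `download_pdf("http://www.xxx.gov/18746/app=get%20inmates.pdf", "inmates.pdf")`, with the placeholder URL from the post swapped for a real one.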
Sensei Posted April 18, 2014 Google bankrupted?
Query: "python download file" http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
Query: "pytesser" https://code.google.com/p/pytesser/
Answer: it needs an image, such as PNG, JPG, etc. Why on earth do you want to OCR a PDF when you can get the raw text without OCR?
Query: "python pdf reader" http://stackoverflow.com/questions/25665/python-module-for-converting-pdf-to-text
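That advice can be sketched as a small dispatcher: check a file's leading "magic" bytes, send images (which pytesser needs) to OCR, and send PDFs to direct text extraction. This is a hedged sketch. The names `sniff_format` and `choose_route` are hypothetical, and it assumes the PDF has a text layer; a scanned PDF that holds only page images would still have to be rendered to PNG/JPEG before OCR.

```python
from typing import Optional

# Leading "magic" bytes that identify each format.
_SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_format(path: str) -> Optional[str]:
    """Return 'pdf', 'png', 'jpeg', or None based on the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, name in _SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def choose_route(path: str) -> str:
    """Decide how to get text out of a file."""
    fmt = sniff_format(path)
    if fmt == "pdf":
        return "extract"  # PDF: pull the embedded text directly, no OCR needed
    if fmt in ("png", "jpeg"):
        return "ocr"  # image: hand it to an OCR tool such as pytesser
    raise ValueError(f"unsupported file type: {path}")
```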
Popcorn Sutton (author) Posted April 18, 2014 I've seen all of those resources by now 😥 I might have to try alternative search engines. Beaheaheahhhhh That was a joke... unless it isn't.
Popcorn Sutton (author) Posted April 21, 2014 After three days of deliberation on this one, I think I may have found a solution. I really hate knowing that it took me literally 15 hours to find it, but I'm winning. If anyone stumbles across this thread and needs the same help I needed, here is the simplest answer (PDFMiner and PyPDF are extremely complex, with a very steep learning curve): https://pypi.python.org/pypi/slate
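For reference, a sketch of how slate might be called. It assumes the third-party `slate` package from the link above is installed and exposes `slate.PDF(file)`, which (as in its README example) yields one string per page. `pdf_to_text` and `pages_to_text` are hypothetical helper names; slate is imported inside the function so the joining helper works without it.

```python
from typing import List


def pages_to_text(pages: List[str]) -> str:
    """Join per-page strings into one document, dropping blank pages."""
    return "\n\n".join(p.strip() for p in pages if p.strip())


def pdf_to_text(path: str) -> str:
    """Extract the text layer of a PDF with slate (assumed installed)."""
    import slate  # third-party; imported lazily so pages_to_text has no dependency

    with open(path, "rb") as f:
        pages = slate.PDF(f)  # assumed: list-like, one string per page
    return pages_to_text(list(pages))
```

For example, `print(pdf_to_text("inmates.pdf")[:500])` would show the start of the extracted text, with no OCR step involved.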