Posted

Hello guys. I am in need of some help. I have a PDF file that contains 100 tests (they are divided into 3 parts: A, B and C). How can I create another PDF that contains only the A part from each test? I found some tutorials with Python, but the thing is I don't really know how to use it. 
Thank you in advance :) I'm sorry for the grammatical mistakes :) #stillLearning

Posted

Can you copy/paste the A parts from each PDF into a single Word document, then Save As and change file type to PDF from Word?

Posted (edited)

A PDF can contain both text and images. Start by making sure your text is really text and not an image: some people scan paper documents, and the scanner output (images) is put as-is inside the PDF. Handling images requires OCR, which is a completely different procedure.
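One way to run that check is to extract the text of each page and see which pages come back (nearly) empty. The sketch below assumes the third-party pypdf package; the helper names and the `min_chars` threshold are made up for illustration, and an empty result only hints that a page is a scan, it does not prove it.

```python
def pages_without_text(page_texts, min_chars=20):
    """Return 0-based indices of pages whose extracted text is (nearly) empty.

    Such pages are probably scanned images and would need OCR.
    min_chars is an arbitrary threshold chosen for this sketch.
    """
    return [
        i for i, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(pdf_path):
    """Return the extracted text of every page (needs: pip install pypdf)."""
    # Imported here so the helper above can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

Calling `pages_without_text(extract_page_texts("tests.pdf"))`, where `tests.pdf` is a placeholder file name, would list the pages that probably need OCR.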

The text can also be laid out in several columns.

A naive OCR attempt can then mix a couple of words from each column into every row!

Find an example here and copy and paste it as a starting point:

https://www.google.com/search?q=python+extract+text+from+pdf
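For the original question (keep only part A of each test), a rough sketch could look like the one below. It rests on assumptions the thread does not confirm: the PDF has a real text layer, every part starts on a new page, and each part begins with a heading line such as "Part A", "Part B" or "Part C". The function names and the heading pattern are made up for illustration, and pypdf is a third-party package.

```python
import re

# Assumed heading format: a line starting with "Part A", "Part B" or "Part C".
# Adjust the pattern to whatever the real document uses.
PART_HEADING = re.compile(r"^\s*Part\s+([ABC])\b", re.IGNORECASE | re.MULTILINE)


def part_a_page_indices(page_texts):
    """Return 0-based indices of the pages that belong to a part A.

    A page belongs to the part named by the first heading found on it,
    or, if it has no heading, to the same part as the previous page.
    """
    keep = []
    current = None  # part the current page belongs to, unknown at first
    for i, text in enumerate(page_texts):
        match = PART_HEADING.search(text)
        if match:
            current = match.group(1).upper()
        if current == "A":
            keep.append(i)
    return keep


def write_part_a(src_path, dst_path):
    """Copy only the part A pages of src_path into a new PDF at dst_path."""
    from pypdf import PdfReader, PdfWriter  # needs: pip install pypdf

    reader = PdfReader(src_path)
    texts = [page.extract_text() or "" for page in reader.pages]
    writer = PdfWriter()
    for i in part_a_page_indices(texts):
        writer.add_page(reader.pages[i])
    writer.write(dst_path)
```

With placeholder file names, `write_part_a("tests.pdf", "part_a_only.pdf")` would produce the new file. If parts B and C share a page with part A, whole-page copying is not enough and the pages would have to be cropped or the text rebuilt instead.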

4 hours ago, iNow said:

Can you copy/paste the A parts from each PDF into a single Word document, then Save As and change file type to PDF from Word?

This is what an ordinary layman would do. Programmers write scripts that automatically extract the needed data. Manually extracting data from thousands of files would take months or years of work. In some less computerised countries and companies, people still work that way in offices. That's bizarre, and it results in wasted human resources and an ineffective, unproductive company, office or government that cannot compete with the real world, where such jobs are done by programmers.

Programmers who want to extract data from documents face problems other than the sheer amount of information: damaged character encodings (this doesn't bother programmers in the UK, US, Australia and Canada much, but it does bother the rest of the world), text in scanned images, text in columns, letters incorrectly recognised by OCR, etc.

Edited by Sensei
  • 7 months later...
Posted (edited)

Well, the script for this is pretty simple. Try reaching out to someone who has some experience with this (it can be in any language: Python, C, C++, C# or any other). He or she would do the job in no time.

Edited by A_curious_Homosapien
Posted

I’m sure our OP who made this post over 7 months ago and who hasn’t posted a single time since is grateful for your reply. 
