mikey2k Posted January 27, 2021 Hello guys. I need some help. I have a PDF file that contains 100 tests, each divided into three parts: A, B and C. How can I create another PDF that contains only part A of each test? I found some Python tutorials, but I don't really know how to use Python. Thank you in advance :) I'm sorry for the grammatical mistakes :) #stillLearning
iNow Posted January 27, 2021 Can you copy/paste the A parts from each PDF into a single Word document, then Save As and change file type to PDF from Word?
Sensei Posted January 27, 2021 (edited) A PDF can contain both text and images. Start by making sure your text really is text and not an image; for example, some people scan paper documents and put the scanner output (images) straight into the PDF. Handling images requires OCR, which is a completely different procedure. Text can also be laid out in several columns, and an OCR attempt can then mix a few words from each column into every row! Find an example here and copy and paste it for a start: https://www.google.com/search?q=python+extract+text+from+pdf 4 hours ago, iNow said: Can you copy/paste the A parts from each PDF into a single Word document, then Save As and change file type to PDF from Word? This is what an ordinary layman would do. Programmers write scripts that automatically extract the needed data. Manually extracting data from thousands of files would take months or years of work. In some countries and companies that are not computerised, people in offices still work that way. That's bizarre, and it wastes human resources and leaves the company, office or government ineffective, unproductive and unable to compete with the real world, where such jobs are done by programmers. Programmers who want to extract data from documents face problems other than the sheer amount of information: damaged character encodings (which don't bother programmers in the UK, US, Australia and Canada much, but do bother the rest of the world), text in scanned images, text in columns, letters recognised incorrectly by OCR, etc. Edited January 27, 2021 by Sensei
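To make the "is it really text?" check concrete, here is a minimal Python sketch, assuming the third-party pypdf package is installed (pip install pypdf). The function names, the 20-character threshold and the file name tests.pdf are all made up for illustration. Pages that come back with almost no extractable text are probably scanned images and would need OCR instead.

```python
def pages_without_text(page_texts, min_chars=20):
    """Return the 0-based indices of pages whose extracted text is
    shorter than min_chars (an arbitrary, assumed threshold)."""
    return [i for i, text in enumerate(page_texts)
            if len(text.strip()) < min_chars]


def read_page_texts(pdf_path):
    """Extract the text of every page with pypdf (third-party package)."""
    from pypdf import PdfReader  # imported here so the helper above works without pypdf

    reader = PdfReader(pdf_path)
    # extract_text() can return an empty string for image-only pages
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    texts = read_page_texts("tests.pdf")  # hypothetical file name
    suspicious = pages_without_text(texts)
    if suspicious:
        print("Pages with little or no text (possibly scanned):", suspicious)
    else:
        print("Every page has a text layer.")
```

This is only a heuristic: a page that is mostly a diagram will also be flagged, so look at the flagged pages by eye before deciding that OCR is needed.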
A_curious_Homosapien Posted August 31, 2021 (edited) Well, the script for this is pretty simple. Try reaching out to someone who has some experience with this (it can be in any language: Python, C, C++, C# or any other). He or she would do the job in no time. Edited August 31, 2021 by A_curious_Homosapien
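For what it's worth, such a script could look roughly like the Python sketch below. It rests on a big assumption that the thread never confirms: every test takes the same number of pages, and part A sits on the first page(s) of each test. It also assumes the third-party pypdf package (pip install pypdf). The function names, file names and the numbers 3 and 1 are placeholders, not anything taken from the original file.

```python
def part_a_page_indices(total_pages, pages_per_test, part_a_pages):
    """Return the 0-based page indices to keep, assuming each test spans
    pages_per_test pages and part A is its first part_a_pages pages."""
    if pages_per_test <= 0 or not 0 < part_a_pages <= pages_per_test:
        raise ValueError("need 0 < part_a_pages <= pages_per_test")
    indices = []
    for start in range(0, total_pages, pages_per_test):
        # Clamp so an incomplete last test does not run past the end of the file
        end = min(start + part_a_pages, total_pages)
        indices.extend(range(start, end))
    return indices


def extract_part_a(src_path, dst_path, pages_per_test, part_a_pages):
    """Copy only the part A pages from src_path into a new PDF at dst_path."""
    from pypdf import PdfReader, PdfWriter  # third-party package

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for i in part_a_page_indices(len(reader.pages), pages_per_test, part_a_pages):
        writer.add_page(reader.pages[i])
    with open(dst_path, "wb") as out_file:
        writer.write(out_file)


if __name__ == "__main__":
    # Placeholder values: 3 pages per test, part A on the first page of each
    extract_part_a("tests.pdf", "part_a_only.pdf", pages_per_test=3, part_a_pages=1)
```

If the parts do not line up with page boundaries, this page-copying approach is not enough, and the page text would have to be searched for the part headings instead.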
iNow Posted August 31, 2021 I’m sure our OP who made this post over 7 months ago and who hasn’t posted a single time since is grateful for your reply.