Popcorn Sutton Posted April 25, 2014

Here is an example website that I am trying to access with Python automatically: http://www.waynecounty.com/sheriff/1359.htm

The problem is that the URL doesn't change at all, which means the page is probably running its own program behind the scenes. I need to automatically detect and click the Accept button. Then, on the next page, I need to detect the last-name and first-name fields and insert raw_input('last name: ') and raw_input('first name: ') in the appropriate spots. To make it even more complex, I then need to click the "more info" buttons associated with that particular inmate so I can find the text (which needs to be ordered as well), work out their charges and their bond information, and send that back to the program.

I've tried:

```python
import splinter
import selenium
from splinter import Browser

with Browser() as browser:
    browser.visit('http://www.waynecounty.com/sheriff/1359.htm')
    browser.find_by_name('Accept').click()
```

```
Traceback (most recent call last):
  File "<pyshell#14>", line 3, in <module>
    browser.find_by_name('Accept').click()
  File "C:\Python27\lib\site-packages\splinter\element_list.py", line 75, in __getattr__
    self.__class__.__name__, name))
AttributeError: 'ElementList' object has no attribute 'click'
```

Then with a delay, in case the page needed time to load:

```python
import time

with Browser() as browser:
    browser.visit('http://www.waynecounty.com/sheriff/1359.htm')
    time.sleep(10)
    browser.find_by_name('Accept').click()
```

```
Traceback (most recent call last):
  File "<pyshell#27>", line 4, in <module>
    browser.find_by_name('Accept').click()
  File "C:\Python27\lib\site-packages\splinter\element_list.py", line 75, in __getattr__
    self.__class__.__name__, name))
AttributeError: 'ElementList' object has no attribute 'click'
```

And with Selenium's PhantomJS driver:

```python
from selenium import webdriver

def SearchWayne(url):
    driver = webdriver.PhantomJS()
    driver.set_window_size(1024, 768)
    driver.get(url)
    driver.save_screenshot('screen.png')
    sbtn = driver.find_element_by_css_selector('Accept')
    sbtn.click()

SearchWayne('http://www.waynecounty.com/sheriff/1359.htm')
```

```
Traceback (most recent call last):
  File "<pyshell#37>", line 1, in <module>
    SearchWayne('http://www.waynecounty.com/sheriff/1359.htm')
  File "<pyshell#36>", line 2, in SearchWayne
    driver = webdriver.PhantomJS()
  File "C:\Python27\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py", line 50, in __init__
    self.service.start()
  File "C:\Python27\lib\site-packages\selenium\webdriver\phantomjs\service.py", line 69, in start
    raise WebDriverException("Unable to start phantomjs with ghostdriver.", e)
WebDriverException: Message: 'Unable to start phantomjs with ghostdriver.' ; Screenshot: available via screen
```

No dice.
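For what it's worth, both AttributeError tracebacks point at the same quirk: splinter's `find_by_name` returns an `ElementList` (a list-like container of matches), not a single element. When the list is non-empty, splinter's idiomatic call is `browser.find_by_name('Accept').first.click()`; when nothing matched (as here, because the button isn't on the outer page the driver is looking at), attribute access fails with exactly the error shown. The stand-in class below is illustrative only, NOT splinter's actual source, but it reproduces the behaviour:

```python
class ElementList(list):
    """Simplified stand-in for splinter's ElementList (illustrative only,
    not splinter's real code): attribute access delegates to the first
    matched element, and an empty result raises AttributeError."""

    def __getattr__(self, name):
        if self:
            # Delegate to the first matched element.
            return getattr(self[0], name)
        raise AttributeError("'%s' object has no attribute '%s'"
                             % (type(self).__name__, name))


# An empty match (e.g. the Accept button lives inside a frame the driver
# is not looking at) fails on .click() with the traceback's exact message:
matches = ElementList()
try:
    matches.click()
except AttributeError as exc:
    print(exc)  # prints: 'ElementList' object has no attribute 'click'
```

So the empty `ElementList` is a symptom, not the root cause; even `.first` would fail here until the driver is pointed at the page that actually contains the button.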
Cap'n Refsmmat Posted April 25, 2014

The page you linked to is merely loading another page inside an `<iframe>` tag, so you can load that page directly instead: http://www.waynecounty.com/DotNetForms/SHRFInmSearch.aspx
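That diagnosis can also be made programmatically: Python 3's stdlib `html.parser` is enough to pull the `src` out of any `<iframe>` in a saved copy of the outer page (on the thread's Python 2.7 the module is `HTMLParser` instead). A minimal sketch; the wrapper markup below is illustrative, not the county page's actual source:

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collects the src attribute of every <iframe> encountered."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Illustrative stand-in for the outer page's HTML:
page = '''<html><body>
<iframe src="http://www.waynecounty.com/DotNetForms/SHRFInmSearch.aspx"
        width="100%" height="800"></iframe>
</body></html>'''

finder = IframeFinder()
finder.feed(page)
print(finder.sources[0])
# prints: http://www.waynecounty.com/DotNetForms/SHRFInmSearch.aspx
```

The URL it recovers is the one worth driving directly, instead of the wrapper page.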
Popcorn Sutton Posted April 25, 2014 (Author)

Thank you! I was just talking to my boss about this, and he said to try to access the source code. I was worried because the URL doesn't change. We need to scrape the data of all the inmates on that particular website.

On the same topic, I'm going to be faced with another problem. My boss says that the people responsible for this website (http://itasw0aepv01.macombcountymi.gov/jil/faces/InmateSearch.jsp) are a little more attentive to data miners, so I'm wondering if the same rule applies there.

P.S.: Very surprised to see the admin jump in on this one. I'm honored.
Popcorn Sutton Posted May 6, 2014 (Author)

I hate to come back to this because I've already completed a working program, but I need to know if there's any way to make it more efficient. Right now the code is epic: it does what we need it to do, but it takes approximately 6 hours to run through the entire thing. What I'm wondering is whether there is a way to access the data directly, as opposed to using the actual website and AI to do the scraping.
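Most of those 6 hours are likely browser overhead rather than the data itself. Since the search page is an .aspx form, one common shortcut is to skip the browser and POST to the form directly with an HTTP client, echoing back the hidden ASP.NET state fields (`__VIEWSTATE`, `__EVENTVALIDATION`) harvested from the initial GET. This is a sketch only: the field names below are typical of ASP.NET pages generally, not confirmed against this site, the markup is illustrative, and whether the site tolerates scripted POSTs is a separate question worth checking.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFields(HTMLParser):
    """Harvests <input type="hidden"> name/value pairs from a form page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if a.get("type") == "hidden" and a.get("name"):
                self.fields[a["name"]] = a.get("value", "")


# Illustrative ASP.NET-style markup; the real page's fields may differ:
page = '''<form method="post" action="SHRFInmSearch.aspx">
<input type="hidden" name="__VIEWSTATE" value="dDwtMTA3..." />
<input type="hidden" name="__EVENTVALIDATION" value="wEWAg..." />
<input type="text" name="txtLastName" />
</form>'''

parser = HiddenFields()
parser.feed(page)

# Echo the hidden state back and add the search terms
# (txtLastName/txtFirstName are hypothetical field names):
payload = dict(parser.fields)
payload["txtLastName"] = "SMITH"
payload["txtFirstName"] = "JOHN"
body = urlencode(payload)
# `body` is what an HTTP client (e.g. urllib.request) would POST to the
# form's action URL, skipping browser startup and rendering entirely.
```

One plain POST per query replaces a full page load, render, and simulated click, which is where most of the wall-clock time in browser-driven scraping tends to go.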
AtomicMaster Posted May 6, 2014

That would typically be covered by some sort of official API, or database access, which you would need to inquire about with the websites you are trying to scrape. I would also not tell them that you are currently scraping them...
Popcorn Sutton Posted May 15, 2014 (Author)

> The page you linked to is merely loading another page inside an `<iframe>` tag, so you can load that page directly instead: http://www.waynecounty.com/DotNetForms/SHRFInmSearch.aspx

BTW, thank you very much for that comment; it saved my company A LOT of storage.