Popcorn Sutton Posted April 25, 2014

Here is an example website that I am trying to access with Python automatically: http://www.waynecounty.com/sheriff/1359.htm

The problem is that the URL doesn't change at all, which means the page is probably running its own program behind the scenes. I need to automatically detect and click the Accept button. Then, on the next page, I need to detect the last-name and first-name fields and insert raw_input('last name: ') and raw_input('first name: ') in the appropriate spots. To make it even more complex, I then need to click the "more info" buttons associated with that particular inmate so I can find the text (which needs to be ordered as well), work out their charges and their bond information, and send that back to the program.

I've tried:

```python
import splinter
import selenium
from splinter import Browser

with Browser() as browser:
    browser.visit('http://www.waynecounty.com/sheriff/1359.htm')
    browser.find_by_name('Accept').click()
```

```
Traceback (most recent call last):
  File "<pyshell#14>", line 3, in <module>
    browser.find_by_name('Accept').click()
  File "C:\Python27\lib\site-packages\splinter\element_list.py", line 75, in __getattr__
    self.__class__.__name__, name))
AttributeError: 'ElementList' object has no attribute 'click'
```

Then with a delay, in case the page needed time to load:

```python
import time

with Browser() as browser:
    browser.visit('http://www.waynecounty.com/sheriff/1359.htm')
    time.sleep(10)
    browser.find_by_name('Accept').click()
```

```
Traceback (most recent call last):
  File "<pyshell#27>", line 4, in <module>
    browser.find_by_name('Accept').click()
  File "C:\Python27\lib\site-packages\splinter\element_list.py", line 75, in __getattr__
    self.__class__.__name__, name))
AttributeError: 'ElementList' object has no attribute 'click'
```

And with Selenium's PhantomJS driver:

```python
from selenium import webdriver

def SearchWayne(url):
    driver = webdriver.PhantomJS()
    driver.set_window_size(1024, 768)
    driver.get(url)
    driver.save_screenshot('screen.png')
    sbtn = driver.find_element_by_css_selector('Accept')
    sbtn.click()

SearchWayne('http://www.waynecounty.com/sheriff/1359.htm')
```

```
Traceback (most recent call last):
  File "<pyshell#37>", line 1, in <module>
    SearchWayne('http://www.waynecounty.com/sheriff/1359.htm')
  File "<pyshell#36>", line 2, in SearchWayne
    driver = webdriver.PhantomJS()
  File "C:\Python27\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py", line 50, in __init__
    self.service.start()
  File "C:\Python27\lib\site-packages\selenium\webdriver\phantomjs\service.py", line 69, in start
    raise WebDriverException("Unable to start phantomjs with ghostdriver.", e)
WebDriverException: Message: 'Unable to start phantomjs with ghostdriver.' ; Screenshot: available via screen
```

No dice.
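For what it's worth, both AttributeError tracebacks point at the same quirk: splinter's `find_by_name` returns an `ElementList` (a list-like container of matches), not a single element. When the list is non-empty, splinter's idiomatic call is `browser.find_by_name('Accept').first.click()`; when nothing matched (as here, because the button isn't on the outer page the driver is looking at), attribute access fails with exactly the error shown. The stand-in class below is illustrative only, NOT splinter's actual source, but it reproduces the behaviour:

```python
class ElementList(list):
    """Simplified stand-in for splinter's ElementList (illustrative only,
    not splinter's real code): attribute access delegates to the first
    matched element, and an empty result raises AttributeError."""

    def __getattr__(self, name):
        if self:
            # Delegate to the first matched element.
            return getattr(self[0], name)
        raise AttributeError("'%s' object has no attribute '%s'"
                             % (type(self).__name__, name))


# An empty match (e.g. the Accept button lives inside a frame the driver
# is not looking at) fails on .click() with the traceback's exact message:
matches = ElementList()
try:
    matches.click()
except AttributeError as exc:
    print(exc)  # prints: 'ElementList' object has no attribute 'click'
```

So the empty `ElementList` is a symptom, not the root cause; even `.first` would fail here until the driver is pointed at the page that actually contains the button.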
Cap'n Refsmmat Posted April 25, 2014

The page you linked to is merely loading another page inside an `<iframe>` tag, so you can load that page directly instead: http://www.waynecounty.com/DotNetForms/SHRFInmSearch.aspx
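That diagnosis can also be made programmatically: Python 3's stdlib `html.parser` is enough to pull the `src` out of any `<iframe>` in a saved copy of the outer page (on the thread's Python 2.7 the module is `HTMLParser` instead). A minimal sketch; the wrapper markup below is illustrative, not the county page's actual source:

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collects the src attribute of every <iframe> encountered."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Illustrative stand-in for the outer page's HTML:
page = '''<html><body>
<iframe src="http://www.waynecounty.com/DotNetForms/SHRFInmSearch.aspx"
        width="100%" height="800"></iframe>
</body></html>'''

finder = IframeFinder()
finder.feed(page)
print(finder.sources[0])
# prints: http://www.waynecounty.com/DotNetForms/SHRFInmSearch.aspx
```

The URL it recovers is the one worth driving directly, instead of the wrapper page.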
Popcorn Sutton Posted April 25, 2014 (Author)

Thank you! I was just talking to my boss about this, and he said to try to access the source code. I was worried because the URL doesn't change. We need to scrape the data of all the inmates on that particular website.

On the same topic, I'm going to be faced with another problem. My boss says that the people responsible for this website (http://itasw0aepv01.macombcountymi.gov/jil/faces/InmateSearch.jsp) are a little more attentive to data miners, so I'm wondering if the same rule applies there.

P.S.: Very surprised to see the admin jump in on this one. I'm honored.
Popcorn Sutton Posted May 6, 2014 (Author)

I hate to come back to this because I've already completed a working program, but I need to know if there's any way to make it more efficient. Right now the code is epic: it does what we need it to do, but it takes approximately 6 hours to run through the entire thing. What I'm wondering is whether there is a way to access the data directly, as opposed to using the actual website and AI to do the scraping.
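Most of those 6 hours are likely browser overhead rather than the data itself. Since the search page is an .aspx form, one common shortcut is to skip the browser and POST to the form directly with an HTTP client, echoing back the hidden ASP.NET state fields (`__VIEWSTATE`, `__EVENTVALIDATION`) harvested from the initial GET. This is a sketch only: the field names below are typical of ASP.NET pages generally, not confirmed against this site, the markup is illustrative, and whether the site tolerates scripted POSTs is a separate question worth checking.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFields(HTMLParser):
    """Harvests <input type="hidden"> name/value pairs from a form page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if a.get("type") == "hidden" and a.get("name"):
                self.fields[a["name"]] = a.get("value", "")


# Illustrative ASP.NET-style markup; the real page's fields may differ:
page = '''<form method="post" action="SHRFInmSearch.aspx">
<input type="hidden" name="__VIEWSTATE" value="dDwtMTA3..." />
<input type="hidden" name="__EVENTVALIDATION" value="wEWAg..." />
<input type="text" name="txtLastName" />
</form>'''

parser = HiddenFields()
parser.feed(page)

# Echo the hidden state back and add the search terms
# (txtLastName/txtFirstName are hypothetical field names):
payload = dict(parser.fields)
payload["txtLastName"] = "SMITH"
payload["txtFirstName"] = "JOHN"
body = urlencode(payload)
# `body` is what an HTTP client (e.g. urllib.request) would POST to the
# form's action URL, skipping browser startup and rendering entirely.
```

One plain POST per query replaces a full page load, render, and simulated click, which is where most of the wall-clock time in browser-driven scraping tends to go.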
AtomicMaster Posted May 6, 2014

That would typically be covered by some sort of official API, or database access, which you would need to inquire about with the websites you are trying to scrape. I would also not tell them that you are currently scraping them...
Popcorn Sutton Posted May 15, 2014 (Author)

> The page you linked to is merely loading another page inside an `<iframe>` tag, so you can load that page directly instead: http://www.waynecounty.com/DotNetForms/SHRFInmSearch.aspx

BTW, thank you very much for that comment; it saved my company A LOT of storage.