Posted

Here is an example website that I am trying to access automatically with Python (http://www.waynecounty.com/sheriff/1359.htm).

The problem is that the URL doesn't change at all, which suggests the page is running its own application behind the scenes. I need to automatically detect and click the Accept button. On the next page, I need to find the last name and first name fields and fill them in with the values from raw_input('last name: ') and raw_input('first name: '). Then, to make it even more complex, I need to click the More Info buttons for that particular inmate so I can extract their charges and bond information as text (which also needs to be ordered) and send it back to the program.
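For the last step (reading charges and bond information), once a More Info page is loaded its source can be parsed directly; splinter exposes the current page's HTML as `browser.html`. The Python 3 sketch below assumes, hypothetically, that the charges sit in an HTML table whose header row has "Charge" and "Bond" columns; the real page's markup and column names would need to be checked and substituted. `TableRowParser` and `extract_charges` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per table row
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace and newlines inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_charges(detail_html):
    """Return [{'charge': ..., 'bond': ...}, ...] in the order shown on the page.

    ASSUMPTION (hypothetical markup): the first table row is a header
    containing 'Charge' and 'Bond' columns.
    """
    parser = TableRowParser()
    parser.feed(detail_html)
    parser.close()
    header, *body = parser.rows
    charge_col = header.index("Charge")
    bond_col = header.index("Bond")
    return [{"charge": row[charge_col], "bond": row[bond_col]} for row in body]
```

Inside the splinter session this would be called as `extract_charges(browser.html)` after each More Info click, which keeps the records in page order.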

 

I've tried the following:

import splinter
import selenium
from splinter import Browser
with Browser() as browser:
	browser.visit('http://www.waynecounty.com/sheriff/1359.htm')
	browser.find_by_name('Accept').click()

Traceback (most recent call last):
  File "<pyshell#14>", line 3, in <module>
    browser.find_by_name('Accept').click()
  File "C:\Python27\lib\site-packages\splinter\element_list.py", line 75, in __getattr__
    self.__class__.__name__, name))
AttributeError: 'ElementList' object has no attribute 'click'
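A note on the traceback above: in splinter, `find_by_name` returns an `ElementList`, and attribute access on that list is forwarded to its first element. When the list is empty, that forwarding fails and is reported as this `AttributeError`, so the message most likely means that no element with `name="Accept"` was found, not that `click` is unsupported. One way to check is to list the clickable elements in the page source and see which attribute actually identifies the button. The Python 3 sketch below does that with the standard library only; `ButtonFinder` and `find_accept_candidates` are hypothetical helper names.

```python
from html.parser import HTMLParser


class ButtonFinder(HTMLParser):
    """Record elements that could act as a clickable button, with their attributes."""

    CLICKABLE_INPUT_TYPES = ("submit", "button", "image")

    def __init__(self):
        super().__init__()
        self.candidates = []  # (tag, attrs dict, visible text or value)
        self._open = None     # (tag, attrs, text fragments) for an open <button>/<a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input":
            # Attributes without a value come through as None, hence the "or".
            if (attrs.get("type") or "text").lower() in self.CLICKABLE_INPUT_TYPES:
                self.candidates.append(("input", attrs, attrs.get("value") or ""))
        elif tag in ("button", "a"):
            self._open = (tag, attrs, [])

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            open_tag, attrs, text = self._open
            self.candidates.append((open_tag, attrs, " ".join("".join(text).split())))
            self._open = None


def find_accept_candidates(page_source, label="Accept"):
    """Return candidate buttons whose text or value contains label (case-insensitive)."""
    finder = ButtonFinder()
    finder.feed(page_source)
    finder.close()
    return [c for c in finder.candidates if label.lower() in c[2].lower()]
```

Running `find_accept_candidates(browser.html)` inside the `with Browser() as browser:` block would show whether the button has a `name` at all. If it only carries `value="Accept"`, the matching splinter call would be `browser.find_by_value('Accept').first.click()`; if nothing comes back, the button is probably not in the top-level page.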

Next, I tried waiting for the page to load before clicking:

import time

with Browser() as browser:
	browser.visit('http://www.waynecounty.com/sheriff/1359.htm')
	time.sleep(10)
	browser.find_by_name('Accept').click()

Traceback (most recent call last):
  File "<pyshell#27>", line 4, in <module>
    browser.find_by_name('Accept').click()
  File "C:\Python27\lib\site-packages\splinter\element_list.py", line 75, in __getattr__
    self.__class__.__name__, name))
AttributeError: 'ElementList' object has no attribute 'click'
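Since a 10-second wait gives the same empty result, the button may not be in the top-level document at all. A common reason a URL never changes is that the interactive part lives in an `<iframe>` or `<frame>` pointing at a separate application; that is an assumption about this site, not something the thread confirms. The Python 3 sketch below lists the frame URLs in a page's source, resolved to absolute URLs so they can be opened directly. `FrameFinder` and `embedded_page_urls` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameFinder(HTMLParser):
    """Collect the src attribute of every <iframe> and <frame> on a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("iframe", "frame"):
            src = dict(attrs).get("src")
            if src:  # skip frames with no src or an empty one
                self.sources.append(src)


def embedded_page_urls(page_url, page_source):
    """Return absolute URLs of the frames embedded in page_source."""
    finder = FrameFinder()
    finder.feed(page_source)
    finder.close()
    # Relative src values are resolved against the page's own URL.
    return [urljoin(page_url, src) for src in finder.sources]
```

For example, `embedded_page_urls('http://www.waynecounty.com/sheriff/1359.htm', browser.html)` inside the splinter session. If it returns a URL, calling `browser.visit()` on it and retrying the button lookup is one option; splinter's `get_iframe` context manager is another.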

Then I tried Selenium with PhantomJS:

from selenium import webdriver

def SearchWayne(url):
	driver = webdriver.PhantomJS()
	driver.set_window_size(1024,768)
	driver.get(url)
	driver.save_screenshot('screen.png')
	sbtn = driver.find_element_by_css_selector('Accept')
	sbtn.click()

SearchWayne('http://www.waynecounty.com/sheriff/1359.htm')

Traceback (most recent call last):
  File "<pyshell#37>", line 1, in <module>
    SearchWayne('http://www.waynecounty.com/sheriff/1359.htm')
  File "<pyshell#36>", line 2, in SearchWayne
    driver = webdriver.PhantomJS()
  File "C:\Python27\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py", line 50, in __init__
    self.service.start()
  File "C:\Python27\lib\site-packages\selenium\webdriver\phantomjs\service.py", line 69, in start
    raise WebDriverException("Unable to start phantomjs with ghostdriver.", e)
WebDriverException: Message: 'Unable to start phantomjs with ghostdriver.' ; Screenshot: available via screen 



No dice.
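About the PhantomJS traceback: "Unable to start phantomjs with ghostdriver" is raised when Selenium cannot launch the `phantomjs` executable, and a common cause is that it isn't installed or isn't on `PATH`. The Python 3 sketch below (the helper name `locate_driver_binary` is hypothetical) checks for the binary up front with `shutil.which` and fails with a clearer message.

```python
import shutil


def locate_driver_binary(name, extra_dirs=()):
    """Return the full path to a driver executable (e.g. 'phantomjs') or raise.

    PATH is searched first, then each directory in extra_dirs. On Windows,
    shutil.which also tries the PATHEXT extensions such as .exe.
    """
    found = shutil.which(name)
    if found:
        return found
    for directory in extra_dirs:
        found = shutil.which(name, path=directory)
        if found:
            return found
    raise FileNotFoundError(
        "%r was not found on PATH or in %r; install it or pass its full path "
        "to the driver explicitly." % (name, list(extra_dirs))
    )
```

With the Selenium 2.x/3.x releases of that era, the path could then be passed as `webdriver.PhantomJS(executable_path=locate_driver_binary('phantomjs'))`; PhantomJS support was later removed from Selenium, so current versions would need a headless Chrome or Firefox driver instead. Separately, `find_element_by_css_selector('Accept')` matches elements whose tag name is `Accept`; if the button turns out to be an `<input>` with `value="Accept"`, a selector such as `input[value="Accept"]` would be a plausible replacement.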

Posted

Thank you. I was just talking to my boss about this, and he said to try accessing the page's source code. I was worried because the URL doesn't change. We need to scrape the data for all the inmates on that website. On the same topic, I'm going to face another problem: my boss says that the people responsible for this website (http://itasw0aepv01.macombcountymi.gov/jil/faces/InmateSearch.jsp) pay closer attention to data miners. So I'm wondering whether the same approach applies there.
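Because the URL doesn't change, the search is probably submitted as a form POST back to the same address, which means it can often be replayed with plain HTTP requests once the form's fields are read from the page source. The Macomb URL's `/faces/...jsp` path suggests JavaServer Faces, whose forms normally include a hidden `javax.faces.ViewState` field that must be sent back with the next request. The Python 3 sketch below copies a form's hidden inputs and builds, without sending, the POST. The search field names are hypothetical and must be taken from the real form, and a live session may also need cookies (for example via `http.cookiejar`) and the submit button's own name and value.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormReader(HTMLParser):
    """Capture the action of the first <form> and the hidden inputs inside it."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.hidden = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form:
            if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
                self.hidden[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # only the first form is read


def build_search_request(page_url, page_source, search_fields):
    """Build (but do not send) a POST that replays the form with extra fields.

    search_fields maps HYPOTHETICAL field names to values; read the real
    names from the form in the page source.
    """
    reader = FormReader()
    reader.feed(page_source)
    reader.close()
    data = dict(reader.hidden)  # e.g. javax.faces.ViewState on JSF pages
    data.update(search_fields)
    return Request(
        urljoin(page_url, reader.action or ""),
        data=urlencode(data).encode("ascii"),
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req)`; given that this site reportedly watches for data miners, keeping the request rate low would be prudent.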

 

P.S. I'm very surprised to see the admin jump in on this one. I'm honored.

  • 2 weeks later...
Posted

I hate to return to this subject because I've already completed a working program, but I need to know whether there's any way to make it more efficient. Right now, the code is epic. It does what we need it to do, but it takes approximately 6 hours to run through the entire thing. What I'm wondering is whether we can access the data directly, instead of going through the actual website and using browser automation to do the scraping.
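If most of those 6 hours are spent waiting for pages to load one inmate at a time, two changes would likely help: request the detail pages directly over HTTP instead of through a browser, and overlap the waits. The Python 3 sketch below covers the second part with a small `ThreadPoolExecutor`. `fetch_all` is a hypothetical name, `fetch_one` is a function you would supply (for example, one that downloads and parses a single inmate's detail page), and `delay` is a crude politeness throttle.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(ids, fetch_one, max_workers=4, delay=0.0):
    """Run fetch_one(item) for every item in ids using a small thread pool.

    Results come back in the same order as ids. Each task sleeps `delay`
    seconds before calling fetch_one, to limit the load on the server.
    """
    def task(item):
        if delay:
            time.sleep(delay)
        return fetch_one(item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though tasks run concurrently.
        return list(pool.map(task, ids))
```

For network-bound work, `max_workers=4` can cut wall-clock time by up to roughly a factor of four, at the cost of about four times the request rate on the server, so a small value is the safer choice for sites that watch for scraping.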

Posted

That would typically be covered by an official API or direct database access, which you would need to request from the operators of the websites you are trying to scrape. I would also not tell them that you are currently scraping them...
