Popcorn Sutton Posted March 29, 2014
I have two questions. The first is whether I can use a bot program (data mining) to bypass the letters-and-numbers entry step that some websites use to make sure the user is human. I know it's made to prevent that type of activity, but my company really needs to access information that, at this point, only a human can access. We need to use our time efficiently, and right now our employees are taking anywhere between 10 and 15 minutes just to check two webpages. We're trying to access 5-10 webpages within 30 seconds, and some of them require that the user identify the letters and numbers in an image.
The second question is this. When you get to a webpage and want to perform a search, with Google it's easy to encode the query into something you can tack on to the end of the URL (https://www.google.com/#q=like+this), but on webpages that deal with criminals and the legal system, the URL doesn't change when you search. It stays the same (www.xxxxx.com/sheriff/12931.htm). I need to figure out how to get around these things. Any help will be appreciated. (I'm using Python 2.7 for this task.)
Greg H. Posted March 29, 2014
In order:
#1 Yes, you can use bots to defeat CAPTCHAs, as long as they're not too advanced. I've never done it myself, but some googling will probably turn up a solution.
#2 What you're talking about is called deep linking. Unfortunately, many database-driven sites use asynchronous AJAX calls to update the HTML on the fly, so those other URLs don't actually exist. It mostly depends on how the site transfers the inputs for its queries. Putting the query in the URL has disadvantages, mainly the maximum length of the query string, so it's not that useful for sites that offer lots of search features or that can generate long parameter strings. Unless the site explicitly supports deep linking to the database results, you're out of luck. However, there are programs you can use to automate filling in the forms when you need to run a search repeatedly. These applications are mostly used to automate QA testing, so you might take a look at what's available in that realm.
Also, I am not sure why you got two down votes on your post, so I gave you a plus. Enjoy.
Popcorn Sutton Posted March 30, 2014 (edited)
Thanks for that. People here think it's funny that I have a negative reputation, so they make a point of making it worse when they can, lol. I used to worry about it, but it's just become a joke by this point. Thank you for your response. I'm hoping to get a little more in depth about this, but I don't have the time ATM, so I'll have to save it for later. I don't like the idea of using the entire operating system to pull it off; I like doing the back-end work because you can fly through text like it's nothing. It's extremely efficient and can probably save a lot of time.
Sensei Posted March 30, 2014
You got negative votes (not from me) because you want to abuse websites that try to fight guys like you, and you are asking us on our forum to help you abuse their systems. Whatever you do, it will only be temporary. They can change the font used to render the text, and the distortions applied to it, within a couple of minutes.
Endy0816 Posted March 30, 2014 (edited)
Depending on your time requirements, you can also cheaply hire other people to do it via services like Amazon's Mechanical Turk (mturk).
Popcorn Sutton Posted March 30, 2014
I know about that, but I like doing it myself because I know how to do these things.
Endy0816 Posted March 30, 2014
Quoting Popcorn Sutton: "I know about that but I like doing it myself because I know how to do these things"
I mean hire them to enter the CAPTCHAs and scrape/input whatever information you need.
Sensei Posted March 30, 2014 (edited)
Quoting Popcorn Sutton: "I know about that but I like doing it myself because I know how to do these things"
Rather, it's because you are greedy and want software to do the work for you for free, instead of letting people earn some money.
Popcorn Sutton Posted March 30, 2014
I'm preventing crime by decreasing fugitive rates. I like the idea of cheap supervision because I absolutely hate supervised training.
Endy0816 Posted March 30, 2014
You are talking about optical character recognition (OCR). The better human verification systems are designed to prevent this, however. People can be paid to basically do your job for you, with results and costs comparable to what the very best OCR software could manage. Figure out what 15 minutes costs at present, then see whether that is more or less than what people are paying on mturk (or elsewhere) for similar tasks. Based on that, you can decide whether to proceed or not.
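The comparison suggested above is simple arithmetic, and could be sketched roughly like this. The function name and every number here are hypothetical placeholders, not real wages or real mturk prices; plug in your own figures.

```python
def cheaper_option(wage_per_hour, minutes_per_task, outsourced_price_per_task):
    """Compare the in-house labour cost of one task with an outsourced price.

    All inputs are assumptions supplied by the caller; nothing here is
    looked up from a real service.
    """
    # Cost of an employee spending `minutes_per_task` on one lookup.
    in_house_cost = wage_per_hour * minutes_per_task / 60.0
    if outsourced_price_per_task < in_house_cost:
        return "outsource", in_house_cost
    return "in-house", in_house_cost


# Hypothetical figures: a 20.00/hour employee spending 15 minutes per task,
# versus an outsourced price of 0.50 per task.
choice, cost = cheaper_option(20.0, 15, 0.50)
print(choice, cost)
```

This only compares direct cost per task; it says nothing about turnaround time or accuracy, which would need to be checked separately.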
Popcorn Sutton Posted March 30, 2014
I know how to bypass the CAPTCHA now, but I really didn't want to use OCR to get around it because it's going to use an entire OS to perform that task.
AtomicMaster Posted March 31, 2014
1) There was a DEF CON talk on defeating most CAPTCHAs.
2) It's a GET vs POST issue: http://en.wikipedia.org/wiki/POST_(HTTP)
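To illustrate the GET vs POST point: with GET the search terms are encoded into the URL, while with POST they travel in the request body, which is why the address bar stays the same. Below is a minimal sketch using only the standard library. The URL and the field name `last_name` are hypothetical; a real site will use its own form fields, and its terms of use may restrict automated access.

```python
from __future__ import print_function

# Imports that work on both Python 3 and Python 2.7.
try:
    from urllib.parse import urlencode
    from urllib.request import Request
except ImportError:  # Python 2.7
    from urllib import urlencode
    from urllib2 import Request

# Hypothetical endpoint and form field, for illustration only.
SEARCH_URL = "http://www.example.com/search"
params = {"last_name": "smith"}


def build_get(url, fields):
    """GET: the query is encoded into the URL itself."""
    return Request(url + "?" + urlencode(fields))


def build_post(url, fields):
    """POST: the query goes in the request body, so the URL is unchanged."""
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body)


# Only build the request objects here; nothing is sent over the network.
get_req = build_get(SEARCH_URL, params)
post_req = build_post(SEARCH_URL, params)

print(get_req.get_method(), get_req.get_full_url())
print(post_req.get_method(), post_req.get_full_url())
```

Passing either object to `urlopen` would actually send it. The field names a given form submits can usually be read from the page's HTML form or from the browser's developer tools.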