

Posted

One question is whether I can use a bot program (data mining) to bypass the step some websites use to make sure the user is human, where you have to type in the letters and numbers shown in an image. I know it's designed to prevent that type of activity, but my company really needs to access information that, at this point, only a human can access. We need to use our time efficiently, and right now our employees are taking anywhere from 10 to 15 minutes just to check two webpages. We're trying to access 5-10 webpages within 30 seconds, and some of them require the user to identify the letters and numbers in the image.

 

The other question is this: when you get to a webpage and want to perform a search on Google, it's easy to encode the query into something you can tack onto the end of the URL (https://www.google.com/#q=like+this). But on websites that deal with criminal records and the legal system, the URL doesn't change; it stays the same (www.xxxxx.com/sheriff/12931.htm). I need to figure out how to get around this. Any help will be appreciated :)
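For sites that do take their search terms from the URL (a GET request, like the Google example), building the query string is just URL encoding. A minimal sketch, assuming Python 3 (in Python 2.7 the same functions live directly in `urllib` as `urllib.urlencode` and `urllib.quote_plus`); `build_search_url` is a hypothetical helper name, not part of any library:

```python
# Minimal sketch, Python 3. In Python 2.7 use urllib.urlencode / urllib.quote_plus.
from urllib.parse import urlencode, quote_plus


def build_search_url(base_url, params):
    """Hypothetical helper: append URL-encoded parameters as a GET query string."""
    # urlencode percent-encodes reserved characters and turns spaces into '+'
    return base_url + "?" + urlencode(params)


print(quote_plus("like this"))  # like+this
print(build_search_url("https://www.google.com/search", {"q": "like this"}))
# https://www.google.com/search?q=like+this
```

This only helps when the server actually reads its inputs from the query string; if the address bar never changes, the inputs are travelling some other way (usually a POST body or an AJAX call).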


(I'm using Python 2.7 for this task)

Posted

In order.

 

#1 Yes, you can use bots to defeat CAPTCHAs, as long as they're not too advanced. I've never done it myself, but some googling will probably turn up a solution.

 

#2 What you're talking about is called deep linking. Unfortunately, many database-driven sites use asynchronous AJAX calls to update the HTML on the fly, so those other URLs don't actually exist; it mostly depends on how the site transfers the inputs for its queries. Passing queries in the URL also has disadvantages, mainly the maximum length of the query string, so it's not that useful for sites that offer lots of search options or generate long parameter strings. Unless the site deliberately supports deep linking to its database results, you're out of luck.

 

However, there are programs you can use to automate filling in the forms when you need to run a search repeatedly. These applications are mostly used to automate QA testing, so you might look at what's available in that area.
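If the search page is an ordinary HTML form that POSTs its fields (rather than building the results with JavaScript), the same request can often be replayed from a script without driving a browser. A minimal sketch, assuming Python 3's standard `urllib`; the URL and the field names `last_name` / `first_name` are hypothetical placeholders that would have to be read from the real form's `action` and `<input name=...>` attributes:

```python
# Minimal sketch, Python 3 standard library only.
# URL and field names below are HYPOTHETICAL placeholders, not a real site's form.
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_request(action_url, fields):
    """Build a POST request that submits form fields the way a browser would."""
    # Browsers send simple forms as an application/x-www-form-urlencoded body
    body = urlencode(fields).encode("ascii")
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def submit_search(action_url, fields, timeout=30):
    """Send the request and return the result page's HTML as text."""
    req = build_search_request(action_url, fields)
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    req = build_search_request(
        "https://example.com/sheriff/search",           # hypothetical form action
        {"last_name": "Smith", "first_name": "John"},  # hypothetical field names
    )
    print(req.get_method())  # POST
    print(req.data)          # b'last_name=Smith&first_name=John'
```

The URL stays the same because the inputs go in the request body, not the address. Forms that also carry hidden fields or depend on a session cookie from an earlier page load would need those copied into the request too, and at that point the browser-automation tools may be simpler.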

 

Also, I'm not sure why you got two downvotes on your post, so I gave you a plus. Enjoy.

Posted (edited)

Thanks for that. People here think it's funny that I have a negative reputation, so they make a point of making it worse when they can, lol. I used to worry about it, but by this point it's just become a joke.

 

Thank you for your response. I'd like to go into this in more depth, but I don't have time at the moment, so I'll have to save it for later.

I don't like the idea of using the entire operating system to pull it off. I like doing the back-end work because you can fly through text like it's nothing; it's extremely efficient and can probably save a lot of time.

Edited by Popcorn Sutton
Posted

You got negative votes (not from me) because you want to abuse websites that try to fight people like you, and you are asking us to help you abuse their systems on our forum...

 


Whatever you do, it will only be temporary. They can change the font used to render the text, and the distortions, within a couple of minutes.

Posted (edited)

Depending on your time requirements, you can also cheaply hire other people to do it through services like Amazon's Mechanical Turk (mturk).

Edited by Endy0816
Posted

I know about that but I like doing it myself because I know how to do these things

 

I mean hire them to enter the CAPTCHAs and scrape or input whatever information you need.

Posted (edited)
I know about that but I like doing it myself because I know how to do these things

 

 

More likely because you are greedy and want software to do the work for you for free, instead of letting people earn some money.

Edited by Sensei
Posted

You are talking about optical character recognition (OCR). However, the better human-verification systems are designed to prevent this.

 

People can be paid to basically do your job for you, with results and costs comparable to what the very best OCR software could manage.

 

Figure out what those 15 minutes currently cost you, then see whether that is more or less than what people are paying on mturk (or elsewhere) for similar tasks. Based on that, you can decide whether to proceed.
