Posted

I've spotted a very interesting new concept for a search engine.

 

Simply put, it uses distributed computing rather than one centralized cluster of servers doing the crawling. This means that the system is getting ridiculous amounts of new URLs every day (because there are numerous crawlers at once, rather than a few big ones) and lots of new data. I think it's a rather nice idea.

 

Unfortunately, their alpha search engine component (the bit for actually searching what the crawlers have gathered) is a bit lacking. Many different searches turn up Microsoft as the first result - no idea why - along with other irrelevant things. They have multiple algorithms, however, so I think progress is being made there.

 

Thoughts? I think it could be much better at crawling as much internet content as possible.

 

link: http://www.majestic12.co.uk/

Posted

Wow. Just to test, I searched "star wars" on both this and Google. This new site got me 1,345,560 results; Google gets me "about 144,000."

 

Granted, I would never search through all the results, but that shows that there's likely a better chance of finding specific things you may be looking for under a subject.

Posted

Google gets me 149,000,000 for "star wars", and this new one gets me just over a million.

 

Not actually comparable. But there are many more being crawled daily. I've done about 30,000 URLs so far today.

Posted

I'm sure that's a record, CR.

 

Any reason why you two would get differing results for Google?

 

There is more than one method of searching in that new alpha.

Posted

No, the individual with the most URLs for today has 4,869,963.

 

There are multiple ways because you can fiddle with your own algorithm and see if you can make it better than the default one. The default is rather pathetic for relevancy, although that's what the owner plans on improving now that the crawler works well.

Posted

Isn't that the future of computing? Distributed systems: that BBC climate program (though it's rubbish at process management) seems to be hooking onto the idea that the world itself makes up a bigger computer than the world's current most powerful machine. By the way, what is the world's most powerful computer these days?

 

A few years ago I was told it was the mainframe being installed at the Met Office's new Exeter head office, though obviously that won't be true anymore!

Posted

Cap'n and I got different results because when I put "star wars" I included the quotes, I believe. I get 174 million or so on Google when I don't.

 

Oddly enough, retesting that, Majestic-12 gives me 679,774 results for "star wars" this time...
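
For anyone wondering why the quotes change the count so much: a quoted query usually asks for the exact phrase, while an unquoted one only needs both words to appear somewhere on the page. Here's a rough sketch of that difference in Python. The function names and the three sample pages are made up for illustration, and this is not how Google or Majestic-12 actually count results.

```python
def tokens(text):
    # Very naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def matches_all_terms(doc, query):
    # Unquoted-style match: every query word appears somewhere in the page.
    words = set(tokens(doc))
    return all(term in words for term in tokens(query))

def matches_phrase(doc, phrase):
    # Quoted-style match: the query words appear next to each other, in order.
    d, p = tokens(doc), tokens(phrase)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))

# Hypothetical sample pages, not real crawl data.
docs = [
    "star wars episode guide",
    "the wars between each star system",
    "a star is born",
]

loose = sum(matches_all_terms(d, "star wars") for d in docs)
exact = sum(matches_phrase(d, "star wars") for d in docs)
print(loose, exact)
```

The second sample page has both words but not the phrase, so the unquoted-style count comes out higher than the quoted-style one.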

Posted

The problem with delegating the construction of a search engine index is verifying the authenticity of the data returned. I think such a system would become immensely vulnerable to spam... think of how many spammers already control botnets with tens or hundreds of thousands of infected machines. How could you possibly protect a distributed search engine index from spam attacks from these systems?

 

I predict a search in such a system would yield results for porn and online gambling sites for virtually every search term.

Posted
The problem with delegating the construction of a search engine index is verifying the authenticity of the data returned. I think such a system would become immensely vulnerable to spam... think of how many spammers already control botnets with tens or hundreds of thousands of infected machines. How could you possibly protect a distributed search engine index from spam attacks from these systems?

 

I predict a search in such a system would yield results for porn and online gambling sites for virtually every search term.

The actual indexing is done on the server. All the client does is gather up URLs and their content. Only the server can decide what the content is, and what searches it will show up in.
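
To make that split concrete, here's a minimal sketch in Python of the idea as I understand it: the client only reports URLs and page text, and the server alone builds the index and answers searches. The names `crawl_client` and `IndexServer` and the sample pages are hypothetical, and the real Majestic-12 code will certainly look different.

```python
from collections import defaultdict

def crawl_client(pages):
    # Hypothetical client: hands back (url, raw text) pairs and nothing else.
    # It makes no decisions about keywords or ranking.
    return [(url, text) for url, text in pages.items()]

class IndexServer:
    # Hypothetical server: the only place where content is interpreted.
    def __init__(self):
        self.index = defaultdict(set)  # word -> set of URLs containing it

    def ingest(self, reports):
        # Build a simple inverted index from what the clients reported.
        for url, text in reports:
            for word in text.lower().split():
                self.index[word].add(url)

    def search(self, term):
        # Return the URLs whose text contained the term, in sorted order.
        return sorted(self.index.get(term.lower(), set()))

# Stand-in for pages a client fetched (made-up URLs and text).
fetched = {
    "http://example.com/a": "Star Wars episode guide",
    "http://example.com/b": "Guinea pig feeding directions",
}

server = IndexServer()
server.ingest(crawl_client(fetched))
print(server.search("wars"))
```

In this sketch the server still trusts that the reported text really came from the reported URL, so it only covers who does the indexing, not how a dishonest client would be caught.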
Posted

It's an interesting idea, and with enough participation it would easily outmatch the Google search engine with their current search algorithms... a very interesting idea :)

 

Cheers,

 

Ryan Jones

Posted

I honestly believe this is placing the wrong emphasis on where search engines need to go. The number of sites trawled isn't that much of an issue, whether you have lots of standalone machines dotted about or a smaller number of central clusters. The real issue is how the data is transformed into information that is actually usable, and Google is still by far the best on that front, albeit far from perfect.

 

But regardless, searching needs to become a great deal more personalised and I have some designs in mind on how to achieve this over the next 10 years. Not a replacement for Google and the like, just an additional method.

Posted
I honestly believe this is placing the wrong emphasis on where search engines need to go. The number of sites trawled isn't that much of an issue, whether you have lots of standalone machines dotted about or a smaller number of central clusters. The real issue is how the data is transformed into information that is actually usable, and Google is still by far the best on that front, albeit far from perfect.

If a search engine could get a huge index and relevant results, it would be a world-beater.

 

But regardless, searching needs to become a great deal more personalised and I have some designs in mind on how to achieve this over the next 10 years. Not a replacement for Google and the like, just an additional method.

The problem with personalization is that sometimes people break out of the personalized "mold." If, for example, they always search for pet information, the engine might personalize to bring up more relevant results, but then their evil sibling gets on and tries to find instructions for nuclear weapons and only gets guinea pig feeding directions.
