Searching the web Part I

This week I took a one-week Cuil challenge. I was curious to find out how this search engine stacks up against Google’s. My aim was to use no search engine other than Cuil. However, within three days I had to switch, because the search results were poor. A search for, say, ‘agile development’ returned no Wikipedia hits. There was no spell checker, and there were no add-ons for Firefox.

One reason for the abrupt end to my one-week Cuil challenge was that I was introduced to another search engine called Clusty. Clusty not only returned better results than Cuil; instead of delivering millions of search results in one long list, it grouped similar results together into clusters. Clusters help you see your search results by topic so you can zero in on exactly what you’re looking for or discover unexpected relationships between items. What is great is that you can also search within clusters.

Clusty used a clustering algorithm for its search engine. Most people are familiar with Google’s PageRank algorithm, which ranks pages; clustering instead separates unrelated documents and groups related documents together. Using the contents of web pages and their link information, the content-link hypertext clustering algorithm groups similar web pages into clusters that can be searched or combined into larger clusters. To generate clusters, the algorithm uses two similarity functions: one that examines the hyperlinks of the pages and one that examines the contents of the pages. Combining the hyperlink and content similarity functions iteratively produces clusters of similar web pages.
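
To make the idea concrete, here is a minimal Python sketch of combining a content similarity and a link similarity and then merging pages iteratively. This is only an illustration, not Clusty's actual implementation or the published algorithm. The specific choices are my own assumptions: cosine similarity on word counts, Jaccard overlap on outgoing links, the weight `alpha`, the merge `threshold`, and the toy `pages` data.

```python
import math
from collections import Counter


def content_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words counts (assumed content measure)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def link_similarity(links_a, links_b):
    """Jaccard overlap of outgoing-link sets (assumed link measure)."""
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def page_similarity(p, q, alpha=0.5):
    """Weighted mix of content and link similarity; alpha is a hypothetical knob."""
    return alpha * content_similarity(p["text"], q["text"]) + (
        1 - alpha
    ) * link_similarity(p["links"], q["links"])


def cluster_pages(pages, threshold=0.3, alpha=0.5):
    """Repeatedly merge the two most similar clusters until none reach threshold."""
    clusters = [[name] for name in pages]

    def cluster_sim(c1, c2):
        # Average pairwise similarity between the pages of two clusters.
        sims = [page_similarity(pages[a], pages[b], alpha) for a in c1 for b in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, i, j = max(
            (cluster_sim(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data, made up for illustration only.
pages = {
    "agile-a": {"text": "agile software development scrum sprint",
                "links": {"scrum.org", "agilemanifesto.org"}},
    "agile-b": {"text": "scrum sprint planning agile teams",
                "links": {"scrum.org", "agilemanifesto.org"}},
    "cook-a": {"text": "pasta recipe tomato sauce",
               "links": {"recipes.example"}},
    "cook-b": {"text": "tomato sauce recipe basil",
               "links": {"recipes.example"}},
}

clusters = cluster_pages(pages)
print(clusters)
```

With this toy data the two agile pages end up in one cluster and the two recipe pages in another, because pages across the two topics share neither words nor links. Raising `alpha` makes page content count for more; lowering it makes shared links count for more.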

Other web search algorithms of note are:

HITS – Hyperlink-Induced Topic Search

ARC – Automatic resource compilation

SALSA – Stochastic Approach for Link Structure Analysis

These I will discuss in my next blog post.


