Why Search Engines for Traffic


Search Engine Relevancy

Many people think search engines have a hidden agenda. This simply is not true. The goal of the search engine is to provide high-quality content to people searching the Internet. Search engines with the broadest distribution network sell the most advertising space. As I write this, Google is considered the search engine with the best relevancy. Their technologies power the bulk of web searches.

The Problem Listing a New Site

The biggest problem new websites have is that search engines have no idea they exist. Even when a search engine finds a new document, it has a hard time determining its quality. Search engines rely on links to help determine the quality of a document. Some engines, such as Google, also trust websites more as they age. The following bits may contain a few advanced search topics. It is fine if you do not necessarily understand them right away; the average webmaster does not need to know search technology in depth.

Some might be interested in it, so I have written a bit about it with those people in mind. I will cover some of the parts of the search engine in the next few pages while trying to keep it somewhat basic. It is not important that you fully understand all of it (in fact, I think it is better for most webmasters if they do not worry about things like Inverse Document Frequency, as I ranked well for competitive SEO-related terms without knowing anything about the technical bits of search); however, I would not feel right leaving the information out.

Gerard Salton

The phrase vector space model, which search algorithms still heavily rely upon today, goes back to the 1970s. Gerard Salton was a well-known expert in the field of information retrieval who pioneered many of today’s methods. If you are interested in learning more about early information retrieval systems, you may want to read A Theory of Indexing, which is a short book by Salton that describes many of the common terms and concepts in the information retrieval field. Mike Grehan’s book, Search Engine Marketing: The Essential Best Practices Guide, also discusses some of the technical bits of information retrieval in more detail than this book. My book was created to be a current how-to guide, while his is geared more toward explaining how information retrieval works.


Parts of a Search Engine

While there are different ways to organize web content, every crawling search engine has the same basic parts:

• a crawler

• an index (or catalog)

• a search interface

Crawler (or Spider)

The crawler does just what its name implies. It scours the web following links, updating pages, and adding new pages when it comes across them. Each search engine has periods of deep crawling and periods of shallow crawling. There is also a scheduler mechanism to prevent a spider from overloading servers and to tell the spider what documents to crawl next and how frequently to crawl them. Rapidly changing or highly important documents are more likely to get crawled frequently. The frequency of crawl should typically have little effect on search relevancy; it simply helps the search engines keep fresh content in their index.

The home page of CNN.com might get crawled once every ten minutes. A popular, rapidly growing forum might get crawled a few dozen times each day. A static site with little link popularity and rarely changing content might only get crawled once or twice a month. The main benefit of having a frequently crawled page is that you can get your new sites, pages, or projects crawled quickly by linking to them from a powerful or frequently changing page.
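To make the scheduler idea more concrete, here is a minimal sketch in Python. It is not how any real search engine works; the class name CrawlScheduler, the host_delay parameter, the example URLs, and the recrawl intervals are all assumptions made up for illustration. Each page gets its own recrawl interval, and the scheduler refuses to hit the same host twice within a short delay so the server is not overloaded.

```python
import heapq
from urllib.parse import urlparse


class CrawlScheduler:
    """Toy scheduler (hypothetical): picks the next due URL and spaces out hits per host."""

    def __init__(self, host_delay=5.0):
        self.host_delay = host_delay  # minimum seconds between hits on one host
        self.queue = []               # heap of (next_due_time, url)
        self.interval = {}            # url -> seconds between recrawls
        self.host_ready = {}          # host -> earliest time it may be hit again

    def add(self, url, interval, now=0.0):
        """Register a page; important or fast-changing pages get a short interval."""
        self.interval[url] = interval
        heapq.heappush(self.queue, (now, url))

    def next_url(self, now):
        """Return a URL that is due at `now`, or None if nothing can be crawled yet."""
        deferred = []
        chosen = None
        while self.queue and self.queue[0][0] <= now:
            due, url = heapq.heappop(self.queue)
            host = urlparse(url).netloc
            if self.host_ready.get(host, 0.0) <= now:
                chosen = url
                self.host_ready[host] = now + self.host_delay
                # Reschedule the page according to its own recrawl interval.
                heapq.heappush(self.queue, (now + self.interval[url], url))
                break
            # The page is due, but its host was hit too recently; try it later.
            deferred.append((due, url))
        for item in deferred:
            heapq.heappush(self.queue, item)
        return chosen


scheduler = CrawlScheduler(host_delay=5.0)
scheduler.add("http://news.example/", interval=600)       # every ten minutes
scheduler.add("http://news.example/world", interval=600)
scheduler.add("http://static.example/", interval=86400)   # once a day
```

Calling scheduler.next_url(now) repeatedly with an increasing clock value hands out pages in due order. The second news.example page has to wait until the host delay has passed, even though it is already due.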

The Index

The index is where the spider-collected data are stored. When you perform a search on a major search engine, you are not searching the web, but the cache of the web provided by that search engine’s index.
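One common way to picture the index is as a lookup table from words to the pages that contain them, often called an inverted index. The sketch below is a simplified illustration under that assumption; the function names build_index and search and the sample pages are hypothetical, and real engines store far more (positions, link data, and so on). The point it shows is that a query is answered from the stored copy, not by visiting the live web.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawled pages: URL -> text the spider collected.
pages = {
    "a.example/swim": "learn to swim fast",
    "b.example/run": "learn to run fast",
}
index = build_index(pages)
```

With this data, search(index, "learn fast") finds both pages, while search(index, "swim") finds only the first one.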

Search Interface

The search algorithm and search interface are used to find the most relevant document in the index based on the search query. First the search engine tries to determine user intent by looking at the words the searcher typed in. These terms can be stripped down to their root level (e.g., dropping ing and other suffixes) and checked against a lexical database to see what concepts they represent. Terms that are a near match will help you rank for other similarly related terms. For example, using the word swims could help you rank well for swim or swimming. Search engines can try to match keyword vectors with each of the specific terms in a query. If the search terms occur near each other frequently, the search engine may understand the phrase as a single unit and return documents related to that phrase. WordNet is the most popular lexical database. At the end of this chapter there is a link to a Porter Stemmer tool if you need help conceptualizing how stemming works.
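If it helps to see stemming in action, here is a deliberately crude suffix stripper in Python. It is not the Porter Stemmer and not what any search engine actually runs; the function name crude_stem and its handful of rules are assumptions chosen only to show how swims, swimming, and swim can collapse to the same root.

```python
def crude_stem(word):
    """Strip a few common English suffixes (illustration only, not Porter)."""
    word = word.lower()
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling: "swimm" -> "swim".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    # Drop a plain plural or verb "s", but leave words like "glass" alone.
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def stem_query(query):
    """Reduce every word in a query to its crude root."""
    return [crude_stem(word) for word in query.split()]
```

A rule set this small makes plenty of mistakes on real text, which is why production stemmers such as Porter's use many more rules and conditions.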



