(Updated on 3/2/2011)
One of the biggest reasons why many organizations are not doing even simple things to help people find their websites in search engines is that they simply don’t understand how search engines work. This article is intended to explain the basics of how search engines work and, in the course of doing so, shatter some search engine myths and help you understand what you can do to help people find your website in search engines.
What is a search engine?
The first step to understanding how search engines work is to understand what a search engine is. The simplest explanation is that a search engine is a tool for finding things online. There are many different types of search engines, but they can be put into two major categories: human-powered directories and crawler/spider-powered search engines.
The first “search engines” were not actually search engines as we think of them today, but searchable directories of websites organized by hierarchical categories. A site is added to a directory when the website owner fills out a submission form on the directory’s website requesting that their site be included and providing the website’s title, description, URL (web address), and category. A moderator later reviews the site and, if it meets the directory’s criteria, a listing for that site is added or activated.
The original Yahoo was a human-powered directory. Yahoo is now primarily a crawler/spider-powered search engine, but they do maintain a directory as well and it is one of the few reputable directories on the web. The Open Directory Project is another example of a human-powered directory that continues on today, though many categories in the directory are poorly maintained and rarely updated. There are a lot of directories out there (some more human-powered than others), but only a handful get any significant amount of traffic.
The downside of the human-powered search engine is that it only includes websites that have been submitted to it, which means you may not find what you’re looking for, especially if it’s a new web page. The other downside, from the directories’ point of view, is that reviewing every site submitted is very labor-intensive and costly. I know because OurChurch.Com’s Directory of Christian Websites (like almost all church/Christian “search engines”) is a human-powered directory.
Crawler/Spider-Powered Search Engines
The next generation of search engines have programs which actively seek out new sites and read them into their indexes. These programs are called crawlers, spiders, robots, or bots. All of the largest and most popular search engines today are of this type, including Google, Yahoo, and Bing. Note that Yahoo no longer uses its own crawlers. It gets its results from Bing.
The rest of this article is focused on how these crawler/spider-powered search engines work because more than 99% of searches are done on this type of search engine.
As mentioned above, the first part of a search engine is the crawler (aka spider, robot, bot). The crawlers find web pages, send the content back to the “indexer”, parse (separate out) the links on the page, and add the URLs (web addresses) of those links to a queue so the crawler can visit those pages later. The crawler then moves on to the next page on its list. The “indexer” parses the words of the content of the page and saves the content. All this happens very rapidly (the Googlebot can crawl thousands of pages at once).
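The crawl-and-index loop described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is built: the `PAGES` dictionary is a made-up, in-memory stand-in for the web (a real crawler would fetch pages over HTTP), and the names `PageParser` and `crawl` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
PAGES = {
    "/home": '<h1>Welcome</h1><a href="/about">About</a> <a href="/events">Events</a>',
    "/about": '<p>About our church</p><a href="/home">Home</a>',
    "/events": '<p>Weekly events</p><a href="/about">About</a>',
    "/orphan": '<p>No page links here</p>',
}


class PageParser(HTMLParser):
    """Collects the link targets and the visible words of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Parse (separate out) the links on the page.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Collect the words of the page content for the indexer.
        self.words.extend(data.lower().split())


def crawl(seed):
    """Crawl outward from `seed`; return (index, seen).

    index maps each word to the set of URLs containing it;
    seen is the set of URLs the crawler discovered.
    """
    queue = deque([seed])  # pages waiting to be visited
    seen = {seed}          # pages already queued, so none is queued twice
    index = {}
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:
            continue  # link to a page that does not exist
        parser = PageParser()
        parser.feed(html)
        parser.close()
        # "Indexer": record which words appear on which page.
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        # Add newly found links to the queue so they are visited later.
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index, seen


index, seen = crawl("/home")
print(sorted(seen))
print(sorted(index["about"]))
```

Starting from `/home`, the crawler reaches `/about` and `/events` by following links, but never reaches `/orphan`, because no crawled page links to it.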
Some important things to know about crawlers…
1) Search bots periodically reread every web page in their database. Why does this matter?
- You don’t have to do anything when you change your site. If your site is already listed in a search engine and you make changes to your website, search engines will eventually update their information.
- You do have to be patient. The frequency of re-crawling varies depending on the search engine, the importance of your site (as determined by the search engines), and how often you update it. It could take a day or it could take more than a week. You can use meta tags or search engine settings (like Google Webmaster Tools) to tell the search engines how often you update your site, but they only treat those as suggestions. If they do find, however, that you are updating your site regularly, that will cause the search engines to revisit your page more often.
2) Search bots follow links on the pages that have already been crawled in order to find new pages. Why does this matter?
- If you add a new page to your website or create a new website, it’s important to add a link to it on a web page that is already in the search engines.
- If a web page that is already in the search engines has a link to your new page or new website, there is no need to submit a request to the search engines to crawl the new page or website.
3) Some search engines have forms you can submit to request a website be crawled. Why does this matter?
- If you have a new website and no sites link to it, search bots will not be able to find it. In this case, submitting a form to the search engine requesting your site be listed or indexed can get it into search engines. You can also get indexed by getting other sites in the search engines to link to your website.
- Because human-powered directories do not have search bots/crawlers, to be listed in them you must submit a request form.
The Ranking Algorithm
Some time after a web page has been crawled by the search bot or crawler, the search engine then processes or indexes the page to determine what search words and phrases the page is relevant to as well as how relevant and authoritative that page is compared with other web pages for those phrases. During this processing the search engine looks at many different factors including how many times each word and phrase occurs on the page, which words are in headings or bold, the domain name of the site, filename of the page, the pages that link to the page, and many more.
Exactly which factors a search engine looks at and how they’re weighted is called the search engine’s search ranking algorithm. It’s like the search engine’s “secret sauce.” Each search engine’s algorithm is different and each is a heavily guarded secret.
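To make the idea of weighted factors concrete, here is a toy scoring sketch in Python. Everything in it is an assumption made for illustration: the two sample pages, the three factors (heading, bold, body), the weight values, and the function names. Real ranking algorithms are secret and use far more factors than this.

```python
import re

# Hypothetical pages, already split into the parts a ranking formula might look at.
PAGES = {
    "site-x/choir": {
        "heading": "Choir rehearsal schedule",
        "bold": "",
        "body": "We meet on Thursday evenings in the main hall.",
    },
    "site-y/music": {
        "heading": "Music ministry",
        "bold": "",
        "body": "Our choir sings weekly. The choir welcomes new voices. "
                "Choir robes are provided.",
    },
}

# Two made-up weightings standing in for two different search engines.
ENGINE_A = {"heading": 5.0, "bold": 2.0, "body": 1.0}  # favors words in headings
ENGINE_B = {"heading": 1.0, "bold": 1.0, "body": 1.0}  # counts every occurrence equally


def count_term(text, term):
    """Count whole-word, case-insensitive occurrences of `term` in `text`."""
    return re.findall(r"[a-z0-9']+", text.lower()).count(term.lower())


def score(page, term, weights):
    """Weighted sum of how often `term` occurs in each part of the page."""
    return sum(weights[part] * count_term(page[part], term) for part in weights)


def rank(term, weights):
    """Return page URLs ordered from highest to lowest score for `term`."""
    return sorted(PAGES, key=lambda url: score(PAGES[url], term, weights), reverse=True)


print(rank("choir", ENGINE_A))
print(rank("choir", ENGINE_B))
```

With these made-up weights, the page that uses "choir" once in its heading comes first under `ENGINE_A`, while the page that uses it three times in its body text comes first under `ENGINE_B`. The same two pages end up in a different order simply because the factors are weighted differently.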
Why does this matter?
- There is time between when your site is crawled (or recrawled) and when it is processed or indexed. So, it can take anywhere from a few hours to many days before changes to your website produce changes in its search rankings.
- Because each search engine’s ranking algorithm is different, a web page can be #1 in Google but #20 in Bing for a particular phrase.
- Because each search algorithm is a heavily guarded secret, nobody outside of a few select engineers at each search engine knows exactly how much each particular factor weighs into the rankings of each search engine. But people who spend their professional lives helping sites rank better in search engines have gained a very good idea as to what factors matter most.
- Because the search ranking algorithms look at text, headings, and other elements on a web page, changing things on the web page can change where that web page appears in the search results.
- Search ranking algorithms look at factors outside of a web page, such as the age of a website and links to the web page. So, there are other factors which you may have less influence over.
The goal of every search engine is to display to the user the information or websites the user is looking for. In other words every search engine wants to provide the best, most relevant results. As a result, search engines are constantly improving their algorithms and including new factors which they think will produce better search results. Why does this matter? As search algorithms change, so will your website’s search engine rankings.
The Search Engine Results Pages (SERPs)
The crawling of websites and indexing of web pages are constantly going on even when no one is searching. The last part in the search process is the part that you’re probably most familiar with – the actual search. You type in a word, short phrase, or question, and the search engine displays a list of results.
That list of websites is called the search engine results page, sometimes referred to as the SERP.
If you look at the screenshot of a sample SERP, you’ll notice several sections. At the top left and in the left column below the Google logo, you see search options. These are not results, just ways to refine your search.
Above the right column and next to the box with the first three results in the middle column it says “Ads.” These are paid advertisements. The companies and organizations listed here pay a fee to Google for each person who clicks their ad, so they’re often referred to as Pay-Per-Click or PPC ads.
In the center column below the Ads are the unpaid search results, also called the organic or natural results. These are the websites the search engine believes are most relevant to the search phrase that was queried. Sometimes search engines also display a block of pay-per-click ads in the center column, in the middle of the natural results. Sponsored links are always labeled, but not always very clearly.
Why does this matter?
From the searcher’s perspective, there is nothing wrong with clicking on a paid link. You may find what you’re looking for there. But it’s important to be aware of which websites paid to be in the results and which are there naturally.
From the web administrator’s perspective, it’s important to understand there are two opportunities to get to the top of the search engine rankings: through natural results and by purchasing pay-per-click advertising.
The major search engines all have multiple types of search options and search results. These include images, video, local, social, news, and more. When you search in a major search engine and do not specify a particular type of search, the default search results are what’s known as universal search results, which can include results from any or all of the various types of search options. So, when you search, you may see images, videos, news, local listings, and several other types of results in the universal search results.
This is beneficial for both the searcher and the webmaster. For the searcher, it gives them more options which are often more relevant to their search. For webmasters, it provides multiple ways to get listed in the natural/organic search results. Your website could end up being listed multiple times on the first page of results.
Help the Search Engines
With a better understanding of how search engines work, you can make better decisions about the marketing of your website through search engines. Search engines need your help to find your website and to know what words and phrases it’s relevant for. Give them the help they need.