Fix duplicate content in the search engine index with the canonical element.

March 8th, 2010 by Susan

Let The Search Guru show you how to use the canonical element to fix duplicate content and search engine index issues.

Here at The Search Guru blog we’ve posted previous entries discussing canonical issues and how they can cause duplicate content in the search engine index. To refresh your memory, canonicalization is the process by which a search engine chooses the best URL to index when several URLs lead to the same content. To a search engine, http://www.mysite.com and http://mysite.com look like two different sites with exactly the same content, which can trigger the Google duplicate content filter.
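
To make the idea concrete, here is a minimal Python sketch of what "choosing one URL" means. It is only an illustration, not how any search engine actually works: the preferred host, the alias list and the function name `canonical_url` are all assumptions made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www form is the preferred one; use whichever you prefer.
PREFERRED_HOST = "www.mysite.com"
# Assumption: every hostname that serves the same content.
ALIASES = {"mysite.com", "www.mysite.com"}


def canonical_url(url: str) -> str:
    """Map any alias of the site to the single preferred URL form.

    This sketch ignores ports and login details in the URL.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALIASES:
        return url  # not our site, so leave it unchanged
    path = parts.path or "/"  # treat "no path" and "/" as the same page
    return urlunsplit((parts.scheme, PREFERRED_HOST, path, parts.query, ""))


variants = ["http://mysite.com", "http://www.mysite.com", "http://MySite.com/"]
unique = {canonical_url(u) for u in variants}
print(unique)  # all three variants collapse to one URL
```

Three addresses go in and one comes out, which is the outcome you want the search engines to reach for your site.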

One way to address this is by implementing a 301 redirect from the non-preferred to the preferred version of a URL. We show you how to do this on a Unix/Apache server and a Windows server.
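
Our walkthroughs cover Apache and Windows servers. Purely as an illustration of the same idea, here is a rough sketch of a host-level 301 redirect written as a small Python WSGI application. The preferred host and the name `redirect_app` are assumptions for this example, and it is not the method from those posts.

```python
# Assumption: the www form is the preferred version of the site.
PREFERRED_HOST = "www.mysite.com"


def redirect_app(environ, start_response):
    """Send a 301 to the preferred host; otherwise serve a placeholder page."""
    host = environ.get("HTTP_HOST", "").lower()
    if host != PREFERRED_HOST:
        scheme = environ.get("wsgi.url_scheme", "http")
        path = environ.get("PATH_INFO", "") or "/"
        location = f"{scheme}://{PREFERRED_HOST}{path}"
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query  # keep the query string intact
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Served from the preferred host\n"]
```

To try it locally, you could pass `redirect_app` to `wsgiref.simple_server.make_server` from the Python standard library and request the site under both hostnames.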

Use the canonical element as well to fix Google duplicate content.

The canonical element was introduced by Google about a year ago, but you may not yet realize its power. The canonical element (also referred to as the canonical tag, although it is technically not a tag) shows Google which version of a URL to include in the search engine index.
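
For reference, the canonical element is a `link` element with `rel="canonical"` placed in the page's `head`. The sketch below embeds a made-up sample page (the URL and page content are assumptions) and uses Python's standard `html.parser` to read the canonical URL back out, roughly the way a crawler might.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attributes.get("href")


# Hypothetical page that could be reached under several URLs.
PAGE = """<html><head>
<title>Example page</title>
<link rel="canonical" href="http://www.mysite.com/">
</head><body>Hello</body></html>"""

finder = CanonicalFinder()
finder.feed(PAGE)
print(finder.canonical)  # http://www.mysite.com/
```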

Read the rest of this entry »

Search engine index: use 301 redirect to fix page counts in Google & Yahoo!

February 1st, 2010 by Susan

The number of site pages in a search engine index can differ widely between Google and Yahoo!, which we all know is frustrating. Why the difference?

Let’s say your site has 300 pages in the Google search engine index and 315 in the Yahoo! index. Those numbers are pretty close; it looks like both spiders are crawling and indexing most of your content.

But what if your site has 300 pages in Google, and 620 in Yahoo!? This indicates that there may be a deeper issue affecting your site.

Canonical URL issues, duplicate content and 404 error pages are three common reasons why page counts can differ.

Read the rest of this entry »

HTTP status code listings: how to optimize http headers for better SEO

August 15th, 2008 by Heather

Learn how status code listings affect your site.

Your website speaks a hidden, coded language. When other computers come to visit, it may give them things like a 200 code, 301 code, 302 code or 404 error message code. Each of these codes is meant to tell a visiting computer what is going on with your site.
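
If you want to see what those four codes stand for, Python's standard library already knows their official names. This short sketch simply looks them up; the one-line summaries in `MEANING` are our own plain-English paraphrases, not official wording.

```python
from http import HTTPStatus

# Plain-English paraphrases (assumptions, written for this example).
MEANING = {
    200: "the page was found and delivered",
    301: "the page has moved for good; use the new address",
    302: "the page is somewhere else for now",
    404: "no page exists at this address",
}

for code in (200, 301, 302, 404):
    status = HTTPStatus(code)
    print(f"{code} {status.phrase}: {MEANING[code]}")
```

The standard names it prints are OK, Moved Permanently, Found and Not Found.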

So, how often does another computer visit your site? About as often as you get visitors. Your site is really just a computer, and each visitor is looking at a computer that talks to yours and translates what is said. Status code listings are among the ways your site’s computer and your visitor’s computer talk to one another. Search engines are also computers, and understanding how to optimize HTTP headers and status code listings can help improve your rankings.

Read the rest of this entry »

What is a canonical issue and how to fix a canonical issue.

June 5th, 2008 by Heather

Do you know what a canonical issue is?

If you are relatively savvy about SEO, you know that your site should have a robots.txt, a human site map and a sitemap.xml. These are considered to be basic elements that you add to a site for SEO purposes. But did you know that you also need a canonical redirect?

Fixing canonical issues is important. A canonical issue can wreak havoc on your site in the search results. It can cause the search engines to penalize your site and remove your pages from the results. And all it takes is a simple oversight on your part.

Is your site susceptible to a canonical issue?

Read the rest of this entry »

Search engines’ spiders: spider names and what search engines they belong to.

February 7th, 2008 by Heather

Do you know the spider names?

If one ever wondered if search engine engineers were a whimsical lot, one only needs to look at the names of search engines’ spiders to find the answer. Googlebot, Slurp, Ask Jeeves and ia_archiver are some of the spider names used by the most popular search engines on the web.

Which search engines use spiders to fetch documents?

Using a program called a spider, search engines continually search the web looking for new pages and reviewing documents they already have in their index. You may ask, “Which search engine uses spiders to fetch documents?” and the answer is, “All of them.” Without a spider, search engines would not be able to find new pages.

Why name a spider? Search engines don’t have to.

The main purpose of spider names is so that search engines’ spiders can be easily identified in your site logs. When was the last time you looked at your logs? Today, with the availability of free tracking software, such as Google Analytics, many people never look at their logs, but if you do, you will see the spider names from the search engines appear all over them.

Think of it as a courtesy. Certainly the search engines’ spiders could arrive at your site under the name “Mozilla” which is how most web browsers appear in your logs, but the search engines don’t wish to be rude. By using spider names, they let a site owner know that they have been by to visit.

Why you should care about search engines’ spiders and their names.

You can use this information to find out when the search engines’ spiders last came to your site or to a page on your site. The more important they think a page is, the more frequently they will visit it. If you look at your site logs, you will see the spider names and will be able to tell if the search engines have picked up the latest changes on your site.

Name of each spider, search engines they belong to:

Googlebot          Google.com
MSNbot             Search.msn.com
Ask Jeeves/Teoma   Ask.com
Architext spider   Excite.com
Yahoo Slurp        Yahoo Web Search
ia_archiver        Alexa.com
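
If you would rather not read your logs by eye, a short script can do the looking for you. The sketch below is only a starting point: it assumes your server writes the common "combined" log format, the sample log lines and addresses are made up, and the matching tokens are simplified from the list above.

```python
import re
from datetime import datetime

# Token to look for in the user-agent string -> search engine (from the list above).
SPIDERS = {
    "Googlebot": "Google.com",
    "msnbot": "Search.msn.com",
    "Ask Jeeves": "Ask.com",
    "Architext": "Excite.com",
    "Slurp": "Yahoo Web Search",
    "ia_archiver": "Alexa.com",
}

# Assumption: combined log format, i.e.
# host - - [time] "request" status bytes "referrer" "user agent"
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def last_visits(lines):
    """Return the most recent visit time seen for each search engine."""
    latest = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        agent = match["agent"].lower()
        for token, engine in SPIDERS.items():
            if token.lower() in agent and (engine not in latest or when > latest[engine]):
                latest[engine] = when
    return latest


# Made-up sample lines for illustration only.
SAMPLE = [
    '192.0.2.10 - - [05/Feb/2008:10:15:32 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.11 - - [06/Feb/2008:08:00:01 +0000] "GET /about HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.12 - - [06/Feb/2008:09:30:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
]

visits = last_visits(SAMPLE)
for engine, when in sorted(visits.items()):
    print(engine, when.isoformat())
```

To run it against a real log, you could pass an open file instead of `SAMPLE`, for example `last_visits(open("access.log"))`, where the file name is whatever your host uses.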

Knowing a little more about search engines’ spiders and their names will give you better insight into what the search engines know about your site. Tune in next time when we go over how you can give directions to spiders about how, when and if they should look at pages on your site.