Introduction to Canonical Tags in SEO
Are you here to learn about canonical tags in SEO? Let’s get started.
Canonical tags are an essential part of technical SEO, and they can either boost or hamper your site’s performance and search presence. To keep your site’s SEO healthy, you need to understand what canonical tags are and how to use them.
Why use canonical tags?
Poorly placed canonical tags, or no canonical tags at all, can cause issues that hurt your site’s performance and ranking. That’s why we created this write-up. By the end, you’ll be able to explain what canonical tags are, how they function, and how to use them appropriately on your site.
The canonical tag was introduced jointly by Google, Microsoft, and Yahoo in 2009 to help website owners fix duplicate content issues on sites whose pages can be accessed via multiple URLs.
These tags tell search engines which version of duplicate or similar content on your site you prefer, so that duplicate content issues can be resolved. Let’s dive deeper into canonical tags in SEO.
What are canonical tags in SEO?
In SEO, a canonical tag is a snippet of HTML that tells search engines which URL is the canonical (preferred or most important) version among duplicate or similar pages on your site. It is also referred to as “rel canonical.”
Put another way: if your site has several versions of the same content, or duplicate or near-duplicate content under different URLs, you can use canonical tags to tell search engines which version matters most, that is, the one you want crawled and indexed.
Using canonical tags, you can avoid the issues that arise from duplicate or identical content on a site. Simply put, the tag tells search engines which page you want them to rank on SERPs.
The placement of canonical tags
A canonical tag is placed in the <head> section of the page, like this:
<link rel="canonical" href="https://example.com">
The first part, rel="canonical", indicates that the link in this tag is the canonical version of this page. The second part, href="https://example.com", gives the URL of that canonical version.
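To check where a page's canonical tag sits, the markup can be parsed and the <head> inspected. The following is a minimal sketch using Python's standard html.parser module; the names CanonicalFinder and find_canonicals are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects href values of <link rel="canonical"> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            # rel may hold several space-separated values, so split it.
            rel_values = (attr.get("rel") or "").lower().split()
            if "canonical" in rel_values and attr.get("href"):
                self.canonicals.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonicals(html):
    """Return the canonical URLs declared in the <head> of an HTML document."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


page = '<html><head><link rel="canonical" href="https://example.com"></head><body></body></html>'
print(find_canonicals(page))
```

A tag placed outside the <head> is deliberately not reported by this sketch.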
What is a canonical URL in SEO?
A canonical URL is the URL of the page that is set as the canonical version among a group of duplicates, declared in the HTML to prevent content duplication issues. You can choose the canonical URL yourself; otherwise, Google selects the one it considers the most valuable.
How is content assessed as duplicate?
To be considered duplicate content, pages need not be exactly identical. They can be treated as duplicates even if they differ slightly, for example when the variation comes from sorting or filtering products by price, color, type, and so on.
This doesn’t mean that you should avoid creating similar content. There can be many reasons for having similar content on a site or for covering the same topic more than once. The pages may serve different intents, or the later one may be more advanced than the earlier one. Whatever the reason, you can keep them on your site without hurting your SEO by using canonical tags.
How does Google choose canonical content?
Google analyzes the content of each page while indexing the site. If it discovers duplicate content, it picks as canonical the version it finds most suitable and most valuable among the duplicates. To decide, it considers various signals, such as page quality, whether the page is served over HTTP or HTTPS, whether the URL is listed in the sitemap, and tags such as rel=canonical. You can also tell search engines your preferred version using the tag or other methods, and Google will treat it as the main URL unless another version is more relevant to the user’s query.
For instance, even if the desktop version is set as the canonical URL, the mobile version is served to a user browsing on a mobile device.
The importance of using canonical tags in SEO
There are many reasons to use canonical tags in SEO. The most important one is to avoid duplicate content issues that may hurt the site’s performance; the other reasons all revolve around this. Google has difficulty assessing duplicate content. It may struggle to choose which version should be indexed, which version should rank for a particular query, and whether it should consolidate the “link juice” on one page or split it among all of them.
Duplicate content may also eat into your site’s “crawl budget”: Google may crawl your duplicate versions again and again, wasting budget that could be spent on the other important pages of your site. Canonical versions are crawled more frequently than duplicates, which helps you manage the crawling load on your site.
Moreover, if you do not use a canonical tag, Google may choose the canonical version itself, which may be a URL that you don’t want ranked.
You can use Google’s URL Inspection tool to see which version of your content Google has chosen as canonical. Even if you have specified a canonical version, Google may choose a different one. There can be several grounds for this, including performance and content.
Using canonical tags, you can:
- Specify the most important URL or page that you want to see in the results.
- Consolidate link signals to a single preferred URL.
- Track the metrics for a particular topic or product more conveniently.
- Manage your syndicated content to make sure your preferred URL is shown in the results.
- Avoid spending resources on crawling duplicate content.
Does everyone have a duplicate content issue?
It’s natural to assume that your site doesn’t contain duplicate content, because nobody publishes the same content again and again. However, search engines crawl URLs, not web pages. To them, each URL is unique, even if several URLs lead to the same page.
For example, the following URLs can all lead to the homepage of the website “xyz.com”:
- http://www.xyz.com
- https://www.xyz.com
- http://xyz.com
- http://xyz.com/index.php
- http://xyz.com/index.php?r
To us (humans), all these may appear the same because they lead to the same home page, but to search engines, each URL is unique.
Another example could be the following two URLs for the same web page:
www.xyz.com/product and www.xyz.com/product?color=red
URLs like the second one are referred to as parameterized URLs, and they are a major cause of duplicate content, especially on eCommerce websites. However, eCommerce sites are not the only ones affected. Every kind of website can face duplicate content issues, for many reasons.
The following are some examples of what leads to duplicate content issues on a website:
- Parameterized URLs for session IDs.
Example: https://xyz.com?sessionid=4
- Pages optimized for multiple devices.
Example: https://xyz.com and https://m.xyz.com
- Parameterized URLs for search parameters.
Example: https://xyz.com?q=search-term
- Different URLs for posts under different categories.
Example: https://xyz.com/services/SEO/ and https://xyz.com/specials/SEO/
- www and non-www versions of your content.
Example: http://www.xyz.com and http://xyz.com
- AMP and non-AMP page versions.
Example: https://xyz.com/page and https://amp.xyz.com/page
- Similar content with and without trailing slashes.
Example: https://xyz.com/page/ and https://xyz.com/page
- Same content at HTTP and HTTPS.
Example: http://www.xyz.com and https://www.xyz.com
- Similar content with and without capital letters.
Example: https://xyz.com/page/ and https://xyz.com/Page/
These are only a few of the causes of duplicate content. There may be other situations in which you need to use canonical tags on your site.
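To see how several URL variants collapse into one page, it can help to normalize them and compare the results. The sketch below uses Python's urllib.parse and rests on assumptions that will not hold for every site: the preferred form is HTTPS without www, paths are lowercased and end in a slash, and the parameter and default-document lists (IGNORED_PARAMS, DEFAULT_DOCS) are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change the page content.
IGNORED_PARAMS = {"sessionid", "r"}
# Hypothetical default documents that resolve to the directory URL.
DEFAULT_DOCS = ("index.php", "index.html")


def normalize(url):
    """Map a URL variant to one preferred form (assumed: https, no www, lowercase path, trailing slash)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.lower()
    for doc in DEFAULT_DOCS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]
    if not path.endswith("/"):
        path += "/"
    # Keep only the query parameters that are not on the ignore list.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(("https", host, path, urlencode(kept), ""))


variants = [
    "http://www.xyz.com",
    "https://www.xyz.com",
    "http://xyz.com",
    "http://xyz.com/index.php",
    "http://xyz.com/index.php?r",
    "https://xyz.com?sessionid=4",
]
print({normalize(u) for u in variants})  # a single normalized URL
```

If a group of URLs normalizes to the same value, they are candidates for sharing one canonical URL. Whether lowercasing or adding a trailing slash is safe depends on how the server actually resolves paths.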
Besides this, there’s the cross-domain content issue. If you syndicate your content, use a self-referential canonical tag on the original, and make sure the syndicated copies point to your page as the canonical version. Syndicated content may still appear in search results even then, but the tag lowers the chances of it outranking the original content.
Many sites suffer from duplicate content issues without even realizing it. It’s therefore important to check for this issue on your site and use canonical tags to help search engines identify the canonical version of similar content.
Golden Rules for using canonical tags in SEO
The following are a few rules to keep in mind before using canonical tags.
1: Practice using absolute URLs
Google’s John Mueller has stated that it’s best to use absolute URLs rather than relative URLs in canonical tags.
This means your tag should look like this:
<link rel="canonical" href="https://xyz.com/sample-page/" />
Instead of this:
<link rel="canonical" href="/sample-page/" />
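One reason relative canonicals are risky is that they resolve against whatever URL the page happens to be served from. The short Python sketch below illustrates this with urllib.parse.urljoin; the function name absolute_canonical and the page URLs are hypothetical.

```python
from urllib.parse import urljoin


def absolute_canonical(page_url, href):
    """Resolve a canonical href against the URL the page was served from."""
    return urljoin(page_url, href)


# The same relative href resolves differently depending on the serving URL.
print(absolute_canonical("https://xyz.com/blog/post/", "/sample-page/"))
print(absolute_canonical("http://www.xyz.com/blog/post/", "/sample-page/"))
# An absolute href always resolves to itself.
print(absolute_canonical("http://www.xyz.com/blog/post/", "https://xyz.com/sample-page/"))
```

With the relative href, the HTTP and www variants of the page each end up declaring a different canonical, which defeats the purpose of the tag.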
2: Prefer using lowercase URLs
URL paths are case-sensitive. As shown in the examples above, Google usually treats uppercase and lowercase variants as two distinct URLs, which may lead to duplicate content. You should therefore use lowercase URLs on your site as well as in your canonical tags. Lowercase versions are usually the preferred ones.
3: Make sure you’re using the correct domain version (HTTP/HTTPS)
Using the correct protocol version is essential to ensure everything runs smoothly, because an incorrect version may create confusion and lead to unforeseen results. If you’ve switched from HTTP to HTTPS, use the secure version everywhere on your site; if not, stick to the HTTP version. Whenever you write a canonical tag, check which protocol it uses. For example, if you have an SSL certificate for your domain, your tag should look like this:
<link rel="canonical" href="https://xyz.com/sample-page/" />
Instead of this:
<link rel="canonical" href="http://xyz.com/sample-page/" />
If you haven’t switched yet, use the HTTP version.
4: Use self-referential canonical tags
Google’s John Mueller recommends using self-referential canonical tags on the content you want indexed and shown in SERPs. If you have duplicate content on your site, or a single piece of content is accessible through several URL versions, self-referential canonical tags can help Google understand which versions are canonical.
A canonical tag on a page that points to that page’s own URL is referred to as a self-referential canonical tag.
For instance, if the URL of a page is https://xyz.com/sample-page,
then its self-referencing canonical tag will be:
<link rel="canonical" href="https://xyz.com/sample-page" />
Many CMSs add self-referencing canonical tags automatically. However, if you’re using a custom CMS, you’ll need to ask your developer to add them.
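For a custom CMS, generating the tag can be as simple as the following Python sketch. It assumes that query strings and fragments are not part of the preferred URL, which is not true for every site, and the function name self_canonical_tag is hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def self_canonical_tag(page_url):
    """Build a self-referencing canonical tag for the given page URL."""
    parts = urlsplit(page_url)
    # Assumption: query string and fragment are dropped from the preferred URL.
    clean_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(clean_url, quote=True)}" />'


print(self_canonical_tag("https://xyz.com/sample-page?utm_source=news#top"))
```

A template would then emit this string inside the <head> of each page.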
5: Follow the one page, one canonical tag rule
Using multiple canonical tags on a single page sends a bad signal. Google may get confused and ignore all of them.
Stick to one canonical tag per page to make your tag count; otherwise, Google may disregard the tags and decide on the canonical version on its own.
Ways to implement canonical tags in SEO
Canonical tags can be implemented in five different ways, which are also referred to as canonicalization signals:
- Using the HTML tag (rel=canonical)
- Within the HTTP header
- Within the sitemap
- Using 301 redirects
- Through internal links
Let’s explore them one by one.
1. Placing canonicals using HTML tags
The simplest way to set a canonical URL is with a rel=canonical tag (an HTML tag).
This tag is placed within the <head> section of the duplicate or near-duplicate pages:
<link rel="canonical" href="https://xyz.com/canonical-page/" />
For example, your home page may be accessible through many URLs, but you want the HTTPS version to be canonical. In that case, you can add a canonical tag to all the other URLs that points to the HTTPS version. The code may look like this:
<link rel="canonical" href="https://xyz.com" />
Pros: You can map any number of duplicate pages using HTML tags.
Cons: It adds slightly to the page size, it may be difficult to manage on larger sites, and it only works for HTML pages, not for PDFs or similar files.
Note: Be careful when adding it to your page’s code, since a misplaced tag can cause problems. If you are using a CMS, however, placing it is usually straightforward. Let’s see how.
- Placing canonical tags in WordPress
In WordPress, placing canonical tags is easy. Install the Yoast SEO plugin, and it will automatically add self-referencing canonical tags. You can also set custom canonical tags: open the “Advanced” section on each page or post and enter the canonical URL you want it to point to. Cross-domain canonicals are supported as well.
- Placing canonical tags in Shopify
In Shopify, self-referencing canonical URLs for blog posts and products are added by default. If you want to add custom canonical URLs, you’ll have to edit the theme’s Liquid template files (typically “theme.liquid”).
2. Placing canonicals in HTTP headers
For documents such as PDFs, images, and Word files, the way to declare a canonical is through HTTP headers, because these files have no <head> section in which to place an HTML tag. You can also use canonical HTTP headers for standard web pages.
To add a canonical through HTTP headers on an Apache server, for example, open the .htaccess file and add a rule that sends a Link header with rel="canonical" for the file in question. You can then check the result in Google Search Console.
Pros: It does not increase page size and can be used for numerous duplicate pages.
Cons: It can be challenging to maintain on a larger site or a site where URLs change frequently.
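The header itself has a simple shape: the canonical URL in angle brackets, followed by rel="canonical". The Python sketch below builds that header and attaches it to a response in a minimal WSGI application. The URL in CANONICAL_PDF, the function names, and the placeholder body are all hypothetical, and a real site would more likely configure this in the web server.

```python
CANONICAL_PDF = "https://xyz.com/downloads/guide.pdf"  # hypothetical canonical URL


def canonical_link_header(canonical_url):
    """Return the (name, value) pair for a rel="canonical" HTTP Link header."""
    return ("Link", f'<{canonical_url}>; rel="canonical"')


def app(environ, start_response):
    """Minimal WSGI app that serves a placeholder PDF body with a canonical Link header."""
    headers = [
        ("Content-Type", "application/pdf"),
        canonical_link_header(CANONICAL_PDF),
    ]
    start_response("200 OK", headers)
    return [b"%PDF-1.4 placeholder"]


print(canonical_link_header(CANONICAL_PDF)[1])
```

Any duplicate copy of the file served with this header points search engines to the one preferred URL.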
3. Adding canonicals in sitemaps
A sitemap is a great way to tell search engines about the important pages of your site. Google uses the sitemap to discover pages that should be crawled and indexed. According to Google, you should include your canonical URLs in your sitemaps and leave out duplicate and non-canonical URLs. This helps Google index your canonical URLs more easily. Although listing a URL in the sitemap does not guarantee that Google will treat it as canonical, it’s the simplest way to mark your preference.
Pros: It is the simplest way to set canonicals, even on large sites.
Cons: It isn’t a guarantee. Google may still pick a duplicate URL even if you specify the canonicals.
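As an illustration, a sitemap can be generated from a list of pages so that only canonical URLs are included. This Python sketch uses xml.etree.ElementTree and assumes a hypothetical mapping, declared, from each page URL to the canonical URL it declares; a page is listed only when it is its own canonical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(declared):
    """Build sitemap XML listing only the URLs that are their own canonical.

    declared maps each page URL to the canonical URL it declares.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, canonical in sorted(declared.items()):
        if url == canonical:
            url_element = ET.SubElement(urlset, "url")
            ET.SubElement(url_element, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: the parameterized URL declares the plain URL as canonical.
declared = {
    "https://xyz.com/product": "https://xyz.com/product",
    "https://xyz.com/product?color=red": "https://xyz.com/product",
}
print(build_sitemap(declared))
```

The parameterized variant is left out because it points elsewhere.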
4. Using 301 redirects to set Canonicals
Another way to set canonicals is to use 301 redirects. By placing 301 redirects, you can divert traffic from your duplicate URL to the canonical URL.
Let’s say your home page is accessible with three different URLs, which are-
- xyz.com
- xyz.com/index.php
- xyz.com/home/
You want to make “xyz.com” your canonical URL, so you can redirect the other URLs “xyz.com/index.php” and “xyz.com/home/” to it.
Also, if your site contains HTTPS and HTTP versions of the same content or the www and non-www version of the same content, you can choose one as canonical and redirect the other to it.
Note: Google suggests using 301 redirects only when you are deprecating a duplicate page or URL.
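The redirect logic for the home page example can be sketched as a lookup table. In this Python sketch, the REDIRECTS mapping, the CANONICAL_ORIGIN value, and the respond function are hypothetical; real sites usually configure redirects in the web server or CMS.

```python
CANONICAL_ORIGIN = "https://xyz.com"  # assumed preferred scheme and host

# Duplicate paths mapped to the canonical path they should redirect to.
REDIRECTS = {
    "/index.php": "/",
    "/home/": "/",
}


def respond(path):
    """Return (status, headers) for a request path: 301 for duplicates, 200 otherwise."""
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, {"Location": CANONICAL_ORIGIN + target}
    return 200, {}


print(respond("/index.php"))
print(respond("/"))
```

Every duplicate path answers with a permanent redirect to the single canonical URL, while the canonical URL itself is served normally.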
5. Internal Links
Your site’s internal linking structure works as a canonicalization signal. The way you link internally throughout your website signals to Google which pages are most important, and helps search engines understand which pages you prefer as canonical.
John Mueller has discussed canonical URLs in a video and explained how Google chooses them.
Place internal links within your website wisely, pointing them at your canonical URLs, so that Google can determine your canonical URL easily.
Common mistakes with canonicalization
Setting up and managing canonicals can get tricky. It is a challenging subject with many misconceptions and pitfalls. Below are a few common mistakes people make while canonicalizing URLs, so that you can avoid them.
Canonicalized URL is blocked via robots.txt
Sometimes a canonicalized URL is unintentionally blocked in robots.txt. This stops bots from crawling it, so they can’t discover the canonical tag on it.
Canonicalized URLs are set to ‘noindex’
Noindex is not a canonicalization signal. If you want to canonicalize a URL, noindexing the other versions will not signal your canonical preference; it just keeps those particular pages out of the index. Also, noindex and rel=canonical should not be used together, because they send conflicting instructions.
If you have used them together, however, Google generally gives preference to the canonical tag over the noindex tag.
If your goal is both to drop a URL from the index and to consolidate it into another one, use a 301 redirect; otherwise, use rel=canonical on its own.
Canonicalized URLs are set to a 4XX HTTP code
The use of a 4XX HTTP code has the same effect as the noindex tag.
When a canonicalized URL returns a 4XX HTTP status code, Google cannot see the canonical tag on it and cannot pass the “link juice” to the canonical version.
Canonicalizing all paginated pages
If your content spans several pages, canonicalizing every paginated page to the first one is an improper use of rel=canonical, because these pages are not duplicates. They are a series of pages on a single subject. For such paginated pages, give each page a self-referencing canonical instead of pointing them all at page one.
For example, the following links lead to content that is spread over several pages:
xyz.com/article?story=nutrition-news&page=1
xyz.com/article?story=nutrition-news&page=2
xyz.com/article?story=nutrition-news&page=3
In this case, pointing rel=canonical at the first page may confuse Google and might affect indexing of the later pages. You can additionally mark the sequence with rel="prev" and rel="next", although Google has said it no longer uses these as an indexing signal.
Using hreflang without canonical tags
When using an hreflang tag, you must also use a canonical tag to specify the canonical version in the same language. The hreflang tag does not specify your canonical URLs; it only specifies the language of the page and, optionally, the geographical region it targets.
As per Google, you must use a canonical tag with the hreflang tag to specify the canonical URL in the same language. If no canonical page is present in the same language, you can canonicalize to the URL in the best available substitute language.
Using several rel=canonical tags
Using multiple canonical tags to specify the canonical version is not good practice. If Google encounters several canonical tags on a page, it may get confused and ignore all of them.
This can happen when the theme, plugins, and CMS each add their own tag. For this reason, many plugins include an overwrite feature to ensure that they are the only source of canonical tags.
A similar situation can arise if canonicals are added in HTML as well as through JavaScript. Avoid adding canonicals with JavaScript. Google might pick up a JavaScript-injected canonical when you haven’t specified one in the HTML. However, if you have specified a canonical in the HTML and also set one with JavaScript, you send mixed signals to Google, and both might be ignored.
Using the rel=canonical tag in the <body>
Placing rel=canonical in the <body> of the document is bad practice. The tag belongs only in the <head> section, and Google ignores a canonical tag that appears in the <body>.
In addition, place the rel=canonical tag as early as possible in the <head> to avoid HTML parsing issues.
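Two of the mistakes above, multiple canonical tags and a canonical tag in the <body>, can be spotted by scanning the raw HTML. The following Python sketch uses the standard html.parser module; the class CanonicalAudit and the function audit are hypothetical names. It only sees tags present in the served markup, so canonicals injected by JavaScript are outside its scope.

```python
from html.parser import HTMLParser


class CanonicalAudit(HTMLParser):
    """Records each canonical tag together with the section it appears in."""

    def __init__(self):
        super().__init__()
        self.section = None
        self.found = []  # list of (section, href) pairs

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "link":
            attr = dict(attrs)
            if "canonical" in (attr.get("rel") or "").lower().split():
                self.found.append((self.section, attr.get("href")))


def audit(html):
    """Return a list of canonical-tag problems found in the HTML (empty if none)."""
    parser = CanonicalAudit()
    parser.feed(html)
    problems = []
    if not parser.found:
        problems.append("no canonical tag")
    if sum(1 for section, _ in parser.found if section == "head") > 1:
        problems.append("multiple canonical tags in <head>")
    if any(section == "body" for section, _ in parser.found):
        problems.append("canonical tag inside <body>")
    return problems


doubled = (
    '<html><head><link rel="canonical" href="https://xyz.com/a/">'
    '<link rel="canonical" href="https://xyz.com/b/"></head><body></body></html>'
)
print(audit(doubled))
```

An empty list means none of these particular problems were detected, not that the page's canonical setup is correct.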
How to fix canonicalization issues?
Canonicalization is a tricky subject, and it is easy to make mistakes while canonicalizing URLs. This is why you should audit your site regularly to keep track of canonicalization issues and fix them in a timely manner.
To audit your site, you can use the Site Audit tool from Ahrefs. It will help you discover canonical issues along with other SEO-related issues on your site.
Now, let’s discover what canonical issues may occur on your site and how you can fix them.
1. Canonicals pointing to 4XX
This issue arises when your pages’ canonical tags point to a URL that returns a 4XX status.
4XX pages are pages that do not exist or are restricted, which is why search engines don’t index them. If your pages are canonicalized to 4XX pages, search engines will disregard the canonical tags and may index a non-canonical version, the one you didn’t want indexed.
How to fix it?
Using the Site Audit tool, find the canonical tags that point to 4XX pages and replace those links with existing pages on your site that you want indexed.
2. Canonicals pointing to 5XX
This error arises when the site’s pages are canonicalized to a URL that returns a 5XX status.
5XX codes signal a server error, which means the canonical page is inaccessible. Google usually does not index inaccessible pages and, as a result, disregards the canonical tags.
How to fix it?
When you encounter such issues, replace the invalid URL with a valid one. If your specified canonical URL is valid and correct, check the server for misconfigurations.
Note: This might be a temporary error. Bots may have crawled your site while it was under maintenance or while your servers were overloaded.
3. Canonicals pointing to redirects
Some of your site’s pages may be canonicalized to a URL that redirects to another URL. This can confuse search engines, because a canonical must point to the authoritative page. When canonicals point to a redirecting URL, Google may misinterpret them or ignore them completely.
How to fix it?
To fix this issue, replace the redirecting URL in the canonical tag with the URL of the authoritative version of the page.
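The three issues above differ only in the status code that the canonical target returns, so an audit script can classify them with a small function. This Python sketch assumes the status codes have already been collected by a crawler; the function name canonical_target_issue and the sample data are hypothetical.

```python
def canonical_target_issue(status_code):
    """Classify the HTTP status returned by a canonical target URL."""
    if 200 <= status_code < 300:
        return None  # reachable page, no issue
    if 300 <= status_code < 400:
        return "canonical points to a redirect"
    if 400 <= status_code < 500:
        return "canonical points to 4XX"
    if 500 <= status_code < 600:
        return "canonical points to 5XX"
    return "unexpected status code"


# Hypothetical crawl results: canonical target URL -> status code it returned.
crawl = {
    "https://xyz.com/page/": 200,
    "https://xyz.com/old-page/": 301,
    "https://xyz.com/missing/": 404,
    "https://xyz.com/broken/": 503,
}
for url, status in crawl.items():
    issue = canonical_target_issue(status)
    if issue:
        print(url, "->", issue)
```

Because a 5XX may be temporary, as noted above, it is worth re-checking such URLs before changing the tag.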
4. No canonicals for duplicate content
When your site contains duplicate or near-duplicate content without specified canonicals, Google will pick the canonical version itself, and it might pick one you don’t want indexed. Your important page may end up left out of the index.
How to fix it?
To fix the duplicate content issue, audit your site and look for duplicate content. From each group of duplicates, choose the version you want indexed and declare it as canonical using the rel=canonical tag on all the duplicate versions. On the canonical version itself, use a self-referencing canonical tag. This helps make sure Google indexes the canonical version you specified.
5. Hreflang tags to non-canonical version
Specifying non-canonical versions of your content in hreflang annotations is bad practice. The links in hreflang tags must always point to the canonical versions, because pointing to a non-canonical version may mislead search engines and produce unexpected results.
How to fix it?
To fix this, replace the links to non-canonical versions in the hreflang annotations with the canonical ones.
6. Canonical URL without internal links
Your site might contain canonical URLs that no internal links point to. This makes the canonical URL hard to reach for users as well as search engines: it becomes orphaned content, and search engines might crawl it less often. Users might be directed to a non-canonical version of your content rather than the canonical one.
How to fix it?
Identify your canonical pages and add internal links to them so that they are accessible to users and search engines.
7. Adding non-canonical URLs in the sitemap
Adding non-canonical URLs to a sitemap is bad practice. A sitemap is your way of telling a search engine which pages of your site are important. If you add non-canonical URLs, Google may index pages that are less important for your site. According to Google, you should include in the sitemap only the canonical URLs that you want indexed.
How to fix it?
To fix this issue, remove the non-canonical URLs from the sitemap and list only the canonical URLs in it.
8. Canonicalizing a non-canonical page
This error arises when you canonicalize a page to a URL that is itself canonicalized to yet another URL. For instance, you canonicalized page A to page B, which in turn is canonicalized to page C. This creates a canonical chain that may confuse Google, which may misinterpret it or ignore the canonical tag completely.
How to fix it?
To fix it, remove the intermediate non-canonical link and point the canonical tag directly at the final canonical page.
In the example above, page B is the non-canonical page in the middle. Replace it in the canonical tag on page A so that page A canonicalizes to page C directly.
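The A to B to C example can be resolved programmatically by following each declared canonical until a page points to itself. The Python sketch below assumes a hypothetical mapping, declared, from each page to the canonical it declares, and it raises an error on loops; resolve_canonical is an illustrative name.

```python
def resolve_canonical(url, declared):
    """Follow declared canonicals from url to the final target.

    declared maps each page to the canonical URL it declares.
    Returns (final_url, chain), where chain lists every page visited.
    """
    chain = [url]
    while url in declared and declared[url] != url:
        url = declared[url]
        if url in chain:
            raise ValueError("canonical loop: " + " -> ".join(chain + [url]))
        chain.append(url)
    return url, chain


declared = {"A": "B", "B": "C", "C": "C"}
final, chain = resolve_canonical("A", declared)
print(final, chain)  # C ['A', 'B', 'C']
```

A chain longer than two entries means the starting page should be re-pointed directly at the final URL.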
9. The conflict between Open Graph URLs and canonical URL
This issue arises when the Open Graph URL and the canonical URL of a page conflict. Open Graph tags are the code that determines how a URL appears when it is posted on social media platforms.
When the Open Graph URL and the canonical URL do not match, the non-canonical version is the one that gets shared on social media.
How to fix it?
To fix this issue, make sure that both URLs are the same. Replace the non-canonical URL in the Open Graph tag with the canonical URL.
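A mismatch of this kind can be detected by reading both tags from the page and comparing them. This Python sketch uses html.parser and assumes the Open Graph URL is given as a meta tag with property="og:url"; the names UrlTags and og_matches_canonical are hypothetical.

```python
from html.parser import HTMLParser


class UrlTags(HTMLParser):
    """Captures the canonical URL and the Open Graph URL of a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.og_url = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")
        elif tag == "meta" and attr.get("property") == "og:url":
            self.og_url = attr.get("content")


def og_matches_canonical(html):
    """Return True only if both URLs are present and identical."""
    tags = UrlTags()
    tags.feed(html)
    return tags.canonical is not None and tags.canonical == tags.og_url


mismatch = (
    '<head><link rel="canonical" href="https://xyz.com/page/">'
    '<meta property="og:url" content="http://www.xyz.com/page/"></head>'
)
print(og_matches_canonical(mismatch))  # False
```

The comparison is an exact string match, so differences in protocol, www, or trailing slash all count as conflicts.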
10. Canonical from the HTTPS version to HTTP
Declaring the non-secure HTTP version as canonical from the HTTPS version is bad practice. According to Google, if you have both an HTTPS and an HTTP version of your content, the secure (HTTPS) version should be the canonical one.
How to fix it?
To fix this issue, redirect the HTTP version of the URL to the HTTPS version. Alternatively, use rel="canonical" from the HTTP version to the HTTPS version.
11. Canonical from the HTTP version to HTTPS
There is rarely a good reason to keep an HTTP version of a page alongside its HTTPS version, so an HTTP page that merely canonicalizes to the HTTPS version is better replaced with a redirect. Although it’s not a big concern, fixing it when possible is good practice.
How to fix it?
To fix this, use a 301 redirect from the HTTP to the HTTPS version of your content. Also update any internal links that point to the HTTP version so that they point to the HTTPS version directly.
12. Non-canonical pages rank higher on SERPs.
Your site’s non-canonical pages may rank higher in search results and receive more traffic than your canonical versions. This can happen when you have set your canonicals incorrectly, or when search engines have disregarded your canonical tags and chosen canonicals on their own.
How to fix it?
To resolve this issue, first check whether your rel=canonical tags are correctly placed. Then use the URL Inspection tool in Google Search Console to see which URLs Google has selected as canonical. Examine the results carefully and fix the issue as soon as possible.
To conclude
Canonical tags may appear difficult and complex at first, but once understood, they are not that challenging.
It’s worth noting that canonical tags are signals, not directives. They tell search engines about your preferences, but it is up to the search engines how they respond. Based on their own interpretation, search engines may select a different canonical version than the one you selected.
You can check this using Google’s URL Inspection tool, which shows both the canonical you declared and the canonical Google selected.
You can also consult Google’s documentation on canonical tags to clear up any confusion.
We hope this piece has helped you understand canonical tags in SEO, why they are important, what mistakes you might make when canonicalizing URLs, and how to fix canonical issues on your site.