Optimising URL structure


A URL structure consists of:

The protocol [https://www.wordprexeo.com] specifies how a browser is expected to retrieve information from a resource. The web standards are HTTP (Hypertext Transfer Protocol) and HTTPS (Hypertext Transfer Protocol Secure), structured conventions for transferring and receiving information on the web, most frequently used to retrieve HTML web pages. The notable difference between the two is that HTTPS uses an SSL/TLS certificate to encrypt information in transit, ensuring it can be read only at the sending and receiving ends and not intercepted in between by someone other than the website owner or the user.

This protects user information from unwanted third parties, including hackers and malicious applications designed to capture user data. It also prevents unauthorised third parties from injecting advertising into the website, a practice sometimes employed by free wifi networks.

As a result of its enhanced security, HTTPS websites get a lock icon in the browser window, alerting users that their information is being protected and lending a small additional level of credibility to the website. Lastly, HTTPS is required for the implementation of Accelerated Mobile Pages (AMP) and is favoured in analytics, as it preserves referrer data with greater accuracy. For these reasons, major search engines are widely believed to give a slight preference to HTTPS.

The preferred domain (canonical domain) specifies whether the www prefix [https://www.wordprexeo.com] precedes the domain name. To be technically accurate, the www version of a website means the website is placed on the www subdomain. Although search engines have no set preference for either option, one single version must be used across all the URLs on the website for consistency. For this to work, all internal links must point straight to the preferred version, and all website URLs must be written using the preferred option; this also extends to the URLs used in the website code and the XML sitemap.

It is generally recommended to set up permanent [301] redirects from the non-preferred version to the preferred one, even when only one of them is used across the website. When the preferred version is not clearly specified, search engines may treat the two versions of the same page as references to separate pages, resulting in duplication issues. Although search engines do not favour one version over the other, one argument for the non-www version is that it makes for a shorter URL, leaving more URL space for what may, in fact, be important.
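In practice, this normalisation is usually handled by a server-side 301 redirect, but the mapping itself can be sketched in a few lines. The following is a minimal illustration, assuming a hypothetical preferred version of `https://www.example.com` (the domain and function name are placeholders, not part of any real setup):

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferred scheme and host, for illustration only.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"

def canonicalise(url: str) -> str:
    """Rewrite any variant of the domain to the single preferred version."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Map the bare (non-www) domain onto the preferred www subdomain.
    if host == PREFERRED_HOST.removeprefix("www."):
        host = PREFERRED_HOST
    return urlunsplit((PREFERRED_SCHEME, host, parts.path, parts.query, parts.fragment))

print(canonicalise("http://example.com/blog/"))
# https://www.example.com/blog/
```

A real deployment would return this canonical URL with a 301 status code from the web server rather than rewriting links in application code.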

A subdomain [https://wwwseochecker.wordprexeo.com] is a prefix to the domain name with a certain degree of autonomy, traditionally employed for purposes such as blogs, internationalisation, forums, career sections, and even product lines. However, with the advent of SEO, and particularly the fact that different subdomains are treated as entirely separate websites from each other and from the main domain, their application has become increasingly limited in favour of subdirectories (subfolders).

It’s only logical that the application of subdomains has now been reduced to sections that are clearly outside the scope of the website or when representing different business verticals that have little in common outside their brand name. Subdomains may be useful as a tool for enhanced user experience, delimiting certain sections of the website from its domain name. They may also be useful in particular for websites that cater to numerous fragmented audiences and intend to build authority in these niche markets with inbound links coming from sources relevant to one subdomain but not to the rest of the domain it is attached to.

One may argue that subdomains can also occasionally be used to target particular keywords that would be impossible to include in the domain name; however, the costs of this approach usually far outweigh the benefits. Lastly, subdomains can be especially useful for testing new design, UX and content elements without the fear of them clashing with the rest of the website or, for that matter, having a negative influence on the organic search performance of the main domain.

A domain name [https://www.wordprexeo.com] is the snippet of text between the preferred domain (or subdomain) and the top-level domain, and is most commonly associated with the brand name. The domain name is the part of the URL that is most difficult to change, hence the importance of choosing one that is meant to serve for as long as possible. Although historically the domain name could have a drastic influence on SEO through so-called exact match domains (EMDs) targeting keywords, this was short-lived.

In fact, search engines have developed algorithms to counteract the unfair advantage of exact match domains (EMDs) selected merely for their keywords, thus diminishing the power of websites to rank for keywords based on domain names. It's worth noting, however, that while EMD algorithms have taken the power away from exact match domains, they have placed a strong emphasis on branding in its place. This has shifted the domain name game to the simple practice of selecting the right brand names: ones that make sense, are distinguishable in SERPs, and are memorable to users.

Although the choice of a domain name should be mainly driven by the branding factor, all things being equal, shorter domain names should be preferred to longer ones. As a future-proof practice, it’s also a good bet to avoid any kind of special characters, including hyphens [-]. Domain names are unlike any other segments of a URL in their capacity to encourage repeat visits from SERPs. It is not an overstatement to say users rely heavily on domain names in making judgements on the credibility of information and the repeat click-throughs to the website.

A Top-level domain (TLD) [https://www.wordprexeo.com], also known as a domain suffix, refers to the segment that follows the domain name, separated by the dot [.], and is the last URL element ahead of the URL path. Top-level domains can be either generic, such as [.com] for commercial businesses and [.org] for organisations, or country-specific, such as [.co.uk] for the United Kingdom and [.de] for Germany. Some top-level domains can fall under both segments, such as [.ac.uk] for universities in the United Kingdom only.

There are currently more than a thousand registered top-level domains, with some being more popular than others. The choice of which top-level domain to use will depend on the specifics of the website, with nearly half the registered domains using the [.com] extension. It’s worth noting that as new top-level domains are released, it provides the opportunity to choose domain names that were previously occupied on existing top-level domains, particularly those incorporating important keywords. However, this should be approached with caution as less-popular top-level domains can be associated with spammy websites.

Additionally, when choosing a domain name on a less popular top-level domain because the name is taken on the more established ones, one may run into difficulties with branded search. Ensuring users can reach your website by simply typing your brand name into the search box is far more important than having a catchy domain. It is therefore generally recommended to use the more established top-level domains unless an alternative provides brand value that the ordinary ones do not.

In the case of websites that are tied geographically, it is wise to use an appropriate domain suffix to indicate that. Not only will this be noticed by users, who generally prefer local domains when deciding on a website, but also by search engines, which understand the differences between top-level domains. Some international websites targeting multiple geographical areas and languages take this one step further by using one domain name across multiple top-level domains, with the websites interlinked through alternate tags that identify them as belonging to the same entity. This is especially common for large e-commerce websites that target distinct geographical markets using different languages.

A URL path contains:

  • None, one, or several levels of subdirectories [/subdirectory/subdirectory-2/landing-page/]. A good example of subdirectories are blog categories and tags, separated in the URL path by forward slashes [/].
  • A landing page [/subdirectory/landing-page/], or
  • A file, with its respective extension [.pdf]. Although traditionally landing pages also used to have file extensions [.html; .php], these are now most often hidden to appear more readable to users. At present, file extensions are still commonly employed for images [.jpg or .png] and documents [.pdf].
  • URL parameters [/?id=123] for dynamic URLs, used to display dynamically-generated content on large e-commerce websites, where functions such as advanced categorisation, filtering, and ordering may be necessary. It may be worth noting, however, that the practice of rewriting dynamic URLs to look static is on the rise, mainly employed to make URLs more readable to the user.
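The anatomy described above can be inspected programmatically. The sketch below uses Python's standard `urllib.parse` module to pull apart a hypothetical URL (the address itself is a made-up example) into the protocol, host, subdirectory levels, file extension and URL parameters:

```python
from urllib.parse import urlsplit, parse_qs
from pathlib import PurePosixPath

# A hypothetical URL used purely for illustration.
url = "https://www.example.com/blog/seo/url-optimisation?id=123&sort=asc"

parts = urlsplit(url)
path = PurePosixPath(parts.path)

print(parts.scheme)          # protocol: https
print(parts.netloc)          # host: www.example.com
print(list(path.parts[1:]))  # path segments: ['blog', 'seo', 'url-optimisation']
print(path.suffix)           # file extension, if any: '' (extensionless page)
print(parse_qs(parts.query)) # URL parameters: {'id': ['123'], 'sort': ['asc']}
```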

The hierarchy of a URL structure

The purposes and sizes of websites vary greatly over the web, and therefore so do their URL structures. While some are better served by a flat hierarchy structure with only one or two levels of subdirectories, others may require more vertical hierarchies to ensure an accurate representation of their content structure.

Two aspects commonly considered in the context of URL structure are:

  • Information Architecture: Assumes the website information is layered, in a way that broad general topics contain niche topics.
  • Website Taxonomy: The reflection of the website architecture in the URL structure, in a way that broad general topics form the website's main subdirectories, and the respective niche topics sit within their broad general subdirectory before receiving a subdirectory of their own [https://wordprexeo.com/onpage-optimisation/url-optimisation]

As a general rule, UX best practice recommends that a homepage lead to the main pages of the website within 3 interactions, so the maximum appropriate number of subdirectory levels should be limited to 3. Having more than 3 levels of subdirectories generally complicates the experience for users and should only be employed when it, in fact, adds clear value to the UX. For page URLs to show the wider context of their landing pages, one must make good use of the subdirectory structure.

A subdirectory structure describes the relationships between all website subdirectories, as visible from the URL paths. The closer a category is to its domain, and the more associated subcategories and landing pages it has, the higher the authority search engines assign to it within the website. An adjoining aspect of this is the proper use of taxonomy, designed to ensure related content pages are grouped around meaningful topics, as these structures can be of additional value to SEO.
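The 3-level guideline above is easy to audit in bulk. The following sketch (the function names and example domain are hypothetical) counts the subdirectory levels above the landing page and flags URLs that exceed the suggested limit:

```python
from urllib.parse import urlsplit

def subdirectory_depth(url: str) -> int:
    """Count the subdirectory levels above the final path segment."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # The last segment is the landing page itself, not a subdirectory.
    return max(len(segments) - 1, 0)

def too_deep(url: str, limit: int = 3) -> bool:
    """Flag URLs with more subdirectory levels than the suggested limit."""
    return subdirectory_depth(url) > limit

print(subdirectory_depth("https://example.com/a/b/c/page"))  # 3
print(too_deep("https://example.com/a/b/c/d/page"))          # True
```

Running such a check over a crawl export is a quick way to spot sections of a website whose hierarchy has grown deeper than the UX warrants.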

Categorising landing page URLs

As with Page Titles and Meta Descriptions, URLs must be descriptive of the landing page they represent, but also make sense in relation to each other. In other words, if an information website lists all its landing pages with little regard to taxonomy, it hardly enables users to effectively interact with the website content.

On the other hand, when a landing page is listed under one broader category or several subcategories directly linked to its topic, it gives the user a sense of where they are on the website, allowing them to go back and forth and interact more freely with the wider information context of the website. Not only is this beneficial to the user, but also to the search engines which place a significant emphasis on understanding these relationships.

Better yet, search engines have learned to establish relevancy signals for subdirectories based on the kind of landing pages they host and vice-versa, so designing appropriate content taxonomies can go a long way in uplifting the website in search results. To sum it up, a healthy URL taxonomy enables search engines to better understand not only the content of particular landing pages but entire sections and the website as a whole, uplifting its overall ability to rank in organic search results.

Using subdomains vs subdirectories

The word from the search engines is that they treat subdomains and subfolders equally, at least as far as crawling and indexing are concerned. However, a closer look at their own comments shows the equivalence only goes so far. Although search engines have learned to crawl and index subdomains with the same speed and convenience as subdirectories, several differences between the two define the difference in their application.

Subdomains, in particular, are treated by search engines as separate entities with their own relevancy, trust and authority signals from Search Engines, independent of their domain. Subdirectories, however, remain tied in these regards to their domain, which explains why they are the option of choice, most of the time. Unless the website’s sections or business verticals are so distinct from each other that they won’t benefit from the inbound links coming to the other sections, the recommended option is to use Subdirectories as opposed to Subdomains.

When trustworthy, authoritative links lead to a Subdirectory of a website, they also impact the authority of other subdirectories and the wider website. Thus, unlike in the case of subdomains, having the content spread over subdirectories allows for the authority, trustworthiness and relevancy signals used by search engines to be shared across the wider website.

Using dates vs topics as subdirectories

Although dates are acceptable in URLs under select circumstances, most of the time using dates in URLs isn't the right choice. The exception is when the content on a landing page is time-bound and searched using keywords incorporating dates, making the date a relevancy signal for search engines about the landing page. In other cases, however, dates are generally an inferior way of structuring content, as a published date provides limited context to search engines or users about the semantics of the content behind the URL.

If your posts' dates are displayed in the URL as subdirectories by default, it's likely because of your permalink structure, which determines how your URLs in WordPress are constructed.

To change your permalink settings in WordPress, navigate to “Settings”, then click “Permalinks”, which opens the “Permalink Settings” screen, and set the permalink structure to “Post name”. This way you will have search-engine-optimised URLs by default.
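With the “Post name” structure, WordPress derives each URL slug from the post title (internally via its PHP `sanitize_title` machinery). As a rough, language-agnostic sketch of what that transformation does, and assuming nothing about WordPress internals beyond the visible behaviour:

```python
import re
import unicodedata

def slugify(title: str) -> str:
    """Roughly mimic how a post title becomes a URL slug."""
    # Normalise accented characters to their closest ASCII equivalents.
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    text = text.lower()
    # Replace runs of non-alphanumeric characters with single hyphens.
    text = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    return text

print(slugify("Optimising URL Structure: A Café Guide!"))
# optimising-url-structure-a-cafe-guide
```

Lowercase words separated by hyphens, with punctuation dropped, is exactly the readable, keyword-bearing shape you want a landing page URL to take.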

Using a sitemap

A sitemap can be as simple as a full list of a website’s public URLs. A sitemap’s purpose is to inform search engines about what’s available for crawling on a website, ensuring that all pages including those that may seem difficult to access for search bots are indeed properly crawled and indexed. It’s worth emphasising that a sitemap only lists the public pages of a website, thus excluding any pages that are behind the “log-in window” or which should not be made available to the public directly through a search engine.

Sitemaps are particularly useful for large websites that are updated frequently and are thus prone to delays in getting all pages crawled and indexed, for websites that contain sections which are not very well linked together, and for websites with few inbound external links. Historically, several types of sitemaps were used, including HTML sitemaps, XML sitemaps, RSS feeds and text files, with XML sitemaps rising to prominence over the last decade.

An XML sitemap can contain a maximum of 50,000 URLs and weigh no more than 50MB, numbers which may seem limiting to large websites. This is why sitemaps can be placed under a sitemap index with similar restrictions – each index containing no more than 50,000 sitemaps and staying under 50MB in size. It's worth noting you can have multiple sitemap indexes if your website contains more than 50,000 × 50,000 URLs, which is very unlikely.
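Splitting a large URL list into 50,000-entry sitemaps under an index can be sketched with Python's standard `xml.etree` module. The domain and file names below are hypothetical placeholders; the XML shapes follow the sitemaps.org protocol:

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-sitemap limit; file size must also stay under 50MB

def build_sitemap(urls):
    """Serialise one <urlset> sitemap for a chunk of URLs."""
    urlset = ET.Element("urlset", xmlns=NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

def build_sitemap_index(sitemap_urls):
    """Serialise a <sitemapindex> pointing at the individual sitemaps."""
    index = ET.Element("sitemapindex", xmlns=NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(index, encoding="unicode")

# A hypothetical URL list, chunked into 50,000-URL sitemaps.
all_urls = [f"https://www.example.com/page-{i}" for i in range(120_000)]
chunks = [all_urls[i:i + MAX_URLS] for i in range(0, len(all_urls), MAX_URLS)]
sitemaps = [build_sitemap(chunk) for chunk in chunks]
print(len(sitemaps))  # 3 sitemaps for 120,000 URLs
```

In production the same logic would also write each sitemap to disk and check the serialised size against the 50MB ceiling before publishing.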

For websites that have multiple alternate versions for different languages and/or regions, one may employ hreflang tags to specify the language and/or locale each URL is targeted toward, directly within sitemaps. This can be employed even for websites that extend to multiple domains or top-level domains (TLDs). In order for hreflang tags to work, each URL entry in the sitemaps must list all of its alternate versions, in a manner that forces every alternate version of a URL to point back to all of its other alternate versions.

This ensures that the ownership of all domains and their respective pages (and the consolidation of ranking signals among these pages) isn’t put into question by such cases when random websites declare their pages as being alternate to your own, with the intent of consolidating the ranking signals to their pages at your expense.

Keeping all the alternate versions of URLs up to date in sitemaps may become overwhelming, so in cases when it comes down to prioritising which versions to update first, the ones being served to users with different languages will come ahead of the ones being served to users from different regions. In other words, it’s more important to have your alternate tags for the “en” version set up to point to its “fr” or “de” versions, as opposed to its “en-gb” or “en-us” versions.

Similarly, search engines are not accustomed to deriving a web page’s language based on its set geographical region, but learned to associate a language with all geographical regions it is spoken in. In other words, if you’re aiming for simplified hreflang implementation in sitemaps, you may use the language code alone to target all the speakers of that language but have the option to restrict the targeting to certain geographical areas through a country code.

Big or small, an international website would also benefit from an x-default tag applied to the hreflang section in sitemaps that specifies the version of a URL to serve to those users whose browser settings are set to a language that is not specified through any of your alternate versions. This ensures that all users speaking a language other than one supported by the website will be directed to a “main” version of the page, which may be more suitable than any of the language-specific pages.
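The reciprocal-alternates rule above means each `<url>` entry carries an `xhtml:link` element for every version of the page, including itself and the x-default. A minimal sketch of that markup, built with the standard `xml.etree` module and hypothetical example URLs:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)

# Hypothetical alternate versions of one page, keyed by hreflang code.
alternates = {
    "en": "https://www.example.com/page",
    "fr": "https://www.example.com/fr/page",
    "x-default": "https://www.example.com/page",
}

urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
# Every language version gets an entry that lists ALL alternates,
# so each version points back to all of its siblings.
for own_url in (alternates["en"], alternates["fr"]):
    entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = own_url
    for lang, href in alternates.items():
        ET.SubElement(entry, f"{{{XHTML_NS}}}link",
                      rel="alternate", hreflang=lang, href=href)

xml_out = ET.tostring(urlset, encoding="unicode")
print(xml_out)
```

Note how the language-only code "fr" targets all French speakers, while a code such as "fr-ca" would restrict the targeting to a single region.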

In addition to the option of using sitemaps for specifying alternate language versions of URLs, one may also employ sitemap extensions to list the locations of media files, including images, videos and other content which search engine crawlers may find hard to parse. Although optional, XML sitemaps allow for additional data to be included about the URLs, which may help ensure the website is crawled more effectively.

This includes such aspects as when the URL was last updated, how often it changes, and its relative importance compared to other URLs on the website. Similarly, sitemaps often work hand in hand with robots.txt files, which serve the opposite purpose – asking search engines not to crawl private URLs (though blocking crawling does not by itself guarantee a URL stays out of the index).
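The interplay between the two can be verified with Python's standard `urllib.robotparser` module. The robots.txt content below is a made-up example that disallows a private section while advertising the sitemap location:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: block a private section for all crawlers,
# and point them at the sitemap.
robots_txt = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "https://www.example.com/blog/post"))        # True
print(parser.can_fetch("*", "https://www.example.com/private/account"))  # False
```

Checking a sample of sitemap URLs against the parser this way is a quick safeguard against accidentally listing pages in the sitemap that robots.txt simultaneously blocks from crawling.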