Defining URL Optimisation
A URL (Uniform Resource Locator) is the unique web address used to access a specific resource on an internet-connected website. It tells browsers how and where to find a web page or other resource (e.g. an image).
The Role of URLs in SEO:
- Rely on keywords to inform search engines about the content of the landing page
- Can provide context on the website structure and the landing page’s position in it
- Can outline if a website section is isolated from the main domain
- Can communicate about the language of a page and geographical region of a website
- Specify if the website uses a secure, encrypted connection to protect visitor information
URL Optimisation is the optimisation of every website URL, of the relationships between URLs, and of the domain-wide aspects that directly impact all website URLs, as perceived by search engines. It also covers the complementary technical SEO aspects that influence how URLs are processed by search engines.
Dissecting URL Structure
Put simply, URL Optimisation as a whole can be predictably managed by dissecting a URL and optimising each of its sections in turn. Domain-related aspects are optimised once for all website URLs, while URL Path optimisation requires granular input. To optimise all URLs head to toe, let’s first dissect the URL Structure, outline the characteristics of each URL section and follow up with the optimisation for each one.
The protocol [https://wordprexeo.com] specifies the manner in which a browser is expected to retrieve the information from a resource.
The web standards are HTTP (Hypertext Transfer Protocol) and HTTPS (Hypertext Transfer Protocol Secure). The notable difference between the two is that HTTPS uses an SSL/TLS certificate to encrypt data, so that sensitive information remains accessible only to the website and the user. This prevents interception by hackers or malicious software and blocks unauthorised advertising injections from third parties, such as free Wi-Fi networks.
As a result of its enhanced security, HTTPS websites get a lock icon in the browser window alerting users that their information is being protected, giving just a small additional level of credibility to the website. For these reasons, it is currently believed that major search engines have a strong preference for HTTPS.
Put simply, the way to get the most out of the protocol section of the URL, in terms of its contribution to higher search engine rankings, is to set your website up with HTTPS encryption.
A subdomain [https://seockecker.wordprexeo.com] is a prefix to the root domain used to isolate specific sections of a website. Because search engines treat subdomains as independent entities, their authority and ranking signals are not naturally shared with the main website.
Primary Use Cases for Subdomains
- Scope Isolation: Sections clearly outside the main website’s topical focus (e.g. forums, career portals, or distinct product lines).
- Risk Management: Testing new designs, UX, or content without impacting the organic performance of the root domain.
- User Experience: Delimiting international versions from the primary website.
Subdomains vs Subdirectories Best Practice
The choice between these two structures is a strategic decision regarding entity independence and the distribution of “link equity”.
| Feature | Subdirectory | Subdomain |
|---|---|---|
| URL Example | wordprexeo.com/blog/ | blog.wordprexeo.com/ |
| Search Engine Perception | Seen as a structural part of the main website entity. | Treated as a distinct, independent entity from the root domain. |
| Authority and Trust | Inherits the root domain’s existing authority and trust signals. | Must build its own independent authority and trust. |
| SEO Strategy | Standard choice for content that supports the main topical focus. | Reserved for sections serving a different purpose or audience. |
| Technical Benefit | Consolidates “link equity” and relevancy signals across the website. | Useful for testing new UX/designs without risking the main website’s authority and trust. |
The Canonical Domain (Preferred Domain) [https://www.wordprexeo.com] specifies the preferred protocol and the www or non-www version of the Domain to be enforced across all website URLs. Although Search Engines have no set preference for either www or non-www, a single version must be used across all the URLs on the website for consistency.
Whatever option you choose:
- All URLs must be normalised to the preferred version, including those referenced in the XML sitemap
- All internal links must point straight to the canonical version
- All other URL versions using the non-Canonical Domain must use 301 Permanent Redirects to the URL version with the Canonical Domain, as a fail-proof way to prevent canonical-domain issues.
Enforcing the Canonical Domain in all URLs, internal links and XML sitemaps, along with 301 Permanent Redirects to the canonical-domain alternatives, fixes the following historical issues caused by accidental non-Canonical Domain use on and outside the website, and prevents them from recurring:
- Broken links
- Broken pages
- Duplicate content
- Split authority signals
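The redirect rule above can be sketched in a few lines of Python. This is only an illustration of the logic, not a server configuration: the function name `canonical_redirect_target` is hypothetical, and the sketch assumes that HTTPS on the non-www host was chosen as the Canonical Domain.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: HTTPS on the non-www host is the Canonical Domain.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "wordprexeo.com"
# Host variants that should collapse into the canonical host.
KNOWN_HOSTS = {"wordprexeo.com", "www.wordprexeo.com"}


def canonical_redirect_target(url: str) -> Optional[str]:
    """Return the URL a 301 redirect should point to, or None if none is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in KNOWN_HOSTS:
        return None  # another website or a subdomain: left untouched here
    if parts.scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None  # already on the Canonical Domain
    # Keep the path, parameters and fragment; swap only protocol and host.
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )


print(canonical_redirect_target("http://www.wordprexeo.com/blog/"))
print(canonical_redirect_target("https://wordprexeo.com/blog/"))
```

Running the same function over every URL in the internal links and the XML sitemap is one way to spot references that still point at a non-canonical version.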
Canonical Domain Set-up is part of the standard process for Setting up Basic WordPress SEO Settings.
A domain name [https://wordprexeo.com] is the snippet of text in between the preferred domain (or subdomain) and the top-level domain and is most commonly associated with the brand name. The domain name is the part of the URL that is most difficult to change, hence the importance of choosing a domain name that is meant to be fit for purpose, as part of Domain Actuation.
Exact-Match Domain Names
Although the domain name could historically have a drastic influence on SEO through so-called exact-match domains (EMDs) that targeted keywords, this was short-lived. Search engines developed algorithms to address the unfair advantage of exact-match domains selected merely for their keywords, thus diminishing the power of websites to rank for keywords based on domain names alone.
It’s worth noting, however, that while EMD algorithms have taken the power away from exact-match domains, they have placed a strong emphasis on the branding factor in its place. This has shifted the domain name game to the simple practice of selecting the right brand names: ones that make sense, are distinguishable in SERPs, and are memorable to users.
Domain Names Best Practice
Although the choice of a domain name should be mainly driven by the branding factor, all things being equal, shorter domain names should be preferred to longer ones. As a future-proof practice, it’s also a good bet to avoid hyphens [-]. Domain names are unlike any other segments of a URL in their capacity to encourage repeat visits from SERPs. It is not an overstatement to say users rely heavily on domain names in making judgements on the credibility of information and the repeat click-throughs to the website.
A Top-level domain (TLD) [https://wordprexeo.com], also known as a domain suffix, refers to the segment that follows the domain name, separated by a dot [.], and is the last URL element ahead of the URL path. Top-Level Domains are tied to and selected together with the Domain Name and are explored as part of Domain Actuation.
There are currently over a thousand registered top-level domains, with nearly half the registered domains using the [.com] extension. The choice of which top-level domain to use will depend on the specifics of the website.
Top-level domains can be either generic such as [.com] for commercial businesses and [.org] for organisations or country-specific, such as [.co.uk] for the United Kingdom and [.de] for Germany. Some top-level domains can fall under multiple segments, such as [.ac.uk] for universities in the United Kingdom.
It’s worth noting that new top-level domains, as they are released, provide the opportunity to choose domain names that were already taken on existing top-level domains, particularly those incorporating important keywords. However, this should be approached with caution, as less popular top-level domains can be associated with spammy websites.
Additionally, choosing a Domain Name on a less popular top-level domain because it is already taken on other top-level domains may cause difficulties with branded search. Ensuring users can reach your website by simply typing your brand name in the search box is far more important than having a catchy domain. It is therefore generally recommended to use the more established top-level domains unless an alternative one provides some brand value that the ordinary ones do not.
For websites tied to a specific geography, it is wise to use an appropriate domain suffix to indicate that. It will be taken into account both by users, who generally prefer local domains when choosing a website, and by search engines, which understand the differences between top-level domains.
Some international websites targeting multiple geographical areas and languages take this one step further by using one domain name spanning across multiple top-level domains, with websites being interlinked through hreflang tags that identify them as belonging to the same entity. This is especially common for large e-commerce websites that target distinct geographical markets using different languages.
The URL Path [https://wordprexeo.com/subdirectory/subdirectory-2/landing-page/] is the segment of the URL immediately following the top-level domain.
A URL path contains:
- None, one or several levels of Subdirectories, as part of URL Taxonomy.
- A Landing Page Slug, or
- A File, with respective extension. At present, file extensions are most commonly employed for images [.jpg or .png] and documents [.pdf].
- URL parameters for dynamic URLs, used to display dynamically-generated content, commonly powering the website search function, analytics tools, and advanced categorisation, filtering, and ordering.
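The URL sections described so far can be pulled apart with Python's standard `urllib.parse` module. The sketch below is illustrative only: `dissect_url` is a hypothetical helper, the example URL and its `sort` and `page` parameters are made up, and the host is split naively on the assumption of a single-label top-level domain such as [.com].

```python
from urllib.parse import parse_qs, urlsplit


def dissect_url(url):
    """Split a URL into the sections described in this article."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    return {
        "protocol": parts.scheme,
        # Naive host split: assumes a single-label top-level domain such as
        # .com, so it would misread multi-label suffixes like .co.uk.
        "subdomain": ".".join(labels[:-2]),
        "domain_name": labels[-2] if len(labels) >= 2 else "",
        "top_level_domain": labels[-1],
        # Subdirectories followed by the landing page slug (or file name).
        "path_segments": [segment for segment in parts.path.split("/") if segment],
        # URL parameters of a dynamic URL, if any.
        "parameters": parse_qs(parts.query),
    }


# Hypothetical URL combining a subdomain, a subdirectory, a slug and parameters.
sections = dissect_url(
    "https://blog.wordprexeo.com/subdirectory/landing-page/?sort=asc&page=2"
)
print(sections)
```

A production tool would more likely rely on a public suffix list to separate the domain name from the top-level domain, which this sketch deliberately leaves out.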
Based on the dissection of the URL Structure, URL Optimisation can at its core be split into:
- Domain Name Actuation: The segment of the URL containing the Protocol, the optional Subdomain (including the Canonical Domain) and the Domain, including the Top-Level Domain, typically optimised once for the entire website.
- URL Taxonomy Optimisation: Optimising the categorisation of all URLs on the website with particular reference to Taxonomic Depth and Topic Clusters.
- Page URL Slug Optimisation: Optimising the fragment of URL that identifies one particular landing page and is largely limited to the effective use of keywords, as part of Onpage Optimisation.
- Dynamic URLs Optimisation: Neither Page URL Slugs Optimisation, nor URL Taxonomy Optimisation reflects on the use and optimisation of Dynamic URLs, a topic explored separately in SEO (if your website uses dynamic URLs).
Technical SEO Aspects that influence URLs
While the URL structure defines the hierarchy, external configurations are used to assist in discovery and regional targeting:
- Internal Links:
- XML Sitemaps: Serve as the authoritative list of your URLs, ensuring search engines can discover and crawl all website URLs available for Indexing.
- robots.txt file: Working in the opposite direction to the XML sitemap, the robots.txt file tells search engines which parts of your URL taxonomy they are asked not to crawl.
- Canonical Tags: A canonical tag is an HTML element used to specify the “primary” version of a URL to search engines when multiple versions of a page must exist. They are particularly essential for URLs generated with dynamic parameters or when similar content is accessible through multiple URL paths, ensuring that ranking signals are consolidated to a single authoritative URL.
- Hreflang Tags: For international structures, [Hreflang tags] are used within the URL metadata to signal the relationship between alternate versions of the same page intended for different languages or regions.
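As an illustration of the robots.txt item in the list above, Python's standard `urllib.robotparser` module can evaluate crawl rules against specific URLs. The rules and the `/staging/` subdirectory below are hypothetical and only show the mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to skip /staging/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The disallowed subdirectory is off limits; the rest of the site is not.
staging_allowed = parser.can_fetch("*", "https://wordprexeo.com/staging/new-design/")
blog_allowed = parser.can_fetch("*", "https://wordprexeo.com/blog/")
print(staging_allowed, blog_allowed)
```

A check like this can help confirm that no URL listed in the XML sitemap is accidentally blocked from crawling.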
URL Structure Optimisation Best Practice
URL character length
There are a number of technical and practical considerations that make the length of a URL a topic of its own. From a technical point of view, URLs must be shorter than 2,083 characters in order to correctly render in all browsers.
URL display in SERPs
In order for a page URL to display fully in SERPs, as opposed to it being truncated, its length should be within the limits of 512 pixels. The actual number of characters will, of course, vary as some characters spread over more pixels than others. A general estimate for URLs to fully display in SERPs would fall at around 50 to 60 characters.
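The two length limits above can be turned into a quick check. This is a rough sketch: the 60-character threshold is simply the upper end of the estimate quoted above, not a measurement of pixel width, and `url_length_report` is a hypothetical helper name.

```python
MAX_TECHNICAL_LENGTH = 2083  # technical limit quoted above
SERP_CHARACTER_ESTIMATE = 60  # upper end of the 50 to 60 character estimate


def url_length_report(url):
    """Report how a URL's character count compares with both limits."""
    length = len(url)
    return {
        "length": length,
        "within_technical_limit": length < MAX_TECHNICAL_LENGTH,
        # Character count is only a proxy for the 512-pixel display width.
        "likely_shown_in_full_in_serps": length <= SERP_CHARACTER_ESTIMATE,
    }


print(url_length_report("https://wordprexeo.com/url-optimisation/"))
```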
Use short URLs
It is generally advised to keep URLs as short as possible, providing just enough information to both users and search engines to understand the content of the landing page and its structural context in relation to the rest of the website.
It’s particularly important to keep subdirectory slugs short, as a long subdirectory slug makes the URL of every landing page nested under it longer too: prefer [https://wordprexeo.com/long-slug/landing-page/] to [https://wordprexeo.com/a-very-long-subdirectory-slug/landing-page/].
As a general rule, it’s recommended to avoid superfluous words that do not add immediate value to the content, like prepositions and conjunctions: prefer [https://wordprexeo.com/onpage-optimisation/image-optimisation/] to [https://wordprexeo.com/onpage-optimisation/optimisation-of-images/].
URLs can also be shortened by avoiding the repetition of keywords that are already present in the categories and subcategories of the URL path, as happens in [https://wordprexeo.com/onpage-optimisation/url-optimisation/url-structure-optimisation/].
Keep the length of any given subdirectory slug or page slug within three words, otherwise it becomes unintelligible to users who are less likely to rely on it to make a decision about the context of the landing page it represents.
There is a negative correlation between the length of a URL and organic search rankings, meaning that all things being equal, shorter URLs are likely to rank higher in organic search results.
The Protocol, Subdomains, Domain Name and Top-Level Domain are all case-insensitive and will display in lowercase in both browsers and search engines, no matter how they’re specified on the back-end.
URL Paths are case-sensitive, meaning a URL Path with one uppercase letter is a different URL from the otherwise identical URL with that letter in lowercase. Although uppercase letters are generally allowed in URLs, WordPress applies certain URL sanitisation rules, which automatically convert all uppercase letters to lowercase to prevent duplicate issues caused by case-sensitivity being handled differently across systems.
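The difference between the case-insensitive and case-sensitive sections can be shown with a short sketch. The helper name `lowercase_url` and its `lowercase_path` switch are hypothetical; lowercasing the path mirrors the WordPress behaviour described above rather than any rule of URLs themselves.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_url(url, lowercase_path=True):
    """Lowercase the case-insensitive sections and, optionally, the path."""
    parts = urlsplit(url)  # urlsplit already lowercases the protocol
    host = parts.netloc.lower()
    # Paths are case-sensitive, so lowercasing them is a policy choice.
    path = parts.path.lower() if lowercase_path else parts.path
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


mixed_case = "HTTPS://WordPrexeo.COM/URL-Optimisation/"
print(lowercase_url(mixed_case))
print(lowercase_url(mixed_case, lowercase_path=False))
```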
The use of non-English characters in the URLs of non-English websites is allowed and in case of URL Paths, encouraged.
It may be worth noting that international characters in Domain Names are rarely employed. The best practice for international characters in Domain Names is to limit the characters to a single language to prevent confusion.
Even though non-English characters require encoding, browsers and search engines typically display international characters as decoded. Also, most search engines can recognise, crawl and rank web pages based on the keywords containing such international characters. So, if one expects users to search for keywords containing non-English characters, it is only normal to make appropriate use of them in URLs.
Lastly, it may be worth noting that the WordPress URL sanitisation rules will automatically convert Latin alphabet characters with diacritics [á] into the standard English characters without diacritics [a]. At the same time, they will automatically apply the appropriate encoding to characters from all other alphabets. The encoded characters are typically decoded and displayed normally in both browsers and search engines, and their encoded URL segments are only revealed when copied to the clipboard.
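The encoding mentioned above is percent-encoding of the character's UTF-8 bytes. The following sketch uses Python's `urllib.parse.quote` and `unquote` to show the round trip; the Cyrillic slug is a made-up example.

```python
from urllib.parse import quote, unquote

# Hypothetical slug written in a non-Latin alphabet.
slug = "обзор"

# Each character becomes its UTF-8 bytes written as %XX sequences.
encoded_slug = quote(slug)
# Browsers and search engines typically display the decoded form.
decoded_slug = unquote(encoded_slug)

print(encoded_slug)
print(decoded_slug)
```

The encoded form is what appears when such a URL is copied to the clipboard, while the decoded form is what users normally see.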
The hyphen is the only non-alphanumeric character supported in Domain Names, but its use is generally discouraged unless absolutely necessary.
URL Paths support both hyphens [-] and underscores [_]; however, the hyphen [-] is now the only recommended option for filling in the space between words. The reason is that the major search engines currently read hyphens [-] as word separators and underscores [_] as a means of combining words. The use of underscores [_] is now discouraged altogether.
When it comes to URLs, not all characters are born equal. It begins with a group called Unreserved Characters. The most commonly used of these are alphanumerics (a-z and 0-9). There is also a small number of additional special characters: the hyphen [-] used to separate words, the underscore [_] traditionally used to combine words, the dot [.] and the tilde [~]. None of these characters require encoding, which means that all URLs written using them will appear in browsers exactly as they are written on the back-end.
On the other end, there are reserved characters, which have a reserved technical purpose within a URL: their appearance in a specific part of the URL can change its semantics. They are most commonly found in automatically-generated URLs. The full list of reserved characters includes: ! * ' ( ) ; : @ & = + $ , / ? % # [ ].
Reserved characters can be used without encoding when employed for their reserved purpose. Although they can also be used outside their reserved purpose (with encoding), WordPress URL Sanitisation usually strips these characters from URLs. In other words, if your URLs contain these characters, WordPress will automatically remove them.
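The difference between the two groups is easy to observe with `urllib.parse.quote`. The sketch assumes Python 3.7 or later, where the tilde is treated as unreserved; passing `safe=""` asks for every reserved character to be encoded, and both example strings are made up.

```python
from urllib.parse import quote

# Only unreserved characters: nothing needs encoding.
unreserved_only = quote("url-structure_v2.1~draft", safe="")

# Reserved characters used outside their reserved purpose get encoded.
with_reserved = quote("filters&sort=price", safe="")

print(unreserved_only)
print(with_reserved)
```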
The Forward Slash [/] reserved purpose
The character with arguably the most important reserved purpose in URLs is the forward-slash [/], used on subdirectories [/subdirectory/]. In simple terms, forward slashes [/] enable all URLs to be part of a hierarchy, by delimiting separate entities on a website from each other. An entity that is contained between 2 forward slashes [/] in a URL is called a Subdirectory [/subdirectory/], while the last entity that is contained within one or multiple Subdirectories is called a Landing Page.
Some subdirectories can also take on the role of landing pages. In other words, while the URL subdirectory [/subdirectory/] serves as the portion of the URL that precedes the portion of the landing page [/sub..ry/landing-page-1/], it may also host a landing page of its own under that Subdirectory slug.
Technical aspects aside, the notable difference between a subdirectory and a landing page is that Subdirectories are typically employed to store multiple landing pages, the content of which have related semantic meaning.
The hash sign [#] reserved purpose
An important feature in URLs is triggered by the use of the hash [#], which indicates not only the location of the page but also a specific location within the page itself, such as a header or the beginning of a paragraph. This is particularly useful for information-heavy websites, which may need hyperlinks to specific paragraphs or sections of text as opposed to the entire landing page. Hashes should be used for this purpose and this purpose alone.
The content of URLs with hashes [#] in them is not indexed separately, as it is considered to already be part of the larger web page (held by the URL without the hash [#]). This prevents duplication issues, but also means the hash must be used only for its intended purpose of pointing to a location within a page.
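The relationship between a URL with a hash and the page it belongs to can be sketched with `urllib.parse.urldefrag`, which separates the two. The anchor name in the example is hypothetical.

```python
from urllib.parse import urldefrag

# Hypothetical link to a specific section within a page.
link = "https://wordprexeo.com/url-optimisation/#url-character-length"

# The page URL is the part that gets indexed; the hash only points inside it.
page_url, in_page_location = urldefrag(link)

print(page_url)
print(in_page_location)
```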
URLs cannot contain blank spaces. More specifically, WordPress URL Sanitisation automatically converts the blank spaces entered in the URL slug field into hyphens [-].
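The sanitisation rules described in this section (lowercasing, removing diacritics, stripping reserved characters and turning spaces into hyphens) can be approximated as follows. This is a rough sketch for Latin-alphabet text, not WordPress's own implementation: `sketch_slug` is a hypothetical name, and Unicode decomposition stands in for whatever character mapping WordPress actually uses.

```python
import re
import unicodedata


def sketch_slug(title):
    """Approximate the slug sanitisation rules described in the article."""
    # Split accented letters into base letter + accent, then drop the accents.
    decomposed = unicodedata.normalize("NFKD", title)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    lowered = without_accents.lower()
    # Remove everything except letters, digits, underscores, spaces and hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", lowered)
    # Blank spaces become hyphens; repeated hyphens are collapsed.
    hyphenated = re.sub(r"\s+", "-", cleaned.strip())
    return re.sub(r"-{2,}", "-", hyphenated)


print(sketch_slug("URL Structure Optimisation"))
print(sketch_slug("Página de Optimización & SEO!"))
```

Characters from non-Latin alphabets are not percent-encoded by this sketch, so it should not be read as a faithful model of how WordPress treats them.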
Historically, the presence or absence of a trailing slash [/] at the end of a URL was a structural signal. The trailing slash [https://wordprexeo.com/subdirectory/] used to indicate a subdirectory, while its absence [https://wordprexeo.com/subdirectory/landing-page] signalled a file or landing page.
While this practice is no longer widely employed, websites using both versions of URLs (with and without trailing slashes) may be subject to duplication issues. These arise when the same URL is used both with and without the trailing slash, as search engines may treat the two versions as two distinct URLs.
Normalising the use of the trailing slash across all URLs prevents such issues from arising. In WordPress, the trailing slash is enforced automatically on Pages, and on Posts it is enforced when the appropriate option is selected in the Permalinks Settings, as part of URL Taxonomy Optimisation.
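Trailing-slash normalisation can be sketched as below, assuming the website has settled on the version with the trailing slash. The helper name `enforce_trailing_slash` is hypothetical, and treating any final segment that contains a dot as a file is a simplification.

```python
from urllib.parse import urlsplit, urlunsplit


def enforce_trailing_slash(url):
    """Append a trailing slash to page URLs, leaving file URLs untouched."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Simplification: a dot in the last segment is taken to mean a file
    # such as image.jpg or guide.pdf, which should not get a slash.
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(enforce_trailing_slash("https://wordprexeo.com/subdirectory/landing-page"))
print(enforce_trailing_slash("https://wordprexeo.com/files/guide.pdf"))
```

In practice the non-normalised version would also be 301-redirected to the normalised one, so that only a single version of each URL remains reachable.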