A Uniform Resource Identifier (URI) is a sequence of characters that identifies a resource, most often one that is reachable over the Internet.
URIs provide a simple and extensible means to identify and reference web documents, applications, files, images, downloads, media streams, and other resources over the Internet.
When a resource is made available on a network, it is assigned a URI so that it can be referred to unambiguously.
URIs form the basis for web technology by providing consistent and interoperable addresses for locating web-based information and services.
Understanding the structure, syntax, and usage of URIs is essential for building and using networked applications and hypertext documents.
What is a URI?
A URI is a string that identifies a resource on a network using a specific syntax.
URIs may refer to documents, images, files, services, web pages, or other resources.
The most common type of URI is a Uniform Resource Locator (URL), which identifies a resource and provides the means to locate it on the Internet.
However, other types of URIs do not necessarily imply retrievability of the referenced resource.
URIs comprise several components including a naming scheme, authority, path, query, and fragment. The URI syntax provides a standard way for programs to parse URIs and reliably extract these parts.
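As a minimal sketch of how a program can extract these parts, the example below uses `urlsplit` from Python's standard `urllib.parse` module. The URI itself is a made-up illustration, chosen so that every component is present.

```python
from urllib.parse import urlsplit

# Hypothetical example URI (not from the article) containing every component.
uri = "https://www.example.com:8443/docs/guide.html?lang=en&page=2#install"

parts = urlsplit(uri)

print(parts.scheme)    # https
print(parts.netloc)    # www.example.com:8443  (the authority)
print(parts.hostname)  # www.example.com
print(parts.port)      # 8443
print(parts.path)      # /docs/guide.html
print(parts.query)     # lang=en&page=2
print(parts.fragment)  # install
```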
Uniform Resource Identifiers (URIs) vs URLs vs URNs
There is some confusion around the terms URI, URL, and URN:
- URI – A generic term for any type of identifier used to reference a resource on a network.
- URL – A specific type of URI that identifies a resource and provides the means to locate it using its network address.
- URN – A specific type of URI that names and identifies a resource by a persistent name in a namespace but may not be retrievable.
In short, all URLs are URIs and all URNs are URIs, but the reverse does not hold in either case.
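The distinction can be seen by parsing a few identifiers with the same generic syntax. The sketch below is only an illustration: the three identifiers and the `describe` helper are assumptions made for this example, and the presence of an authority is just one visible difference between them, not a formal classification rule.

```python
from urllib.parse import urlsplit

# Illustrative identifiers (assumed for this example).
examples = [
    "https://www.example.com/index.html",  # URL: names a host and an access scheme
    "urn:isbn:0451450523",                 # URN: a persistent name within a namespace
    "mailto:someone@example.com",          # another URI without an authority component
]

def describe(uri: str) -> dict:
    """Return the scheme, whether an authority is present, and the path."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "has_authority": bool(parts.netloc),
        "path": parts.path,
    }

for uri in examples:
    print(uri, "->", describe(uri))
```

All three parse as URIs, but only the first carries a network location that a client could connect to.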
How Do URIs Work?
When a web browser or application needs to access a resource over the network using its URI, here is the general sequence:
- The client extracts the scheme/protocol from the URI, which defines how to access it.
- Based on the scheme, the appropriate protocol handler is engaged, and the host name in the authority is resolved to an IP address (typically via DNS).
- The connection is established to the host server using the determined IP address and optional port.
- The path and query string are sent and interpreted by the server to locate or generate the requested resource.
- The server returns the resource data to the client over the open connection.
By providing a consistent syntactic standard, URIs enable any client to access any resource using its identifier.
The networking layers handle establishing the connection, allowing URIs to function independently from the underlying implementation.
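The first steps of this sequence can be sketched without touching the network. The `plan_request` function and the `DEFAULT_PORTS` table below are hypothetical helpers written for this illustration; they only work out what a client would need to know before connecting, and they cover just the `http` and `https` schemes.

```python
from urllib.parse import urlsplit

# Default ports for two common schemes (an assumed, deliberately small table).
DEFAULT_PORTS = {"http": 80, "https": 443}

def plan_request(uri: str) -> dict:
    """Work out what a client needs to know before it opens a connection."""
    parts = urlsplit(uri)
    scheme = parts.scheme  # urlsplit already lowercases the scheme
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if parts.hostname is None:
        raise ValueError("URI has no host")
    # The scheme decides the protocol; the host and port decide where to connect.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    # The path and query are what the server receives.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return {"scheme": scheme, "host": parts.hostname, "port": port, "target": target}

plan = plan_request("http://www.example.com/search?q=uri#results")
print(plan)
```

A real client would next resolve `plan["host"]` to an IP address (for example with `socket.getaddrinfo`) and open the connection. Note that the fragment (`#results`) does not appear in the target: fragments are handled by the client and are not sent to the server.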
When to Use a URI
URIs serve several important functions:
- Locating documents and resources – Enables retrieving them over the network via protocols like HTTP.
- Linking between documents – URIs allow hyperlinks to reference other documents or portions within them.
- Marking content ownership – By associating a URI with a resource, its ownership and identity can be established.
- Persisting state information – Web applications use query strings and hashes to maintain state in URIs across requests.
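- Identity comparison – Comparing two URIs, ideally after normalizing them, is a standard way to check whether they refer to the same resource, although different URIs can still identify the same resource.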
- Accessing web services – APIs expose operations callable via standardized URI conventions.
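Two of the functions listed above, linking and persisting state, can be illustrated with Python's standard `urllib.parse` module. This is a rough sketch: the base URI and the `page` and `sort` parameters are invented for the example.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlsplit

# Hypothetical URI of the document that contains the links.
base = "https://www.example.com/docs/guide/intro.html"

# Linking: relative references are resolved against the base URI.
print(urljoin(base, "setup.html"))         # https://www.example.com/docs/guide/setup.html
print(urljoin(base, "../api/index.html"))  # https://www.example.com/docs/api/index.html
print(urljoin(base, "#summary"))           # https://www.example.com/docs/guide/intro.html#summary

# State: view state is serialized into the query string...
state = {"page": 3, "sort": "name"}
link = base + "?" + urlencode(state)
print(link)

# ...and recovered from it on a later request.
restored = parse_qs(urlsplit(link).query)
print(restored)  # {'page': ['3'], 'sort': ['name']}
```

Values come back from `parse_qs` as lists of strings, so an application has to convert them to the types it expects.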
URI Resolving Process
When a program or user needs to access a resource via its URI, the URI resolving process is:
- Parse the URI syntax to extract the scheme, authority, path, etc.
- Select the protocol handler that matches the scheme and resolve the host name to an IP address.
- Establish the network connection to the host IP and port.
- Send the URI path and query string to the server to retrieve or process the resource.
- The server returns the resource data over the open connection.
- The client receives and processes the resource as required by the application.
This resolution process lets a client reach a resource from its URI alone, using whichever networking protocol the scheme calls for.
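To make the "send the path and query string" step concrete, here is a minimal sketch of the text an HTTP/1.1 client might send once the connection is open. The `build_get_request` helper is hypothetical and covers only a bare GET request; real clients send more headers and handle many more cases.

```python
from urllib.parse import urlsplit

def build_get_request(uri: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request for an http(s) URI."""
    parts = urlsplit(uri)
    # Request target: path plus query; the fragment is never sent.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Host header: host name, plus the port only if the URI states one.
    host = parts.hostname or ""
    if parts.port is not None:
        host += f":{parts.port}"
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)

request = build_get_request("http://www.example.com/search?q=uri#results")
print(request)
```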
URI Encoding
Because URIs consist of textual characters, certain characters need to be percent-encoded to avoid ambiguity and errors when transmitting the URIs over networks.
For example, a space can be encoded as %20. Unsafe and reserved characters are translated into a ‘%’ followed by two hexadecimal digits for each byte of the character, usually taken from its UTF-8 encoding, so a non-ASCII character becomes several such triplets.
This encoding ensures the URI can be parsed unambiguously as a sequence of characters. Encoded characters are decoded back into the original data when processed.
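A short sketch of percent-encoding and decoding, using `quote` and `unquote` from Python's standard `urllib.parse` module. The sample strings are made up for illustration.

```python
from urllib.parse import quote, unquote

# A space becomes %20.
print(quote("my report.pdf"))        # my%20report.pdf

# A non-ASCII character is encoded as UTF-8 first, one %XX per byte.
print(quote("café"))                 # caf%C3%A9

# Reserved characters are encoded when they are data rather than delimiters.
# safe="" tells quote() to leave no reserved character unencoded.
print(quote("a&b=c", safe=""))       # a%26b%3Dc

# Decoding restores the original text.
print(unquote("caf%C3%A9%20menu"))   # café menu
```

By default `quote` leaves `/` unencoded because it is normally a path delimiter; passing `safe=""` is the usual choice when encoding a single query value.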
Internationalized Resource Identifiers (IRIs)
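IRIs extend URIs to allow characters outside the ASCII range, so identifiers can contain text in scripts other than basic Latin, as well as symbols.
Because URIs are limited to ASCII, an IRI is mapped to a URI before it is resolved: non-ASCII characters are encoded as UTF-8 and each resulting byte is percent-encoded, while internationalized host names are typically converted to their ASCII (Punycode) form.
This lets identifiers be written in any language while remaining resolvable by existing URI infrastructure.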
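The mapping from an IRI to a URI can be sketched roughly as follows. The `iri_to_uri` helper and the sample IRI are assumptions made for this illustration; the function is simplified (it drops any user information, for instance) and relies on Python's built-in `idna` codec for the host name.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (simplified sketch, ignores userinfo)."""
    parts = urlsplit(iri)
    # Host name: convert each label to its ASCII (Punycode) form.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port is not None:
        host += f":{parts.port}"
    # Other components: UTF-8 encode, then percent-encode each byte.
    # '%' is kept as-is so existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))

print(iri_to_uri("https://bücher.example/straße?q=grün"))
# https://xn--bcher-kva.example/stra%C3%9Fe?q=gr%C3%BCn
```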
Conclusion
URIs provide a universal syntax for referencing resources on the web and reliably accessing them over networks.
The standard URI syntax enables linking, caching, state persistence, and other functions fundamental to working with interconnected documents and applications.
Understanding how to properly structure, encode, and resolve URIs is a core skill for developing on the web. Because URIs are a foundational web technology, a solid grasp of their behavior helps in building robust networked systems.
Frequently Asked Questions (FAQ)
Ques 1: What is the difference between a URI and a URL?
Ans: A URI is a generic identifier for any kind of resource, while a URL specifically identifies a resource and defines how to locate it on a network.
All URLs are URIs, but not all URIs are URLs – some URIs merely identify a resource without specifying a retrieval mechanism.
Ques 2: Do all URIs point to web pages?
Ans: No, URIs can identify and reference any kind of resource on a network – this includes documents, images, files, databases, web services, computer programs, or any other identifiable abstraction. Only a subset of URIs point specifically to web pages.
Ques 3: Can the same resource have more than one URI?
Ans: Yes, it is possible for a single resource to have multiple URIs that point to it.
For example, a website may allow access via both HTTP and HTTPS schemes, or via multiple domain names that resolve to the same website. Each would constitute a different URI for the same resource.
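Whether two URI strings count as the same identifier can be checked by normalizing them before comparing. The sketch below is a simplified illustration for `http` and `https` URIs only: the `normalize` helper and the `DEFAULT_PORTS` table are hypothetical, and full normalization involves more rules than are shown here.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports for the two schemes this sketch handles (assumed table).
DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize(uri: str) -> str:
    """Lowercase the scheme and host, drop a default port, and fill an empty path."""
    parts = urlsplit(uri)
    scheme = parts.scheme            # urlsplit lowercases the scheme
    host = parts.hostname or ""      # hostname is returned in lowercase
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host += f":{parts.port}"
    path = parts.path or "/"
    return urlunsplit((scheme, host, path, parts.query, parts.fragment))

a = normalize("HTTP://WWW.Example.com:80/index.html")
b = normalize("http://www.example.com/index.html")
c = normalize("https://www.example.com/index.html")

print(a == b)  # True: same URI once case and the default port are normalized
print(a == c)  # False: a different scheme makes it a different URI
```

Even when `a` and `c` lead to the same page, they remain two distinct URIs for that resource.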
Ques 4: What is URI encoding and why is it used?
Ans: URI encoding translates characters into a representation using the ‘%’ symbol followed by two hexadecimal digits.
This safely encodes characters that may have special meaning in different contexts to avoid ambiguity when parsing URIs. Encoding reserved characters ensures URIs can be transmitted reliably across networks.
Ques 5: What is a URI resolver and how does it work?
Ans: A URI resolver is software capable of taking a URI and accessing the referenced resource over a network.
It parses the scheme and uses an appropriate protocol handler to look up the host, establish a connection, and request the resource specified in the remaining URI components.
Resolvers retrieve resources based on standardized URI syntax so users don’t have to understand the low-level details.