java.lang.Object
   java.net.URI
All Implemented Interfaces:
Comparable, java.io.Serializable
Aside from some minor deviations noted below, an instance of this class represents a URI reference as defined by RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, amended by RFC 2732: Format for Literal IPv6 Addresses in URLs. The Literal IPv6 address format also supports scope_ids. The syntax and usage of scope_ids is described here. This class provides constructors for creating URI instances from their components or by parsing their string forms, methods for accessing the various components of an instance, and methods for normalizing, resolving, and relativizing URI instances. Instances of this class are immutable.
At the highest level a URI reference (hereinafter simply "URI") in string form has the syntax
[scheme:]scheme-specific-part[#fragment]
where square brackets [...] delineate optional components and the characters : and # stand for themselves.
An absolute URI specifies a scheme; a URI that is not absolute is said to be relative. URIs are also classified according to whether they are opaque or hierarchical.
An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not subject to further parsing. Some examples of opaque URIs are:
mailto:java-net@java.sun.com
news:comp.lang.java
urn:isbn:096139210x
A hierarchical URI is either an absolute URI whose scheme-specific part begins with a slash character, or a relative URI, that is, a URI that does not specify a scheme. Some examples of hierarchical URIs are:
http://java.sun.com/j2se/1.3/
docs/guide/collections/designfaq.html#28
../../../demo/jfc/SwingSet2/src/SwingSet2.java
file:///~/calendar
A hierarchical URI is subject to further parsing according to the syntax
[scheme:][//authority][path][?query][#fragment]
where the characters :, /, ?, and # stand for themselves. The scheme-specific part of a hierarchical URI consists of the characters between the scheme and fragment components.
The authority component of a hierarchical URI is, if specified, either server-based or registry-based. A server-based authority parses according to the familiar syntax
[user-info@]host[:port]
where the characters @ and : stand for themselves. Nearly all URI schemes currently in use are server-based. An authority component that does not parse in this way is considered to be registry-based.
The path component of a hierarchical URI is itself said to be absolute if it begins with a slash character ('/'); otherwise it is relative. The path of a hierarchical URI that is either absolute or specifies an authority is always absolute.
All told, then, a URI instance has the following nine components:
Component | Type |
---|---|
scheme | String |
scheme-specific-part | String |
authority | String |
user-info | String |
host | String |
port | int |
path | String |
query | String |
fragment | String |

In a given instance any particular component is either undefined or defined with a distinct value. Undefined string components are represented by null, while undefined integer components are represented by -1. A string component may be defined to have the empty string as its value; this is not equivalent to that component being undefined.
Whether a particular component is or is not defined in an instance depends upon the type of the URI being represented. An absolute URI has a scheme component. An opaque URI has a scheme, a scheme-specific part, and possibly a fragment, but has no other components. A hierarchical URI always has a path (though it may be empty) and a scheme-specific-part (which at least contains the path), and may have any of the other components. If the authority component is present and is server-based then the host component will be defined and the user-information and port components may be defined.
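The component rules above can be checked with a minimal sketch such as the following. The class name UriComponentsDemo, the helper describe, and the example.com address are assumptions made for illustration; only the java.net.URI accessors themselves come from this class.

```java
import java.net.URI;

class UriComponentsDemo {

    /** Formats the nine components of the URI parsed from s on a single line. */
    static String describe(String s) {
        URI u = URI.create(s);
        return "scheme=" + u.getScheme()
                + " ssp=" + u.getSchemeSpecificPart()
                + " authority=" + u.getAuthority()
                + " userInfo=" + u.getUserInfo()
                + " host=" + u.getHost()
                + " port=" + u.getPort()
                + " path=" + u.getPath()
                + " query=" + u.getQuery()
                + " fragment=" + u.getFragment();
    }

    public static void main(String[] args) {
        // Hierarchical with a server-based authority: every component is defined.
        System.out.println(describe("http://user@example.com:8080/docs/index.html?lang=en#top"));
        // Opaque: only the scheme and scheme-specific part are defined here;
        // undefined strings print as null and the undefined port prints as -1.
        System.out.println(describe("mailto:java-net@java.sun.com"));
        // Relative, hence hierarchical: no scheme, but a path and a fragment.
        System.out.println(describe("docs/guide/collections/designfaq.html#28"));
    }
}
```

Note how the opaque URI reports null for its path, whereas a hierarchical URI always has a path, even if that path is empty.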
Normalization is the process of removing unnecessary "." and ".." segments from the path component of a hierarchical URI. Each "." segment is simply removed. A ".." segment is removed only if it is preceded by a non-".." segment. Normalization has no effect upon opaque URIs.
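As a small illustration of these rules, the sketch below normalizes a few URIs. The class and helper names and the example.com host are hypothetical; the behavior shown is that of the normalize method of this class.

```java
import java.net.URI;

class UriNormalizeDemo {

    /** Parses s, normalizes the resulting URI, and returns its string form. */
    static String normalize(String s) {
        return URI.create(s).normalize().toString();
    }

    public static void main(String[] args) {
        // "." is dropped, and "b/.." cancels out.
        System.out.println(normalize("http://example.com/a/./b/../c")); // http://example.com/a/c
        // Leading ".." segments have nothing before them to remove, so they stay.
        System.out.println(normalize("../../a/./b"));                   // ../../a/b
        // Opaque URIs are returned unchanged.
        System.out.println(normalize("mailto:a@example.com"));          // mailto:a@example.com
    }
}
```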
Resolution is the process of resolving one URI against another, base URI. The resulting URI is constructed from components of both URIs in the manner specified by RFC 2396, taking components from the base URI for those not specified in the original. For hierarchical URIs, the path of the original is resolved against the path of the base and then normalized. The result, for example, of resolving
docs/guide/collections/designfaq.html#28 (1)
against the base URI http://java.sun.com/j2se/1.3/ is the result URI
http://java.sun.com/j2se/1.3/docs/guide/collections/designfaq.html#28
Resolving the relative URI
../../../demo/jfc/SwingSet2/src/SwingSet2.java (2)
against this result yields, in turn,
http://java.sun.com/j2se/1.3/demo/jfc/SwingSet2/src/SwingSet2.java
Resolution of both absolute and relative URIs, and of both absolute and relative paths in the case of hierarchical URIs, is supported. Resolving the URI file:///~/calendar against any other URI simply yields the original URI, since it is absolute. Resolving the relative URI (2) above against the relative base URI (1) yields the normalized, but still relative, URI
demo/jfc/SwingSet2/src/SwingSet2.java
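The resolution examples above can be reproduced with a short sketch along these lines. The class name UriResolveDemo and its resolve helper are illustrative assumptions; the URIs are the ones used in the text.

```java
import java.net.URI;

class UriResolveDemo {

    /** Resolves ref against base and returns the result in string form. */
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(URI.create(ref)).toString();
    }

    public static void main(String[] args) {
        String rel1 = "docs/guide/collections/designfaq.html#28";          // (1)
        String rel2 = "../../../demo/jfc/SwingSet2/src/SwingSet2.java";    // (2)

        // Relative URI against an absolute base.
        String first = resolve("http://java.sun.com/j2se/1.3/", rel1);
        System.out.println(first);

        // Resolving (2) against the previous result climbs three directories.
        System.out.println(resolve(first, rel2));

        // An absolute URI resolves to itself, whatever the base is.
        System.out.println(resolve(first, "file:///~/calendar"));

        // Relative against relative: normalized, but still relative.
        System.out.println(resolve(rel1, rel2));
    }
}
```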
Relativization, finally, is the inverse of resolution: For any two normalized URIs u and v,
u.relativize(u.resolve(v)).equals(v) and
u.resolve(u.relativize(v)).equals(v) .
This operation is often useful when constructing a document containing URIs that must be made relative to the base URI of the document wherever possible. For example, relativizing the URI
http://java.sun.com/j2se/1.3/docs/guide/index.html
against the base URI
http://java.sun.com/j2se/1.3
yields the relative URI docs/guide/index.html.
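A minimal sketch of relativization follows. The class and helper names are hypothetical, and http://example.org/other is an assumed unrelated URI. The round-trip check deliberately uses a base whose path ends in a slash, since resolving against a base without the trailing slash would replace its last path segment.

```java
import java.net.URI;

class UriRelativizeDemo {

    /** Relativizes target against base and returns the result in string form. */
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    /** Checks u.resolve(u.relativize(v)).equals(v) for the given pair. */
    static boolean roundTrips(String base, String target) {
        URI u = URI.create(base);
        URI v = URI.create(target);
        return u.resolve(u.relativize(v)).equals(v);
    }

    public static void main(String[] args) {
        String target = "http://java.sun.com/j2se/1.3/docs/guide/index.html";

        // The example from the text.
        System.out.println(relativize("http://java.sun.com/j2se/1.3", target)); // docs/guide/index.html

        // A URI that does not share the base's scheme and authority is returned as-is.
        System.out.println(relativize("http://java.sun.com/j2se/1.3", "http://example.org/other"));

        // Round trip with a directory-style base.
        System.out.println(roundTrips("http://java.sun.com/j2se/1.3/", target)); // true
    }
}
```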
Category | Description |
---|---|
alpha | The US-ASCII alphabetic characters, 'A' through 'Z' and 'a' through 'z' |
digit | The US-ASCII decimal digit characters, '0' through '9' |
alphanum | All alpha and digit characters |
unreserved | All alphanum characters together with those in the string "_-!.~'()*" |
punct | The characters in the string ",;:$&+=" |
reserved | All punct characters together with those in the string "?/[]@" |
escaped | Escaped octets, that is, triplets consisting of the percent character ('%') followed by two hexadecimal digits ('0'-'9', 'A'-'F', and 'a'-'f') |
other | The Unicode characters that are not in the US-ASCII character set, are not control characters (according to the Character.isISOControl method), and are not space characters (according to the Character.isSpaceChar method) (Deviation from RFC 2396, which is limited to US-ASCII) |

The set of all legal URI characters consists of the unreserved, reserved, escaped, and other characters.
Escaping serves two purposes in URIs:
- To encode non-US-ASCII characters when a URI is required to conform strictly to RFC 2396 by not containing any other characters.
- To quote characters that are otherwise illegal in a component. The user-info, path, query, and fragment components differ slightly in terms of which characters are considered legal and illegal.
A character is encoded by replacing it with the sequence of escaped octets that represent that character in the UTF-8 character set. The Euro currency symbol ('\u20AC'), for example, is encoded as "%E2%82%AC". (Deviation from RFC 2396, which does not specify any particular character set.)
An illegal character is quoted simply by encoding it. The space character, for example, is quoted by replacing it with "%20". UTF-8 contains US-ASCII, hence for US-ASCII characters this transformation has exactly the effect required by RFC 2396.
A sequence of escaped octets is decoded by replacing it with the sequence of characters that it represents in the UTF-8 character set. UTF-8 contains US-ASCII, hence decoding has the effect of de-quoting any quoted US-ASCII characters as well as that of decoding any encoded non-US-ASCII characters. If a decoding error occurs when decoding the escaped octets then the erroneous octets are replaced by '\uFFFD', the Unicode replacement character.
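To make encoding and decoding concrete, here is a small sketch. The class name, the helper names, and the example.com host are assumptions for illustration; toASCIIString, getPath, and getRawPath are the methods of this class.

```java
import java.net.URI;

class UriEscapingDemo {

    /** Encoding: "other" characters become UTF-8 escaped octets. */
    static String encode(String s) {
        return URI.create(s).toASCIIString();
    }

    /** Decoding: escaped octets in the path are interpreted as UTF-8. */
    static String decodedPath(String s) {
        return URI.create(s).getPath();
    }

    /** The raw path keeps escaped octets exactly as written. */
    static String rawPath(String s) {
        return URI.create(s).getRawPath();
    }

    public static void main(String[] args) {
        // The Euro sign is a legal "other" character; encoding yields three octets.
        System.out.println(encode("http://example.com/\u20AC"));        // http://example.com/%E2%82%AC

        String quoted = "http://example.com/%E2%82%AC/a%20b";
        System.out.println(rawPath(quoted));     // /%E2%82%AC/a%20b
        System.out.println(decodedPath(quoted)); // the Euro sign and a literal space
    }
}
```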
The single-argument constructor requires any illegal characters in its argument to be quoted and preserves any escaped octets and other characters that are present.
The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any other characters are preserved.
The getRawUserInfo, getRawPath, getRawQuery, getRawFragment, getRawAuthority, and getRawSchemeSpecificPart methods return the values of their corresponding components in raw form, without interpreting any escaped octets. The strings returned by these methods may contain both escaped octets and other characters, and will not contain any illegal characters.
The getUserInfo, getPath, getQuery, getFragment, getAuthority, and getSchemeSpecificPart methods decode any escaped octets in their corresponding components. The strings returned by these methods may contain both other characters and illegal characters, and will not contain any escaped octets.
The toString method returns a URI string with all necessary quotation but which may contain other characters.
The toASCIIString method returns a fully quoted and encoded URI string that does not contain any other characters.
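The following sketch contrasts the single-argument and multi-argument constructors and the raw and decoded accessors. The class name, the helpers parses and fromPath, and the example.com host are hypothetical; the four-argument constructor used is URI(scheme, host, path, fragment).

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriQuotingDemo {

    /** True if the single-argument constructor accepts s without complaint. */
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Builds http://example.com plus the given path; illegal characters are quoted. */
    static URI fromPath(String path) {
        try {
            return new URI("http", "example.com", path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The single-argument constructor expects the caller to have quoted the space.
        System.out.println(parses("http://example.com/a b"));   // false
        System.out.println(parses("http://example.com/a%20b")); // true

        // The multi-argument constructor quotes the space and always quotes '%'.
        URI u = fromPath("/a b/100%");
        System.out.println(u.getRawPath()); // /a%20b/100%25
        System.out.println(u.getPath());    // /a b/100%

        // "Other" characters survive toString but not toASCIIString.
        URI euro = fromPath("/\u20AC");
        System.out.println(euro.toString());
        System.out.println(euro.toASCIIString()); // http://example.com/%E2%82%AC
    }
}
```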
For any URI u, it is always the case that
new URI(u.toString()).equals(u) .
For any URI u that does not contain redundant syntax such as two slashes before an empty authority (as in file:///tmp/ ) or a colon following a host name but no port (as in http://java.sun.com: ), and that does not encode characters except those that must be quoted, the following identities also hold:
new URI(u.getScheme(),
        u.getSchemeSpecificPart(),
        u.getFragment())
.equals(u)
in all cases,
new URI(u.getScheme(),
        u.getAuthority(),
        u.getPath(), u.getQuery(),
        u.getFragment())
.equals(u)
if u is hierarchical, and
new URI(u.getScheme(),
        u.getUserInfo(), u.getHost(), u.getPort(),
        u.getPath(), u.getQuery(),
        u.getFragment())
.equals(u)
if u is hierarchical and has either no authority or a server-based authority.
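These identities can be exercised with a sketch like the one below, which assumes a hierarchical URI with a server-based authority and no redundant syntax. The class name, the helper identitiesHold, and the example.com URI are illustrative assumptions; the constructors called are the one-, three-, five-, and seven-argument constructors of this class.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriIdentitiesDemo {

    /** Returns true if all four reconstruction identities hold for the URI parsed from s. */
    static boolean identitiesHold(String s) {
        try {
            URI u = new URI(s);

            // Identity that holds for every URI.
            boolean viaString = new URI(u.toString()).equals(u);

            // scheme, scheme-specific part, fragment.
            boolean viaSsp = new URI(u.getScheme(),
                    u.getSchemeSpecificPart(),
                    u.getFragment()).equals(u);

            // scheme, authority, path, query, fragment (hierarchical URIs).
            boolean viaAuthority = new URI(u.getScheme(),
                    u.getAuthority(),
                    u.getPath(), u.getQuery(),
                    u.getFragment()).equals(u);

            // scheme, user-info, host, port, path, query, fragment
            // (hierarchical URIs with no authority or a server-based one).
            boolean viaHost = new URI(u.getScheme(),
                    u.getUserInfo(), u.getHost(), u.getPort(),
                    u.getPath(), u.getQuery(),
                    u.getFragment()).equals(u);

            return viaString && viaSsp && viaAuthority && viaHost;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(identitiesHold("http://user@example.com:8080/docs/index.html?lang=en#top"));
    }
}
```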
The conceptual distinction between URIs and URLs is reflected in the differences between this class and the URL class.
An instance of this class represents a URI reference in the syntactic sense defined by RFC 2396. A URI may be either absolute or relative. A URI string is parsed according to the generic syntax without regard to the scheme, if any, that it specifies. No lookup of the host, if any, is performed, and no scheme-dependent stream handler is constructed. Equality, hashing, and comparison are defined strictly in terms of the character content of the instance. In other words, a URI instance is little more than a structured string that supports the syntactic, scheme-independent operations of comparison, normalization, resolution, and relativization.
An instance of the URL class, by contrast, represents the syntactic components of a URL together with some of the information required to access the resource that it describes. A URL must be absolute, that is, it must always specify a scheme. A URL string is parsed according to its scheme. A stream handler is always established for a URL, and in fact it is impossible to create a URL instance for a scheme for which no handler is available. Equality and hashing depend upon both the scheme and the Internet address of the host, if any; comparison is not defined. In other words, a URL is a structured string that supports the syntactic operation of resolution as well as the network I/O operations of looking up the host and opening a connection to the specified resource.
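The practical difference shows up when converting a URI to a URL. The sketch below is only an illustration: the class name, the helper toUrlOutcome, the example.com address, and the made-up foo scheme are assumptions, and the last case presumes that no stream handler for foo is installed in the running JVM.

```java
import java.net.MalformedURLException;
import java.net.URI;

class UriVersusUrlDemo {

    /** Parses s as a URI, then attempts toURL() and reports what happened. */
    static String toUrlOutcome(String s) {
        // Parsing is purely syntactic: no handler is consulted, no host is looked up.
        URI u = URI.create(s);
        try {
            u.toURL();
            return "ok";
        } catch (IllegalArgumentException e) {
            // Thrown when the URI is not absolute.
            return "IllegalArgumentException";
        } catch (MalformedURLException e) {
            // Thrown when no protocol handler is available for the scheme.
            return "MalformedURLException";
        }
    }

    public static void main(String[] args) {
        System.out.println(toUrlOutcome("http://example.com/"));  // ok
        System.out.println(toUrlOutcome("docs/index.html"));      // IllegalArgumentException
        System.out.println(toUrlOutcome("foo://bar/baz"));        // MalformedURLException
    }
}
```

All three strings are valid URIs; only the first can also be represented as a URL.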
Author: Mark Reinhold
Since: 1.4
Field Summary | |
---|---|
static final long | serialVersionUID |
Constructor: |
---|
This constructor parses the given string exactly as specified by the grammar in RFC 2396, Appendix A, except for the following deviations:
|
A component may be left undefined by passing null. This constructor first builds a URI in string form using the given components as follows: The resulting URI string is then parsed in order to create the new URI instance as if by invoking the URI(String) constructor; this may cause a URISyntaxException to be thrown.
|
A component may be left undefined by passing null. This convenience constructor works as if by invoking the seven-argument constructor as follows: new URI(scheme, null, host, -1, path, null, fragment);
|
If a scheme is given then the path, if also given, must either be empty or begin with a slash character ('/'). Otherwise a component of the new URI may be left undefined by passing null for the corresponding parameter. This constructor first builds a URI string from the given components according to the rules specified in RFC 2396, section 5.2, step 7: The resulting URI string is then parsed as if by invoking the URI(String) constructor and then invoking the parseServerAuthority() method upon the result; this may cause a URISyntaxException to be thrown.
|
If a scheme is given then the path, if also given, must either be empty or begin with a slash character ('/'). Otherwise a component of the new URI may be left undefined by passing null for the corresponding parameter or, in the case of the port parameter, by passing -1. This constructor first builds a URI string from the given components according to the rules specified in RFC 2396, section 5.2, step 7: The resulting URI string is then parsed as if by invoking the URI(String) constructor and then invoking the parseServerAuthority() method upon the result; this may cause a URISyntaxException to be thrown.
|
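A brief sketch of the multi-argument constructors described above. The class name, the helpers build and convenienceMatchesFull, and the example.com host are hypothetical; the calls themselves are the four- and seven-argument constructors of this class.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriConstructorsDemo {

    /** Builds a URI with the four-argument constructor; returns its string form or an error label. */
    static String build(String scheme, String host, String path, String fragment) {
        try {
            return new URI(scheme, host, path, fragment).toString();
        } catch (URISyntaxException e) {
            return "error: " + e.getReason();
        }
    }

    /** Checks that the four-argument form equals the seven-argument call it stands for. */
    static boolean convenienceMatchesFull(String scheme, String host, String path, String fragment) {
        try {
            URI shortForm = new URI(scheme, host, path, fragment);
            URI longForm = new URI(scheme, null, host, -1, path, null, fragment);
            return shortForm.equals(longForm);
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http", "example.com", "/docs/index.html", "top"));
        // With a scheme, a non-empty path must begin with '/'.
        System.out.println(build("http", "example.com", "docs/index.html", null));
        // Without a scheme, a relative path is fine; null leaves a component undefined.
        System.out.println(build(null, null, "docs/index.html", null));
        System.out.println(convenienceMatchesFull("http", "example.com", "/docs/index.html", "top"));
    }
}
```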
Method from java.net.URI Summary: |
---|
compareTo, create, equals, getAuthority, getFragment, getHost, getPath, getPort, getQuery, getRawAuthority, getRawFragment, getRawPath, getRawQuery, getRawSchemeSpecificPart, getRawUserInfo, getScheme, getSchemeSpecificPart, getUserInfo, hashCode, isAbsolute, isOpaque, normalize, parseServerAuthority, relativize, resolve, resolve, toASCIIString, toString, toURL |
Methods from java.lang.Object: |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method from java.net.URI Detail: |
---|
When comparing corresponding components of two URIs, if one component is undefined but the other is defined then the first is considered to be less than the second. Unless otherwise noted, string components are ordered according to their natural, case-sensitive ordering as defined by the String.compareTo method. String components that are subject to encoding are compared by comparing their raw forms rather than their encoded forms. The ordering of URIs is defined as follows: This method satisfies the general contract of the Comparable.compareTo method. |
This convenience factory method works as if by invoking the URI(String) constructor; any URISyntaxException thrown by the constructor is caught and wrapped in a new IllegalArgumentException object, which is then thrown. This method is provided for use in situations where it is known that the given string is a legal URI, for example for URI constants declared within a program, and so it would be considered a programming error for the string not to parse as such. The constructors, which throw URISyntaxException directly, should be used in situations where a URI is being constructed from user input or from some other source that may be prone to errors. |
If the given object is not a URI then this method immediately returns false. For two URIs to be considered equal requires that either both are opaque or both are hierarchical. Their schemes must either both be undefined or else be equal without regard to case. Their fragments must either both be undefined or else be equal. For two opaque URIs to be considered equal, their scheme-specific parts must be equal. For two hierarchical URIs to be considered equal, their paths must be equal and their queries must either both be undefined or else be equal. Their authorities must either both be undefined, or both be registry-based, or both be server-based. If their authorities are defined and are registry-based, then they must be equal. If their authorities are defined and are server-based, then their hosts must be equal without regard to case, their port numbers must be equal, and their user-information components must be equal. When testing the user-information, path, query, fragment, authority, or scheme-specific parts of two URIs for equality, the raw forms rather than the encoded forms of these components are compared and the hexadecimal digits of escaped octets are compared without regard to case. This method satisfies the general contract of the Object.equals method. |
The string returned by this method is equal to that returned by the getRawAuthority method except that all sequences of escaped octets are decoded. |
The string returned by this method is equal to that returned by the getRawFragment method except that all sequences of escaped octets are decoded. |
The host component of a URI, if defined, will have one of the following forms: |
The string returned by this method is equal to that returned by the getRawPath method except that all sequences of escaped octets are decoded. |
The port component of a URI, if defined, is a non-negative integer. |
The string returned by this method is equal to that returned by the getRawQuery method except that all sequences of escaped octets are decoded. |
The authority component of a URI, if defined, only contains the commercial-at character ('@') and characters in the unreserved, punct, escaped, and other categories. If the authority is server-based then it is further constrained to have valid user-information, host, and port components. |
The fragment component of a URI, if defined, only contains legal URI characters. |
The path component of a URI, if defined, only contains the slash character ('/'), the commercial-at character ('@'), and characters in the unreserved, punct, escaped, and other categories. |
The query component of a URI, if defined, only contains legal URI characters. |
The scheme-specific part of a URI only contains legal URI characters. |
The user-information component of a URI, if defined, only contains characters in the unreserved, punct, escaped, and other categories. |
The scheme component of a URI, if defined, only contains characters in the alphanum category and in the string "-.+". A scheme always starts with an alpha character. The scheme component of a URI cannot contain escaped octets, hence this method does not perform any decoding. |
The string returned by this method is equal to that returned by the getRawSchemeSpecificPart method except that all sequences of escaped octets are decoded. |
The string returned by this method is equal to that returned by the getRawUserInfo method except that all sequences of escaped octets are decoded. |
|
A URI is absolute if, and only if, it has a scheme component. |
A URI is opaque if, and only if, it is absolute and its scheme-specific part does not begin with a slash character ('/'). An opaque URI has a scheme, a scheme-specific part, and possibly a fragment; all other components are undefined. |
If this URI is opaque, or if its path is already in normal form, then this URI is returned. Otherwise a new URI is constructed that is identical to this URI except that its path is computed by normalizing this URI's path in a manner consistent with RFC 2396, section 5.2, step 6, sub-steps c through f; that is: A normalized path will begin with one or more ".." segments if there were insufficient non-".." segments preceding them to allow their removal. A normalized path will begin with a "." segment if one was inserted by step 3 above. Otherwise, a normalized path will not contain any "." or ".." segments. |
If this URI's authority component has already been recognized as being server-based then it will already have been parsed into user-information, host, and port components. In this case, or if this URI has no authority component, this method simply returns this URI. Otherwise this method attempts once more to parse the authority component into user-information, host, and port components, and throws an exception describing why the authority component could not be parsed in that way. This method is provided because the generic URI syntax specified in RFC 2396 cannot always distinguish a malformed server-based authority from a legitimate registry-based authority. It must therefore treat some instances of the former as instances of the latter. The authority component in the URI string "//foo:bar", for example, is not a legal server-based authority but it is legal as a registry-based authority. In many common situations, for example when working with URIs that are known to be either URNs or URLs, the hierarchical URIs being used will always be server-based. They therefore must either be parsed as such or treated as an error. In these cases a statement such as URI u = new URI(str).parseServerAuthority(); can be used to ensure that u always refers to a URI that, if it has an authority component, has a server-based authority with proper user-information, host, and port components. Invoking this method also ensures that if the authority could not be parsed in that way then an appropriate diagnostic message can be issued based upon the exception that is thrown. |
The relativization of the given URI against this URI is computed as follows: |
If the given URI is already absolute, or if this URI is opaque, then the given URI is returned. If the given URI's fragment component is defined, its path component is empty, and its scheme, authority, and query components are undefined, then a URI with the given fragment but with all other components equal to those of this URI is returned. This allows a URI representing a standalone fragment reference, such as "#foo", to be usefully resolved against a base URI. Otherwise this method constructs a new hierarchical URI in a manner consistent with RFC 2396, section 5.2; that is: The result of this method is absolute if, and only if, either this URI is absolute or the given URI is absolute. |
|
If this URI does not contain any characters in the other category then an invocation of this method will return the same value as an invocation of the toString method. Otherwise this method works as if by invoking that method and then encoding the result. |
If this URI was created by invoking one of the constructors in this class then a string equivalent to the original input string, or to the string computed from the originally-given components, as appropriate, is returned. Otherwise this URI was created by normalization, resolution, or relativization, and so a string is constructed from this URI's components according to the rules specified in RFC 2396, section 5.2, step 7. |
This convenience method works as if invoking it were equivalent to evaluating the expression new URL(this.toString()) after first checking that this URI is absolute. |
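The create, parseServerAuthority, and equals descriptions above can be illustrated with a minimal sketch. The class and helper names and the example.com host are assumptions; "//foo:bar" is the registry-based example from the parseServerAuthority description.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriMethodsDemo {

    /** True if URI.create rejects s with an IllegalArgumentException wrapping a URISyntaxException. */
    static boolean createRejects(String s) {
        try {
            URI.create(s);
            return false;
        } catch (IllegalArgumentException e) {
            return e.getCause() instanceof URISyntaxException;
        }
    }

    /** True if s parses and its authority, if any, is server-based. */
    static boolean hasServerAuthority(String s) {
        try {
            new URI(s).parseServerAuthority();
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Compares two URI strings using URI.equals. */
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // An unquoted space is a syntax error, reported unchecked by create.
        System.out.println(createRejects("http://example.com/a b")); // true

        // "foo:bar" is accepted as registry-based, but is not a valid server authority.
        System.out.println(hasServerAuthority("//foo:bar")); // false
        System.out.println(hasServerAuthority("//foo:80"));  // true

        // Scheme, host, and hex digits of escaped octets ignore case; paths do not.
        System.out.println(sameUri("HTTP://EXAMPLE.com/%7e", "http://example.com/%7E")); // true
        System.out.println(sameUri("http://example.com/a", "http://example.com/A"));     // false
    }
}
```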