| MatchResult | The result of a match operation. | code | html |
| AbstractCharClass | This class represents character classes, i.e. | code | html |
| AbstractCharClass.LazyCharClass | | code | html |
| AbstractLineTerminator | Line terminator factory | code | html |
| AbstractSet | Base class for nodes representing a given regular expression. | code | html |
| LeafSet | Base class for nodes representing leaf tokens of the RE, those that consume a fixed number of characters. | code | html |
| Pattern.BmpCharProperty | Optimized version of CharProperty that works only for properties never satisfied by Supplementary characters. | code | html |
| Pattern.CharProperty | Abstract node class to match one character satisfying some boolean property. | code | html |
| Pattern.CharPropertyNames.CharPropertyFactory | | code | html |
| Pattern.CharPropertyNames.CloneableProperty | | code | html |
| QuantifierSet | Base class for quantifiers. | code | html |
| SpecialToken | This is base class for special tokens like character classes and quantifiers. | code | html |
| UnicodeProp | | code | html |
| ASCII | Utility class that implements the standard C ctype functionality. | code | html |
| AbstractCharClass.LazyASCII | | code | html |
| AbstractCharClass.LazyAlnum | | code | html |
| AbstractCharClass.LazyAlpha | | code | html |
| AbstractCharClass.LazyBlank | | code | html |
| AbstractCharClass.LazyCategory | | code | html |
| AbstractCharClass.LazyCategoryScope | | code | html |
| AbstractCharClass.LazyCntrl | | code | html |
| AbstractCharClass.LazyDigit | | code | html |
| AbstractCharClass.LazyGraph | | code | html |
| AbstractCharClass.LazyJavaDefined | | code | html |
| AbstractCharClass.LazyJavaDigit | | code | html |
| AbstractCharClass.LazyJavaISOControl | | code | html |
| AbstractCharClass.LazyJavaIdentifierIgnorable | | code | html |
| AbstractCharClass.LazyJavaJavaIdentifierPart | | code | html |
| AbstractCharClass.LazyJavaJavaIdentifierStart | | code | html |
| AbstractCharClass.LazyJavaLetter | | code | html |
| AbstractCharClass.LazyJavaLetterOrDigit | | code | html |
| AbstractCharClass.LazyJavaLowerCase | | code | html |
| AbstractCharClass.LazyJavaMirrored | | code | html |
| AbstractCharClass.LazyJavaSpaceChar | | code | html |
| AbstractCharClass.LazyJavaTitleCase | | code | html |
| AbstractCharClass.LazyJavaUnicodeIdentifierPart | | code | html |
| AbstractCharClass.LazyJavaUnicodeIdentifierStart | | code | html |
| AbstractCharClass.LazyJavaUpperCase | | code | html |
| AbstractCharClass.LazyJavaWhitespace | | code | html |
| AbstractCharClass.LazyLower | | code | html |
| AbstractCharClass.LazyNonDigit | | code | html |
| AbstractCharClass.LazyNonSpace | | code | html |
| AbstractCharClass.LazyNonWord | | code | html |
| AbstractCharClass.LazyPrint | | code | html |
| AbstractCharClass.LazyPunct | | code | html |
| AbstractCharClass.LazyRange | | code | html |
| AbstractCharClass.LazySpace | | code | html |
| AbstractCharClass.LazySpecialsBlock | | code | html |
| AbstractCharClass.LazyUpper | | code | html |
| AbstractCharClass.LazyWord | | code | html |
| AbstractCharClass.LazyXDigit | | code | html |
| AbstractCharClass.PredefinedCharacterClasses | Character classes generated from http://www.unicode.org/reports/tr18/ and http://www.unicode.org/Public/4.1.0/ucd/Blocks.txt. | code | html |
| AheadFSet | Lookahead FSet; always returns true. | code | html |
| AltGroupQuantifierSet | Represents "?" quantifier over composite sets. | code | html |
| AltQuantifierSet | Represents "?" quantifier over leaf sets. | code | html |
| AtomicFSet | | code | html |
| AtomicJointSet | This class represents an atomic group (?>X); once X matches, the match becomes unchangeable until the end of the match. | code | html |
| BackReferenceSet | Back reference node, i.e. \1-9. | code | html |
| BackReferencedSingleSet | Group node over subexpression w/o alternations. | code | html |
| BehindFSet | FSet for lookbehind constructs. | code | html |
| CIBackReferenceSet | Case insensitive back reference node. | code | html |
| CICharSet | Represents node accepting single character in case insensitive manner. | code | html |
| CIDecomposedCharSet | Represents case insensitive canonical decomposition of Unicode character. | code | html |
| CISequenceSet | This class represents ASCII case insensitive character sequences. | code | html |
| CanClasses | This class gives us a hashtable that contains canonical classes that are generated from http://www.unicode.org/Public/4.0-Update/UnicodeData-4.0.0.txt. | code | html |
| CharClass | User defined character classes ([abef]). | code | html |
| CharSet | Represents node accepting single character. | code | html |
| CompositeGroupQuantifierSet | Composite (i.e. {n,m}) quantifier node for groups ("(X){n,m}") | code | html |
| CompositeQuantifierSet | Composite (i.e. {n,m}) quantifier node over the leaf nodes ("a{n,m}") | code | html |
| CompositeRangeSet | This class is used to split a range that contains surrogate characters into two ranges: the first consisting of these surrogate characters and the second consisting of all other characters from the parent range. | code | html |
| DecomposedCharSet | Represents canonical decomposition of Unicode character. | code | html |
| DotAllQuantifierSet | Special node for ".*" construction for any character including line terminators. | code | html |
| DotAllSet | Node accepting any character including line terminators. | code | html |
| DotQuantifierSet | Special node for ".*" construction. | code | html |
| DotSet | Node accepting any character except line terminators. | code | html |
| EOISet | Represents end of input '\z', i.e. matches only the character after the last one. | code | html |
| EOLSet | Represents node accepting single character. | code | html |
| EmptySet | Valid constant zero character match. | code | html |
| FSet | The node which marks the end of a particular group. | code | html |
| FSet.PossessiveFSet | Marks the end of a particular group and does not take into account possible kickbacks (required for atomic groups, for instance). | code | html |
| FinalSet | Special construction which marks the end of the pattern. | code | html |
| GroupQuantifierSet | Default quantifier over groups; in fact, this type of quantifier is generally used for constructions for which we can't identify the number of characters they consume. | code | html |
| HangulDecomposedCharSet | Represents canonical decomposition of Hangul syllable. | code | html |
| HashDecompositions | This class gives us a hashtable that contains canonical decomposition mappings that are generated from http://www.unicode.org/Public/4.0-Update/UnicodeData-4.0.0.txt. | code | html |
| HighSurrogateCharSet | This class represents high surrogate character. | code | html |
| I18n | Internationalization stub. | code | html |
| IntArrHash | Hashtable implementation for int arrays. | code | html |
| IntHash | Hashtable implementation for int values. | code | html |
| JointSet | Represents a group that is an alternation of other subexpressions. | code | html |
| LeafQuantifierSet | | code | html |
| Lexer | The purpose of this class is to break a given pattern into RE tokens. | code | html |
| LowHighSurrogateRangeSet | | code | html |
| LowSurrogateCharSet | This class represents low surrogate character. | code | html |
| MatchResultImpl | Match result implementation. Note: it might make sense to combine this class with Matcher. | code | html |
| Matcher | An engine that performs match operations on a character sequence by interpreting a Pattern. | code | html |
| MultiLineEOLSet | Represents multiline version of the dollar sign. | code | html |
| MultiLineSOLSet | Multiline version of the ^ sign. | code | html |
| NegativeLookAhead | Negative look ahead node. | code | html |
| NegativeLookBehind | Negative look behind node. | code | html |
| NonCapFSet | Non-capturing group closing node. | code | html |
| NonCapJointSet | Node representing a non-capturing group. | code | html |
| Pattern | A compiled representation of a regular expression. | code | html |
| Pattern.All | Implements the Unicode category ALL and the dot metacharacter when in dotall mode. | code | html |
| Pattern.BackRef | Refers to a group in the regular expression. | code | html |
| Pattern.Begin | Node to anchor at the beginning of input. | code | html |
| Pattern.Behind | Zero width positive lookbehind. | code | html |
| Pattern.BehindS | Zero width positive lookbehind, including supplementary characters or unpaired surrogates. | code | html |
| Pattern.BitClass | Creates a bit vector for matching Latin-1 values. | code | html |
| Pattern.Block | Node class that matches a Unicode block. | code | html |
| Pattern.BnM | Attempts to match a slice in the input using the Boyer-Moore string matching algorithm. | code | html |
| Pattern.BnMS | Supplementary support version of BnM(). | code | html |
| Pattern.Bound | Handles word boundaries. | code | html |
| Pattern.Branch | Handles the branching of alternations. | code | html |
| Pattern.BranchConn | A Guard node at the end of each atom node in a Branch. | code | html |
| Pattern.CIBackRef | | code | html |
| Pattern.Caret | Node to anchor at the beginning of a line. | code | html |
| Pattern.Category | Node class that matches a Unicode category. | code | html |
| Pattern.CharPropertyNames | | code | html |
| Pattern.Conditional | | code | html |
| Pattern.Ctype | Node class that matches a POSIX type. | code | html |
| Pattern.Curly | Handles the curly-brace style repetition with a specified minimum and maximum occurrences. | code | html |
| Pattern.Dollar | Node to anchor at the end of a line or the end of input based on the multiline mode. | code | html |
| Pattern.Dot | Node class for the dot metacharacter when dotall is not enabled. | code | html |
| Pattern.End | Node to anchor at the end of input. | code | html |
| Pattern.First | Searches until the next instance of its atom. | code | html |
| Pattern.GroupCurly | Handles the curly-brace style repetition with a specified minimum and maximum occurrences in deterministic cases. | code | html |
| Pattern.GroupHead | The GroupHead saves the location where the group begins in the locals and restores them when the match is done. | code | html |
| Pattern.GroupRef | Recursive reference to a group in the regular expression. | code | html |
| Pattern.GroupTail | The GroupTail handles the setting of group beginning and ending locations when groups are successfully matched. | code | html |
| Pattern.LastMatch | Node to match the location where the last match ended. | code | html |
| Pattern.LastNode | | code | html |
| Pattern.LazyLoop | Handles the repetition count for a reluctant Curly. | code | html |
| Pattern.Loop | Handles the repetition count for a greedy Curly. | code | html |
| Pattern.Neg | Zero width negative lookahead. | code | html |
| Pattern.Node | Base class for all node classes. | code | html |
| Pattern.NotBehind | Zero width negative lookbehind. | code | html |
| Pattern.NotBehindS | Zero width negative lookbehind, including supplementary characters or unpaired surrogates. | code | html |
| Pattern.Pos | Zero width positive lookahead. | code | html |
| Pattern.Prolog | This sets up a loop to handle a recursive quantifier structure. | code | html |
| Pattern.Ques | The 0 or 1 quantifier. This one class implements all three types. | code | html |
| Pattern.Script | Node class that matches a Unicode script | code | html |
| Pattern.Single | Optimization -- matches a given BMP character | code | html |
| Pattern.SingleI | Case insensitive matches a given BMP character | code | html |
| Pattern.SingleS | Node class that matches a Supplementary Unicode character | code | html |
| Pattern.SingleU | Unicode case insensitive matches a given Unicode character | code | html |
| Pattern.Slice | Node class for a case sensitive/BMP-only sequence of literal characters. | code | html |
| Pattern.SliceI | Node class for a case_insensitive/BMP-only sequence of literal characters. | code | html |
| Pattern.SliceIS | Node class for a case insensitive sequence of literal characters including supplementary characters. | code | html |
| Pattern.SliceNode | Base class for all Slice nodes | code | html |
| Pattern.SliceS | Node class for a case sensitive sequence of literal characters including supplementary characters. | code | html |
| Pattern.SliceU | Node class for a unicode_case_insensitive/BMP-only sequence of literal characters. | code | html |
| Pattern.SliceUS | Node class for a case insensitive sequence of literal characters. | code | html |
| Pattern.Start | Used for REs that can start anywhere within the input string. | code | html |
| Pattern.StartS | | code | html |
| Pattern.TreeInfo | Used to accumulate information about a subtree of the object graph so that optimizations can be applied to the subtree. | code | html |
| Pattern.UnixCaret | Node to anchor at the beginning of a line when in unixdot mode. | code | html |
| Pattern.UnixDollar | Node to anchor at the end of a line or the end of input based on the multiline mode when in unix lines mode. | code | html |
| Pattern.UnixDot | Node class for the dot metacharacter when dotall is not enabled but UNIX_LINES is enabled. | code | html |
| Pattern.Utype | Node class that matches a Unicode "type" | code | html |
| PatternSyntaxException | Unchecked exception thrown to indicate a syntax error in a regular-expression pattern. | code | html |
| PosAltGroupQuantifierSet | Possessive quantifier over group, see java.util.regex.GroupQuantifierSet for more details. | code | html |
| PosCompositeGroupQuantifierSet | Possessive composite (i.e. {n,m}) quantifier node over groups. | code | html |
| PosPlusGroupQuantifierSet | Possessive + quantifier node over groups. | code | html |
| PositiveLookAhead | Positive lookahead node. | code | html |
| PositiveLookBehind | Positive lookbehind node. | code | html |
| PossessiveAltQuantifierSet | Possessive ? quantifier node. | code | html |
| PossessiveCompositeQuantifierSet | Possessive composite (i.e. {n, m}) quantifier node. | code | html |
| PossessiveGroupQuantifierSet | Possessive quantifier set over groups. | code | html |
| PossessiveQuantifierSet | Possessive quantifier set over LeafSet's | code | html |
| PreviousMatch | Node representing previous match (\G). | code | html |
| Quantifier | Represents RE quantifier; contains two fields responsible for min and max number of repetitions. | code | html |
| RangeSet | Represents node accepting single character from the given char class. | code | html |
| RelAltGroupQuantifierSet | Reluctant version of "?" quantifier set over group. | code | html |
| RelCompositeGroupQuantifierSet | Reluctant version of composite (i.e. | code | html |
| ReluctantAltQuantifierSet | This class represents ?? quantifier over leaf sets. | code | html |
| ReluctantCompositeQuantifierSet | Reluctant version of composite (i.e. {n,m}) quantifier set over leaf nodes. | code | html |
| ReluctantGroupQuantifierSet | Reluctant version of the group quantifier set. | code | html |
| ReluctantQuantifierSet | This class represents [+*]? constructs over LeafSets. | code | html |
| SOLSet | Represents node accepting single character. | code | html |
| SequenceSet | This class represents nodes constructed with character sequences. | code | html |
| SequenceSet.IntHash | | code | html |
| SingleDecompositions | This class gives us a hashtable that contains information about symbols that are one-symbol decompositions, generated from http://www.unicode.org/Public/4.0-Update/UnicodeData-4.0.0.txt. | code | html |
| SingleSet | Group node over subexpression w/o alternations. | code | html |
| SupplCharSet | Represents node accepting single supplementary codepoint. | code | html |
| SupplRangeSet | Represents node accepting single character from the given char class. | code | html |
| UCIBackReferenceSet | Unicode case insensitive back reference (i.e. \1-9) node. | code | html |
| UCICharSet | Represents node accepting single character in unicode case insensitive manner. | code | html |
| UCIDecomposedCharSet | Represents Unicode case insensitive canonical decomposition of Unicode character. | code | html |
| UCIRangeSet | Represents node accepting single character from the given char class. | code | html |
| UCISequenceSet | Node accepting substrings in unicode case insensitive manner. | code | html |
| UCISupplCharSet | Represents node accepting single supplementary codepoint in Unicode case insensitive manner. | code | html |
| UCISupplRangeSet | Represents node accepting single character from the given char class in Unicode case insensitive manner. | code | html |
| UEOLSet | Unix line terminator, accepting only \n. | code | html |
| UMultiLineEOLSet | Unix style multiline end-of-line node. | code | html |
| UnicodeCategory | Unicode category (e.g. Ll, Lu). | code | html |
| UnicodeCategoryScope | Unicode category scope (e.g. IsL, IsM, ...). | code | html |
| UnifiedQuantifierSet | Greedy quantifier node for the case where there is no intersection with next node and normal quantifiers could be treated as greedy and possessive. | code | html |
| WordBoundary | Represents a word boundary: checks the current character and the previous one, and returns true if they are of different types. | code | html |
An instance of the java.util.regex.Pattern class represents a regular expression that is specified in string form in a syntax similar to that used by Perl.
Instances of the java.util.regex.Matcher class are used to match character sequences against a given pattern. Input is provided to matchers via the java.lang.CharSequence interface in order to support matching against characters from a wide variety of input sources.
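As a rough illustration of the two paragraphs above, the sketch below compiles a pattern once and runs matchers over several kinds of CharSequence input. The class name RegexSketch, the helper pairs, and the key=value pattern are hypothetical names chosen for this example only; they are not part of the package.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of java.util.regex.
public class RegexSketch {
    // Compiled once and reused: one or more word characters, '=', one or more digits.
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

    /** Collects "key:value" strings for every key=value pair found in the input. */
    static List<String> pairs(CharSequence input) {
        List<String> out = new ArrayList<>();
        Matcher m = KEY_VALUE.matcher(input); // a new Matcher per input
        while (m.find()) {                    // advance to the next match, if any
            out.add(m.group(1) + ":" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Any CharSequence works as input: String, StringBuilder, CharBuffer, ...
        System.out.println(pairs("a=1, b=22"));
        System.out.println(pairs(new StringBuilder("x=7")));
        System.out.println(pairs(CharBuffer.wrap("n=42 m=0")));
    }
}
```

Because the matcher only sees a CharSequence, the same helper handles all three inputs without first copying them into a String.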
Unless otherwise noted, passing a null argument to a method in any class or interface in this package will cause a NullPointerException to be thrown.
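The null-argument rule can be checked with a small probe such as the one below. This is a minimal sketch, assuming a standard JDK implementation of the package; the class NullArgumentSketch and its two helper methods are hypothetical names introduced for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of java.util.regex.
public class NullArgumentSketch {
    /** Returns true if Pattern.compile rejects a null regex with NullPointerException. */
    static boolean compileRejectsNull() {
        try {
            Pattern.compile((String) null);
            return false; // no exception: the rule did not hold
        } catch (NullPointerException expected) {
            return true;
        }
    }

    /** Returns true if Pattern.matcher rejects a null input with NullPointerException. */
    static boolean matcherRejectsNull() {
        try {
            Pattern.compile("a").matcher((CharSequence) null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("compile(null) throws NPE: " + compileRejectsNull());
        System.out.println("matcher(null) throws NPE: " + matcherRejectsNull());
    }
}
```

In practice, callers that may receive null input usually validate it before calling into the package rather than catching the exception afterwards.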
An excellent tutorial and overview of regular expressions is Mastering Regular Expressions, Jeffrey E. F. Friedl, O'Reilly and Associates, 1997.
Since: 1.4. Authors: Mike McCloskey, Mark Reinhold.