RegExMatch

Determines whether a string contains a pattern (regular expression).

FoundPos := RegExMatch(Haystack, NeedleRegEx [, OutputVar, StartingPosition := 1])

Parameters

Haystack

Type: String

The string whose content is searched. This may contain binary zero.

NeedleRegEx

Type: String

The pattern to search for, which is a Perl-compatible regular expression (PCRE). The pattern's options (if any) must be included at the beginning of the string followed by a close-parenthesis. For example, the pattern "i)abc.*123" would turn on the case-insensitive option and search for "abc", followed by zero or more occurrences of any character, followed by "123". If there are no options, the ")" is optional; for example, ")abc" is equivalent to "abc".

Although NeedleRegEx cannot contain binary zero, the pattern \x00 can be used to match a binary zero within Haystack.
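As a brief illustration of the option prefix described above, the following sketch compares a pattern with and without the "i)" option. The sample strings are made up for this illustration.

MsgBox RegExMatch("xyzABC999123", "i)abc.*123")  ; Shows 4 because the i option lets "abc" match "ABC".
MsgBox RegExMatch("xyzABC999123", "abc.*123")  ; Shows 0 because matching is case-sensitive by default.
MsgBox RegExMatch("xyzabc", ")abc")  ; Shows 4, the same result as the pattern "abc".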

OutputVar

Type: Variable

Specify a variable in which to store a match object, which can be used to retrieve the position, length and value of the overall match and of each captured subpattern, if any are present.

If the pattern is not found (that is, if the function returns 0), this variable is made blank.

StartingPosition

Type: Integer

If StartingPosition is omitted, it defaults to 1 (the beginning of Haystack). Otherwise, specify 2 to start at the second character, 3 to start at the third, and so on. If StartingPosition is beyond the length of Haystack, the search starts at the empty string that lies at the end of Haystack (which typically results in no match).

Specify a negative StartingPosition to start at that position from the right. For example, -1 starts at the last character and -2 starts at the next-to-last character. If StartingPosition tries to go beyond the left end of Haystack, all of Haystack is searched.

Regardless of the value of StartingPosition, the return value is always relative to the first character of Haystack. For example, the position of "abc" in "123abc789" is always 4.
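The rules above can be sketched with the same "123abc789" string. The particular starting positions are chosen only for illustration.

MsgBox RegExMatch("123abc789", "abc",, 3)  ; Shows 4: the search starts at the third character, but the result is counted from the first.
MsgBox RegExMatch("123abc789", "abc",, -6)  ; Shows 4: -6 starts at the sixth character from the right, which is the "a".
MsgBox RegExMatch("123abc789", "abc",, -5)  ; Shows 0: the search starts at the "b", so "abc" is no longer ahead.
MsgBox RegExMatch("123abc789", "abc",, 20)  ; Shows 0: the search starts at the empty string at the end of Haystack.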

Return Value

Type: Integer

This function returns the position of the leftmost occurrence of NeedleRegEx in the string Haystack. Position 1 is the first character. Zero is returned if the pattern is not found.

Errors

Syntax errors: If the pattern contains a syntax error, an exception is thrown with a message in the following form: Compile error N at offset M: description. In that string, N is the PCRE error number, M is the position of the offending character inside the regular expression, and description is the text describing the error.

Execution errors: If an error occurs during the execution of the regular expression, an exception is thrown. The Extra property of the exception object contains the PCRE error number. Although such errors are rare, the ones most likely to occur are "too many possible empty-string matches" (-22), "recursion too deep" (-21), and "reached match limit" (-8). If these happen, try to redesign the pattern to be more restrictive, such as replacing each * with a ?, +, or a limit like {0,3} wherever feasible.
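A minimal sketch of catching such an exception follows. It assumes the "catch e" form of try/catch that matches the syntax used elsewhere on this page; other AutoHotkey releases may spell the catch clause differently. The variable name e and the deliberately broken pattern are illustrative only.

try
{
    RegExMatch("abc", "(abc")  ; The unclosed parenthesis is a deliberate syntax error.
}
catch e  ; Assumption: this catch syntax is accepted by the AutoHotkey version this page describes.
{
    MsgBox "RegExMatch failed: " e.Message  ; The message is expected to follow the "Compile error N at offset M: description" form.
}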

Options

See Options for modifiers such as "i)abc", which turns off case-sensitivity in the pattern "abc".

Match Object

If a match is found, an object containing information about the match is stored in OutputVar. This object has the following properties:

Match.Pos(N): Returns the position of the overall match or a captured subpattern.

Match.Len(N): Returns the length of the overall match or a captured subpattern.

Match.Value(N): Returns the overall match or a captured subpattern.

Match.Name(N): Returns the name of the given subpattern, if it has one.

Match.Count(): Returns the overall number of subpatterns.

Match.Mark(): Returns the NAME of the last encountered (*MARK:NAME), when applicable.

Match[N]: If N is 0 or a valid subpattern number or name, this is equivalent to Match.Value(N).

Match.N: Same as above, except that N is an unquoted name or number.

Wherever N appears above, it can be 0 for the overall match, the number of a subpattern (even one that also has a name), or the name of a subpattern.

The object also supports enumeration; that is, the for-loop is supported. Alternatively, use Loop Match.Count().
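The following sketch walks through a match object using Loop with Match.Count(). The haystack, the pattern and the variable name M are made up for this illustration.

RegExMatch("Due 2024-05-17", "(?<Year>\d{4})-(?<Month>\d{2})-(\d{2})", M)
MsgBox M.Pos(0) " " M.Len(0) " " M.Value(0)  ; Shows "5 10 2024-05-17", describing the overall match.
Loop M.Count()  ; Count() is 3 for this pattern.
    MsgBox A_Index ": " M.Name(A_Index) " = " M.Value(A_Index) " at position " M.Pos(A_Index)  ; The third subpattern has no name.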

Performance

To search for a simple substring inside a larger string, use InStr because it is faster than RegExMatch.
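For instance, when the needle is a fixed substring, both calls below report the same position, so the pattern engine is not needed. The strings are illustrative only.

MsgBox InStr("xxxabc123xyz", "abc")  ; Shows 4 using a plain substring search.
MsgBox RegExMatch("xxxabc123xyz", "abc")  ; Also shows 4, but goes through the RegEx engine.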

To improve performance, the 100 most recently used regular expressions are kept cached in memory (in compiled form).

The study option (S) can sometimes improve the performance of a regular expression that is used many times (such as in a loop).

Remarks

A subpattern may be given a name such as the word Year in the pattern "(?P<Year>\d{4})". Such names may consist of up to 32 alphanumeric characters and underscores. Note that named subpatterns are also numbered, so if an unnamed subpattern occurs after "Year", it would be stored in OutputVar[2], not OutputVar[1].
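The numbering rule can be seen in a short sketch. The sample string and the variable name Found are made up for this illustration.

RegExMatch("1999-12", "(?P<Year>\d{4})-(\d{2})", Found)
MsgBox Found["Year"] " " Found[1] " " Found[2]  ; Shows "1999 1999 12": the named subpattern is also number 1, so the unnamed one is number 2.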

Most characters like abc123 can be used literally inside a regular expression. However, the characters \.*?+[{|()^$ must be preceded by a backslash to be seen as literal. For example, \. is a literal period and \\ is a literal backslash. Escaping can be avoided by using \Q...\E. For example: \QLiteral Text\E.
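A short sketch of both escaping styles, using a made-up price string:

MsgBox RegExMatch("Price: 3.50 (USD)", "3\.50 \(USD\)")  ; Shows 8; each special character is escaped individually.
MsgBox RegExMatch("Price: 3.50 (USD)", "\Q3.50 (USD)\E")  ; Shows 8; \Q...\E treats the enclosed text literally.
MsgBox RegExMatch("Price: 3x50", "3.50")  ; Shows 8; the unescaped period matches any character, here the "x".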

Within a regular expression, special characters such as tab and newline can be escaped with either an accent (`) or a backslash (\). For example, `t is the same as \t except when the x option is used.

To learn the basics of regular expressions (or refresh your memory of pattern syntax), see the RegEx Quick Reference.

AutoHotkey's regular expressions are implemented using Perl-compatible Regular Expressions (PCRE) from www.pcre.org.

Within an expression, a ~= b can be used as shorthand for RegExMatch(a, b).
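A minimal sketch of the shorthand follows. Because the operator yields the found position, it can be stored or used directly as a condition. The strings are illustrative only.

FoundPos := "abc123" ~= "\d+$"
MsgBox FoundPos  ; Shows 4, the same as RegExMatch("abc123", "\d+$").
if ("abc123" ~= "^\d")
    MsgBox "Starts with a digit."
else
    MsgBox "Does not start with a digit."  ; This branch runs because the result is 0.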

Related

RegExReplace, RegEx Quick Reference, Regular Expression Callouts, InStr, SubStr, SetTitleMatchMode RegEx, Global matching and Grep (forum link)

Common sources of text data: FileRead, Download, A_Clipboard, GUI Edit controls

Examples

For general RegEx examples, see the RegEx Quick Reference.

#1

MsgBox RegExMatch("xxxabc123xyz", "abc.*xyz")  ; Shows 4, which is the position where the match was found.
MsgBox RegExMatch("abc123123", "123$")  ; Shows 7 because the $ requires the match to be at the end.
MsgBox RegExMatch("abc123", "i)^ABC")  ; Shows 1 because a match was achieved via the case-insensitive option.
MsgBox RegExMatch("abcXYZ123", "abc(.*)123", SubPat)  ; Shows 1 and stores "XYZ" in SubPat[1].
MsgBox RegExMatch("abc123abc456", "abc\d+",, 2)  ; Shows 7 instead of 1 due to StartingPosition 2 vs. 1.

#2: Match object

FoundPos := RegExMatch("Michiganroad 72", "(.*) (?<nr>\d+)", SubPat)
MsgBox SubPat.Count() ": " SubPat.Value(1) " " SubPat.Name(2) "=" SubPat["nr"]  ; Displays "2: Michiganroad nr=72"

#3: Retrieves the extension of a file. Note that SplitPath can also be used for this and is more reliable.

Path := "C:\Foo\Bar\Baz.txt"
RegExMatch(Path, "\w+$", Extension)
MsgBox Extension[0]  ; Shows "txt".