Class representing a regular expression.
It has an API similar to std::regex, but is much faster (and also somewhat more limited).
Defined at line 48 of file regex.h.
Creates a regular expression object given the pattern as a string.
Two modes of matching are supported: RegEx and Wildcard.
The following special characters are supported in Mode::RegEx mode.
c | matches character c |
. | matches any character |
^ | matches the start of the input |
$ | matches the end of the input |
\< | matches the start of a word |
\> | matches the end of a word |
[] | matches a set of characters |
x* | matches a sequence of zero or more x's |
x+ | matches a sequence of one or more x's |
x? | matches an optional x |
( | matches the start of a capture range |
) | matches the end of a capture range |
\c | escapes a special character, such as +, [, *, (, etc. |
\t | matches a tab character |
\n | matches a newline character |
\r | matches a return character |
\s | matches any whitespace as defined by std::isspace() |
\d | matches any digit as defined by std::isdigit() |
\a | matches any alphabetical character, same as [a-z_A-Z\x80-\xFF] |
\w | matches any alphanumeric character, same as [a-z_A-Z0-9\x80-\xFF] |
\xHH | matches a hexadecimal character, e.g. \xA0 matches character code 160 |
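The character-class escapes listed above can be read as simple predicates on a single character. The following is only an illustrative sketch: the function names are hypothetical and not part of reg::Ex, and the \a and \w definitions are taken directly from the bracket expressions quoted above.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helpers (not part of reg::Ex) mirroring the entries above.

// \s : whitespace as defined by std::isspace()
bool isSpaceClass(char c)
{
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// \d : digit as defined by std::isdigit()
bool isDigitClass(char c)
{
  return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// \a : same as [a-z_A-Z\x80-\xFF]
bool isAlphaClass(char c)
{
  unsigned char u = static_cast<unsigned char>(c);
  return (u>='a' && u<='z') || (u>='A' && u<='Z') || u=='_' || u>=0x80;
}

// \w : same as [a-z_A-Z0-9\x80-\xFF]
bool isWordClass(char c)
{
  return isAlphaClass(c) || (c>='0' && c<='9');
}
```

Note that the underscore and every byte from 0x80 upwards count as alphabetical here, which is what allows multi-byte UTF-8 sequences to be treated as part of a word.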
A character range can be used to match a character that falls inside a range (or set of ranges). Within the opening [ and closing ] brackets of a character range the following is supported:
^ | if at the start of the range, a character matches if it is not in the range, e.g. [^\d] matches any character that is not a digit |
- | when placed between two characters it defines a range from the first character to the second; any character that falls in the range will match, e.g. [0-9] matches the digits 0 to 9 |
\s, \d, \a, and \w | as explained above |
- Note
- Special characters ., *, ?, $, +, [ do not have a special meaning in a character range. ^ only has a special meaning as the first character.
- Capture ranges cannot be nested, and *, +, and ? do not work on capture ranges, e.g. (abd)? is not valid. If multiple capture ranges are specified, some character has to be between them, e.g. (.*)(a.*) does not work, but (.*)a(.*) does.
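The character-range rules above (leading ^ negates, - between two characters forms a range, everything else is literal) can be sketched as a small stand-alone function. This is an assumption-laden illustration, not the reg::Ex implementation: the name inCharRange is hypothetical, the argument is the text between the brackets, and the escapes \s, \d, \a, \w and \xHH are deliberately left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of reg::Ex): tests whether ch matches the
// character range whose contents (the text between [ and ]) are given in spec.
bool inCharRange(const std::string &spec, char ch)
{
  std::size_t i = 0;
  bool negate = false;
  if (!spec.empty() && spec[0]=='^') // ^ is only special as the first character
  {
    negate = true;
    i = 1;
  }
  unsigned char c = static_cast<unsigned char>(ch);
  bool hit = false;
  while (i < spec.size())
  {
    unsigned char lo = static_cast<unsigned char>(spec[i]);
    if (i+2 < spec.size() && spec[i+1]=='-') // lo-hi defines an inclusive range
    {
      unsigned char hi = static_cast<unsigned char>(spec[i+2]);
      if (lo<=c && c<=hi) hit = true;
      i += 3;
    }
    else // any other character, including . * ? $ + [, is a literal
    {
      if (c==lo) hit = true;
      i += 1;
    }
  }
  return hit != negate;
}
```

For example, inCharRange("0-9", '5') is true, inCharRange("^0-9", '5') is false, and inCharRange(".*", '.') is true because . has no special meaning inside a range.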
In Wildcard mode * is used to match any sequence of zero or more characters. The character ? can be used to match an optional character. Character ranges are also supported, but other characters like $ and + are just treated as literal characters.
Defined at line 662 of file regex.cpp.
References p.
bool reg::Ex::match(const std::string &str, Match &match, size_t pos = 0) const
Check if a given string matches this regular expression.
- Parameters
str | The input string to match against. |
match | The match object to hold the matching results. |
pos | The position in the string at which to start the match. |
- Returns
- true iff a match is found. Details are stored in the match object.
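Because reg::Ex is described as having an API similar to std::regex, the role of the pos parameter can be illustrated with standard-library calls alone. The sketch below does not use reg::Ex; regexSearchFrom is a hypothetical helper built on std::regex_search, and it only shows the general idea of starting a search at an offset and reporting the match position relative to the whole string.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical std::regex analogue of "match starting at pos".
// On success, start receives the offset of the match within str
// and text receives the matched characters.
bool regexSearchFrom(const std::string &str, const std::regex &re,
                     std::size_t pos, std::size_t &start, std::string &text)
{
  if (pos > str.size()) return false;
  std::smatch m;
  auto first = str.begin() + static_cast<std::string::difference_type>(pos);
  if (!std::regex_search(first, str.end(), m, re)) return false;
  // m.position(0) is relative to first, so add pos to get an absolute offset
  start = pos + static_cast<std::size_t>(m.position(0));
  text  = m.str(0);
  return true;
}
```

With the pattern \d+ and the input "ab12cd345", a search from position 0 reports "12" at offset 2, while a search from position 4 reports "345" at offset 6.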
Defined at line 676 of file regex.cpp.
679 if (p->data.size()==0 || p->error) return found;
682 PToken tok = p->data[0];
685 found = p->matchAt(0,str,match,pos,0);
691 size_t index = str.find(tok.asciiValue(),pos);
692 if (index==std::string::npos)
694 DBG("Ex::match(str='%s',pos=%zu)=false (no start char '%c')\n",str.c_str(),pos,tok.asciiValue());
697 DBG("pos=%zu str='%s' char='%c' index=%zu\n",index,str.c_str(),tok.asciiValue(),index);
700 while (pos<str.length())
702 found = p->matchAt(0,str,match,pos,0);
707 DBG("Ex::match(str='%s',pos=%zu)=%d\n",str.c_str(),pos,found);
References reg::PToken::asciiValue(), reg::PToken::BeginOfLine, reg::PToken::Character, DBG, reg::PToken::kind(), and p.
Referenced by reg::match(), reg::Ex::Private::matchAt(), reg::Iterator::operator!=(), reg::replace(), and reg::search().
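The listing for Ex::match shows only fragments, but together with the referenced symbols (BeginOfLine, Character, asciiValue()) it suggests a three-step search strategy: test once at pos when the pattern is anchored, otherwise jump to the first occurrence of a leading literal character, then try successive positions. The sketch below is a reconstruction under those assumptions, not the actual source. MatchAtFn, SearchResult and searchWithStrategy are hypothetical names, and the per-position matcher is passed in as a callback rather than modelled on the private matchAt().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for the private matcher: returns true if the pattern
// matches str starting exactly at pos, and stores the match length in len.
using MatchAtFn =
    std::function<bool(const std::string &str, std::size_t pos, std::size_t &len)>;

struct SearchResult
{
  bool        found;
  std::size_t position;
  std::size_t length;
};

// anchored : assumed to mean the pattern starts with ^
// startChar: first literal character of the pattern, or '\0' if there is none
SearchResult searchWithStrategy(const std::string &str, std::size_t pos,
                                bool anchored, char startChar,
                                const MatchAtFn &matchAt)
{
  SearchResult r{false,0,0};
  std::size_t len = 0;
  if (anchored) // only test for a match at the given position
  {
    if (matchAt(str,pos,len)) r = {true,pos,len};
    return r;
  }
  if (startChar!='\0') // skip ahead to the first possible start character
  {
    std::size_t index = str.find(startChar,pos);
    if (index==std::string::npos) return r; // no start char, so no match
    pos = index;
  }
  while (pos<str.length()) // try each remaining position in turn
  {
    if (matchAt(str,pos,len))
    {
      r = {true,pos,len};
      break;
    }
    ++pos;
  }
  return r;
}
```

The jump to the start character is purely an optimisation: std::string::find is typically much cheaper than invoking the matcher at every position, and the loop that follows still produces the same result.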