Regular expression syntax¶
Regular expressions (regex) are a flexible and powerful tool for pattern matching and substitution in text. Some nodes allow you to “search” or “search and replace” using regex. Knowing the regex syntax basics enables you to write concise expressions for complex patterns, taking full advantage of these nodes.
In Sympathy, regular expressions are based on Python's regular expression
operations module `re`, whose documentation is available at
https://docs.python.org/3/library/re.html. For a more general description of
regular expressions, see the Wikipedia page
https://en.wikipedia.org/wiki/Regular_expression.
Patterns¶
A regular expression uses a string to specify a pattern. It can contain both
ordinary and special characters. Ordinary characters match themselves
literally, e.g. `articles` matches the string `articles`. These characters
do not carry any special meaning and are treated as plain text. Most characters
are ordinary in regex, including the majority of letters, digits, punctuation
marks and symbols.
Special characters, in contrast, carry special meanings. They can either be
used to represent classes of ordinary characters, or to change the meaning of
characters around them. For example, `.` represents the class of any single
character except a newline, and thus you can use `.` to match an arbitrary
character. Special characters open up abundant text matching and manipulation
capabilities in regex.
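The difference between ordinary and special characters can be tried out directly with Python's `re` module. The snippet below is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# Ordinary characters match themselves literally.
literal_match = re.search(r"articles", "two articles")
print(literal_match.group())  # articles

# The special character . matches any single character except a newline,
# so "b.t" finds "bat", "bit" and "but" but not "b\nt".
dot_matches = re.findall(r"b.t", "bat bit b\nt but")
print(dot_matches)  # ['bat', 'bit', 'but']
```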
Special characters¶
The most commonly used special characters are listed below by category.
Metacharacters¶
You can match common character classes and patterns in shorthand using metacharacters.
| Special character | Matches |
|---|---|
| `.` | Any character except a newline (`\n`) |
| `\|` | Alternation: `a\|b` matches either `a` or `b` |
| `\d` | Any digit |
| `\D` | Any non-digit |
| `\s` | Any whitespace character |
| `\S` | Any non-whitespace character |
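As a rough illustration of the metacharacters in this table, the following sketch uses `re.findall` from Python's `re` module on a few made-up sample strings.

```python
import re

digits = re.findall(r"\d", "a1b22")            # every digit
non_digits = re.findall(r"\D", "a1b22")        # everything that is not a digit
whitespace = re.findall(r"\s", "a b\tc")       # space and tab
non_whitespace = re.findall(r"\S", "a b\tc")   # the letters only
alternation = re.findall(r"cat|dog", "cat, dog, bird")  # either alternative

print(digits)          # ['1', '2', '2']
print(non_digits)      # ['a', 'b']
print(whitespace)      # [' ', '\t']
print(non_whitespace)  # ['a', 'b', 'c']
print(alternation)     # ['cat', 'dog']
```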
Quantifiers¶
Quantifiers specify the number of repetitions of a preceding pattern.
| Special character | Matches |
|---|---|
| `a?` | Zero or one of a |
| `a*` | Zero or more of a |
| `a+` | One or more of a |
| `a{m}` | Exactly m copies of a |
| `a{m,}` | m or more copies of a |
| `a{,n}` | Up to n copies of a |
| `a{m,n}` | Between m and n copies of a |
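The quantifiers can be checked with a small helper. This is only a sketch: `matches` is a hypothetical helper name, and it uses `re.fullmatch` so that the whole string has to match, which is an assumption made here for clarity and not necessarily how a given Sympathy node applies the pattern.

```python
import re

def matches(pattern, text):
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"colou?r", "color"))   # True: zero copies of u
print(matches(r"colou?r", "colour"))  # True: one copy of u
print(matches(r"a*", ""))             # True: zero or more
print(matches(r"a+", ""))             # False: at least one is required
print(matches(r"a{3}", "aa"))         # False: exactly three are required
print(matches(r"a{2,}", "aaaa"))      # True: two or more
print(matches(r"a{,2}", "aaa"))       # False: at most two
print(matches(r"a{2,3}", "aaa"))      # True: between two and three
```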
Anchors¶
Anchors allow you to specify what a string should start and/or end with. For
example, the pattern `^art.*e$` matches strings that start with `art`,
followed by zero or more characters (excluding the newline character),
and end with `e`. The string `article` is a match.
| Special character | Matches |
|---|---|
| `^` | Start of string |
| `$` | End of string |
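To see what the anchors contribute, the sketch below compares `^art.*e$` with the same pattern without anchors. The candidate strings are made up for illustration.

```python
import re

candidates = ["article", "particle", "articles", "art\ne"]

# With anchors, the string must start with "art" and end with "e".
anchored = [s for s in candidates if re.search(r"^art.*e$", s)]

# Without anchors, the pattern may match anywhere inside the string.
unanchored = [s for s in candidates if re.search(r"art.*e", s)]

print(anchored)    # ['article']
print(unanchored)  # ['article', 'particle', 'articles']
```

The string `"art\ne"` matches neither pattern because `.` does not match the newline character.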
Character classes¶
A character class specifies a range or a group of characters to match.
| Special character | Matches |
|---|---|
| `[abc]` | A character of a, b or c |
| `[^abc]` | A character except a, b or c |
| `[a-z]` | A character in the range a-z |
| `[a-zA-Z]` | A character in the range a-z or A-Z |
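A short sketch of the character classes in this table, again using `re.findall` on made-up sample strings:

```python
import re

any_of = re.findall(r"[abc]", "cab-XYZ 42")   # only a, b or c
none_of = re.findall(r"[^abc]", "abcde")      # anything except a, b or c
lower = re.findall(r"[a-z]", "aB3z")          # lowercase letters a-z
letters = re.findall(r"[a-zA-Z]", "aB3z")     # letters a-z or A-Z

print(any_of)   # ['c', 'a', 'b']
print(none_of)  # ['d', 'e']
print(lower)    # ['a', 'z']
print(letters)  # ['a', 'B', 'z']
```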
The sections above cover only a subset of the special characters in Python regular expression syntax. The full list is available at https://docs.python.org/3/library/re.html#regular-expression-syntax
Special characters can be matched literally by escaping them with a backslash
`\`, e.g. use the regex `\.` to match a literal `.`.
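The effect of escaping can be sketched as follows. The sample strings are made up; `re.escape` is a standard helper in Python's `re` module that escapes special characters in a string, which may be useful when preparing a literal search text outside of Sympathy.

```python
import re

# Unescaped, . matches every character; escaped, it matches only a literal dot.
unescaped = re.findall(r".", "a.b")
escaped = re.findall(r"\.", "a.b")
print(unescaped)  # ['a', '.', 'b']
print(escaped)    # ['.']

# re.escape builds a pattern that matches the given text literally.
literal_text = "1+1=2"
plain_pattern_matches = re.fullmatch(literal_text, literal_text) is not None
escaped_pattern_matches = re.fullmatch(re.escape(literal_text), literal_text) is not None
print(plain_pattern_matches)    # False: + is read as a quantifier
print(escaped_pattern_matches)  # True
```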
Grouping and capturing¶
Grouping in regular expressions allows you to treat the expression inside a
pair of parentheses as a single unit. `(` and `)` indicate the start and end
of a group. For example, the pattern `(abc)+` matches one or more copies of
`abc`.
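The following sketch shows how the parentheses change what the quantifier applies to. It uses `re.fullmatch` so the whole string has to match, which is an assumption made here to keep the comparison clear.

```python
import re

# With the group, + repeats the whole unit "abc".
grouped = re.fullmatch(r"(abc)+", "abcabc") is not None

# Without the group, + repeats only the preceding "c".
ungrouped = re.fullmatch(r"abc+", "abcabc") is not None
ungrouped_c_repeat = re.fullmatch(r"abc+", "abccc") is not None

print(grouped)             # True
print(ungrouped)           # False
print(ungrouped_c_repeat)  # True
```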
After a match, the contents of a group can be captured and reused in the
replace pattern. To insert the part of the match within the first pair of
parentheses, use `\1`. Higher numbers (`\2`, `\3` and so on) refer to the
subsequent groups.
As an example, let’s say that you have the string `x_old` and you intend to
replace `old` with `new` but keep the rest as it was. If you enter the
search expression `(.*)_old` and the replace expression `\1_new`, the
output will be `x_new`. This comes in handy when, for example, renaming all
columns in a table that share the same pattern using the node
Rename columns in Table. If the input table contains the columns `x_old`,
`y_old` and `z_old`, entering the search and replace expressions above in
Regex mode will rename these columns to `x_new`, `y_new` and `z_new`.
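The same search and replace can be reproduced in plain Python with `re.sub`. This is only a sketch of the equivalent operation in the `re` module, not the implementation of the Rename columns in Table node; the extra column name `unchanged` is made up to show that non-matching names are left alone.

```python
import re

columns = ["x_old", "y_old", "z_old", "unchanged"]

# (.*) captures everything before "_old"; \1 inserts it in the replacement.
renamed = [re.sub(r"(.*)_old", r"\1_new", name) for name in columns]

print(renamed)  # ['x_new', 'y_new', 'z_new', 'unchanged']
```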
Sandbox¶
To try out expressions with instant feedback, experiment with the node Regex example.