Regular expression syntax

Regular expressions (regex) are a flexible and powerful tool for pattern matching and substitution in text. Some nodes allow you to “search” or “search and replace” using regex. Knowing the regex syntax basics enables you to write concise expressions for complex patterns, taking full advantage of these nodes.

In Sympathy, regular expressions are based on Python's regular expression module re, which is documented at https://docs.python.org/3/library/re.html. For a more general description of regular expressions, see the Wikipedia page https://en.wikipedia.org/wiki/Regular_expression.

Patterns

A regular expression uses a string to specify a pattern. It can contain both ordinary and special characters. Ordinary characters match themselves literally, e.g. articles matches the string articles. These characters do not carry any special meaning and are treated as plain text. Most characters are ordinary in regex, including the majority of letters, digits, punctuation marks and symbols.

Special characters, in contrast, carry special meanings. They can either be used to represent classes of ordinary characters, or to change the meaning of characters around them. For example, . represents the class of any single character except a newline, and thus you can use . to match an arbitrary character. Special characters open up abundant text matching and manipulation capabilities in regex.
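The difference between ordinary and special characters can be sketched in plain Python with the re module that Sympathy's regex support is based on. The sample strings are made up for illustration.

```python
import re

# Ordinary characters match themselves literally.
literal = re.search(r"articles", "two articles")
print(literal.group())  # articles

# The special character . matches any single character except a newline.
samples = ["cat", "cut", "ct", "c\nt"]
dot_hits = [s for s in samples if re.fullmatch(r"c.t", s)]
print(dot_hits)  # ['cat', 'cut']
```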

Special characters

The following special characters, grouped by category, are commonly used.

Metacharacters

You can match common character classes and patterns in shorthand using metacharacters.

Special character    Matches
. (dot)              Any character except a newline (\n)
| (pipe)             Alternation: a|b matches either a or b
\d                   Any digit
\D                   Any non-digit
\s                   Any whitespace character
\S                   Any non-whitespace character
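As a rough illustration of these metacharacters, the following sketch uses Python's re module directly. The sample text is invented, and the functions findall, split and sub belong to re, not to any Sympathy node.

```python
import re

text = "Order 66 shipped on day 7"

# \d matches one digit at a time, so each digit is reported separately.
digits = re.findall(r"\d", text)
print(digits)  # ['6', '6', '7']

# \D matches every non-digit; removing those leaves only the digits.
only_digits = re.sub(r"\D", "", text)
print(only_digits)  # 667

# \s matches a whitespace character; splitting on it yields the words.
words = re.split(r"\s", text)
print(words)  # ['Order', '66', 'shipped', 'on', 'day', '7']

# | matches either alternative.
either = re.findall(r"cat|dog", "cat, dog, bird")
print(either)  # ['cat', 'dog']
```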

Quantifiers

Quantifiers specify how many times the preceding pattern may be repeated.

Special character    Matches
a?                   Zero or one of a
a*                   Zero or more of a
a+                   One or more of a
a{m}                 Exactly m copies of a. a{3} matches aaa.
a{m,}                m or more copies of a. a{3,} matches 3 or more a.
a{,n}                From zero up to n copies of a. a{,3} matches 0 to 3 a.
a{m,n}               Between m and n copies of a. a{3,5} matches from 3 to 5 a.
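To see how the quantifiers differ, the sketch below checks a handful of made-up candidate strings against each pattern. It uses re.fullmatch so that the whole string has to fit the pattern; the helper full_matches is hypothetical and exists only for this example.

```python
import re

candidates = ["", "a", "aa", "aaa", "aaaa", "aaaaaa"]


def full_matches(pattern):
    """Return the candidates that the pattern matches in their entirety."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


print(full_matches(r"a?"))      # ['', 'a']
print(full_matches(r"a*"))      # ['', 'a', 'aa', 'aaa', 'aaaa', 'aaaaaa']
print(full_matches(r"a+"))      # ['a', 'aa', 'aaa', 'aaaa', 'aaaaaa']
print(full_matches(r"a{3}"))    # ['aaa']
print(full_matches(r"a{3,}"))   # ['aaa', 'aaaa', 'aaaaaa']
print(full_matches(r"a{,3}"))   # ['', 'a', 'aa', 'aaa']
print(full_matches(r"a{3,5}"))  # ['aaa', 'aaaa']
```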

Anchors

Anchors allow you to specify what a string should start and/or end with. A pattern like ^art.*e$ matches strings that start with art, continue with zero or more of any character (except the newline character), and end with e. The string article is a match.

Special character    Matches
^ (caret)            Start of string
$ (dollar)           End of string
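The effect of the anchors can be checked with a short sketch using Python's re.search. The list of names is invented, and whether a given Sympathy node searches within a string or requires a full match is not specified here.

```python
import re

names = ["article", "art", "particle", "articles", "arte"]

# Anchored: the string must start with art and end with e.
anchored = [s for s in names if re.search(r"^art.*e$", s)]
print(anchored)  # ['article', 'arte']

# Unanchored: art followed later by e may appear anywhere in the string.
unanchored = [s for s in names if re.search(r"art.*e", s)]
print(unanchored)  # ['article', 'particle', 'articles', 'arte']
```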

Character classes

A character class specifies a set or a range of characters, any one of which may match.

Special character    Matches
[abc]                Any one of the characters a, b or c
[^abc]               Any one character except a, b or c
[a-z]                Any one character in the range a-z
[a-zA-Z]             Any one character in the range a-z or A-Z
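The character classes above can be tried out on a small made-up string with re.findall. In the last two patterns a + quantifier is added so that runs of letters are returned as whole words.

```python
import re

text = "cab-Fare 42"

# One character from the set a, b, c.
in_set = re.findall(r"[abc]", text)
print(in_set)  # ['c', 'a', 'b', 'a']

# One character that is NOT a, b or c.
not_in_set = re.findall(r"[^abc]", text)
print(not_in_set)  # ['-', 'F', 'r', 'e', ' ', '4', '2']

# Runs of lowercase letters only.
lower_runs = re.findall(r"[a-z]+", text)
print(lower_runs)  # ['cab', 'are']

# Runs of lowercase or uppercase letters.
letter_runs = re.findall(r"[a-zA-Z]+", text)
print(letter_runs)  # ['cab', 'Fare']
```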

The sections above cover only a subset of the special characters in Python's regular expression syntax. The full list is available at https://docs.python.org/3/library/re.html#regular-expression-syntax

Special characters can be matched literally by preceding them with a backslash \. For example, the regex \. matches a literal dot (.).
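A brief sketch of the difference between an escaped and an unescaped dot, using invented file names. The call re.escape is part of Python's re module and is shown only as a convenience for building patterns in Python code.

```python
import re

filenames = ["data.csv", "dataXcsv"]

# Unescaped, the dot matches any character, so both names match.
loose = [f for f in filenames if re.fullmatch(r"data.csv", f)]
print(loose)  # ['data.csv', 'dataXcsv']

# Escaped, the dot only matches a literal dot.
strict = [f for f in filenames if re.fullmatch(r"data\.csv", f)]
print(strict)  # ['data.csv']

# re.escape adds the backslashes for you.
escaped = re.escape("a.b")
print(escaped)  # a\.b
```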

Grouping and capturing

Grouping in regular expressions allows you to combine the expressions inside a pair of parentheses and treat them as a single unit. ( and ) indicate the start and end of a group. For example, the pattern (abc)+ matches one or more copies of abc.
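The following sketch, with made-up input strings, shows why the parentheses matter: with a group the quantifier repeats the whole unit, without it the quantifier applies only to the last character.

```python
import re

# With the group, + repeats the whole unit abc.
grouped = re.fullmatch(r"(abc)+", "abcabc")
print(grouped is not None)  # True

# Without the group, + applies only to c, so abcabc no longer fits.
ungrouped = re.fullmatch(r"abc+", "abcabc")
print(ungrouped is not None)  # False

# The ungrouped pattern instead matches ab followed by one or more c.
repeated_c = re.fullmatch(r"abc+", "abccc")
print(repeated_c is not None)  # True
```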

A group also captures the text it matches, and that text can be reused in the replace pattern. Use \1 to insert the text captured by the first group, \2 for the second group, and so on.

As an example, say that you have a string x_old and want to replace old with new while keeping the rest unchanged. If you enter the search expression (.*)_old and the replace expression \1_new, the output will be x_new. This comes in handy when, for example, renaming all columns in a table that share the same pattern using the node Rename columns in Table. If the input table contains the columns x_old, y_old and z_old, entering the search and replace expressions above in Regex mode renames them to x_new, y_new and z_new.
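The same search and replace expressions can be tried in plain Python with re.sub. This is only a sketch of the regex behaviour: how the Rename columns in Table node applies the expressions internally is not described here, and the extra column name unrelated is added for illustration.

```python
import re

columns = ["x_old", "y_old", "z_old", "unrelated"]

search = r"(.*)_old"   # group 1 captures everything before _old
replace = r"\1_new"    # \1 inserts the text captured by group 1

renamed = [re.sub(search, replace, name) for name in columns]
print(renamed)  # ['x_new', 'y_new', 'z_new', 'unrelated']
```

Names that do not match the search expression, such as unrelated, are left as they are.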

Sandbox

To try out expressions with instant feedback, experiment with the node Regex example.