.. This file is part of Sympathy for Data.
.. Copyright (c) 2023 Combine Control Systems AB
..
.. Sympathy for Data is free software: you can redistribute it and/or modify
.. it under the terms of the GNU General Public License as published by
.. the Free Software Foundation, version 3 of the License.
..
.. Sympathy for Data is distributed in the hope that it will be useful,
.. but WITHOUT ANY WARRANTY; without even the implied warranty of
.. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.. GNU General Public License for more details.
..
.. You should have received a copy of the GNU General Public License
.. along with Sympathy for Data. If not, see .

.. _appendix_regex:

Regular expression syntax
=========================

Regular expressions (regex) are a flexible and powerful tool for pattern
matching and substitution in text. Some nodes allow you to "search" or
"search and replace" using regex. Knowing the basics of regex syntax lets you
write concise expressions for complex patterns and take full advantage of
these nodes.

In Sympathy, regular expressions are based on Python's regular expression
module ``re``, which is documented at
https://docs.python.org/3/library/re.html.

For a more general description of regular expressions, have a look at the
Wikipedia page https://en.wikipedia.org/wiki/Regular_expression.

Patterns
--------

A regular expression uses a string to specify a pattern. It can contain both
ordinary and special characters.

Ordinary characters match themselves literally, e.g. ``articles`` matches the
string ``articles``. These characters do not carry any special meaning and are
treated as plain text. Most characters are ordinary in regex, including the
majority of letters, digits, punctuation marks and symbols.

Special characters, in contrast, carry special meanings. They can either be
used to represent classes of ordinary characters, or to change the meaning of
characters around them.
For example, ``.`` represents the class of any single character except a
newline, and thus you can use ``.`` to match an arbitrary character. Special
characters give regex most of its text matching and manipulation
capabilities.

Special characters
------------------

The following special characters are commonly used, grouped by category.

Metacharacters
^^^^^^^^^^^^^^

Metacharacters are shorthand for common character classes and patterns.

================= ========================================
Special character Matches
================= ========================================
``.`` (dot)       Any character except a newline (``\n``)
``|`` (pipe)      Alternate, ``a|b`` matches either a or b
``\d``            Any digit
``\D``            Any non-digit
``\s``            Any whitespace character
``\S``            Any non-whitespace character
================= ========================================

Quantifiers
^^^^^^^^^^^

Quantifiers specify the number of repetitions of a preceding pattern.

================= ================================================================
Special character Matches
================= ================================================================
``a?``            Zero or one of a
``a*``            Zero or more of a
``a+``            One or more of a
``a{m}``          Exactly m copies of a. ``a{3}`` matches aaa.
``a{m,}``         m or more copies of a. ``a{3,}`` matches 3 or more a.
``a{,n}``         Up to n copies of a. ``a{,3}`` matches up to 3 a.
``a{m,n}``        Between m and n copies of a. ``a{3,5}`` matches from 3 to 5 a.
================= ================================================================

Anchors
^^^^^^^

Anchors allow you to specify what a string should start and/or end with.
The pattern ``^art.*e$`` matches strings that start with ``art``, followed by
zero or more of any character (except the newline character), and end with
``e``. The string ``article`` is a match.
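
The behaviour of the metacharacters, quantifiers and anchors described here
can be checked directly against Python's ``re`` module. The following is a
minimal sketch, not a description of how any particular node applies a
pattern: the sample strings and variable names are made up for illustration,
and ``re.fullmatch`` is used where the whole string should match.

```python
import re

# Dot: any single character except a newline.
dot_matches = [bool(re.fullmatch(r"a.c", s)) for s in ("abc", "a-c", "a\nc")]

# Pipe: alternation between two patterns.
alternatives = re.findall(r"cat|dog", "cat, dog, bird")

# \d with the + quantifier: one or more digits.
digits = re.findall(r"\d+", "3 apples and 12 pears")

# a{3,5}: between three and five copies of a (tested on 1 to 6 copies).
counted = [bool(re.fullmatch(r"a{3,5}", "a" * n)) for n in range(1, 7)]

# Anchors: must start with "art" and end with "e".
candidates = ("article", "particle", "articles")
anchored = [s for s in candidates if re.search(r"^art.*e$", s)]

print(dot_matches)   # [True, True, False]
print(alternatives)  # ['cat', 'dog']
print(digits)        # ['3', '12']
print(counted)       # [False, False, True, True, True, False]
print(anchored)      # ['article']
```

Raw strings (``r"..."``) are used so that backslashes such as ``\d`` reach
the regex engine unchanged.
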

================= ========================================
Special character Matches
================= ========================================
``^`` (caret)     Start of string
``$`` (dollar)    End of string
================= ========================================

Character classes
^^^^^^^^^^^^^^^^^

A character class specifies a range or a group of characters to match.

================= ========================================
Special character Matches
================= ========================================
``[abc]``         One of the characters a, b or c
``[^abc]``        Any character except a, b or c
``[a-z]``         A character in the range a-z
``[a-zA-Z]``      A character in the range a-z or A-Z
================= ========================================

The sections above cover a subset of the special characters in Python regular
expression syntax:
https://docs.python.org/3/library/re.html#regular-expression-syntax

Special characters can be matched literally by escaping them with a backslash
``\``, e.g. use the regex ``\.`` to match a literal ``.``.

.. _regex_grouping_and_capturing:

Grouping and capturing
----------------------

Grouping allows you to treat the expression inside a pair of parentheses as a
single unit. ``(`` and ``)`` indicate the start and end of a group. For
example, the pattern ``(abc)+`` matches one or more copies of ``abc``.

After a match, the contents of a group can be captured and reused in the
replace pattern. To reuse the part of the match within the parentheses, use
``\1`` (or higher numbers for later groups) to insert it.

As an example, let's say that you have a string ``x_old``, and you intend to
replace ``old`` with ``new`` but keep the rest as it was. If you enter the
search expression ``(.*)_old`` and the replace expression ``\1_new``, the
output will be ``x_new``.

This comes in handy when, for example, renaming all columns that share the
same pattern in a table using the node
:ref:`org.sysess.sympathy.data.table.renamesingletablecolumns`.
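
The search and replace expressions above can be tried outside Sympathy with
``re.sub``. This is only a sketch under stated assumptions: the list of column
names is hypothetical, and calling ``re.sub`` directly is an illustration of
the regex behaviour, not a description of how the node is implemented.

```python
import re

# A group can be repeated as a single unit.
repeated = bool(re.fullmatch(r"(abc)+", "abcabc"))

search = r"(.*)_old"   # group 1 captures everything before "_old"
replace = r"\1_new"    # \1 inserts the text captured by group 1

# Hypothetical column names; the last one does not match and is left alone.
columns = ["x_old", "y_old", "z_old", "unchanged"]
renamed = [re.sub(search, replace, name) for name in columns]

print(repeated)  # True
print(renamed)   # ['x_new', 'y_new', 'z_new', 'unchanged']
```

Names that do not match the search expression pass through unchanged, since
``re.sub`` only rewrites the parts of a string that match.
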
If the input table contains columns ``x_old``, ``y_old`` and ``z_old``,
entering the search and replace expressions above in ``Regex`` mode will
rename these columns to ``x_new``, ``y_new`` and ``z_new``.

Sandbox
-------

To try out expressions with instant feedback, experiment with the node
:ref:`org.sysess.sympathy.examples.regex`.