.. This file is part of Sympathy for Data.
.. Copyright (c) 2020 Combine Control Systems AB
..
.. Sympathy for Data is free software: you can redistribute it and/or modify
.. it under the terms of the GNU General Public License as published by
.. the Free Software Foundation, version 3 of the License.
..
.. Sympathy for Data is distributed in the hope that it will be useful,
.. but WITHOUT ANY WARRANTY; without even the implied warranty of
.. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.. GNU General Public License for more details.
..
.. You should have received a copy of the GNU General Public License
.. along with Sympathy for Data. If not, see .

Appendix
========

.. _appendix_encoding_text:

Encoding
--------

Some Sympathy nodes allow you to choose an encoding, especially ones that read
from or write to files or communicate over a network. This section is a short
introduction to encodings to help you choose.

Character encoding determines the translation between text characters and
bytes, for example the bytes stored in a file. Each encoding uses a different
translation scheme and can support different languages.

- Encode: text characters -> bytes
- Decode: bytes -> text characters

To decode data correctly, choose the same encoding that was originally used to
encode it.
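
A short Python illustration of the two directions, using the built-in
``str.encode`` and ``bytes.decode`` methods on the Swedish word that is also
used in the Mojibake example further down:

.. code-block:: python

   text = 'Björnbärssnår'

   # Encode: text characters -> bytes. In UTF-8, each of ö, ä and å becomes two bytes.
   data = text.encode('utf-8')
   print(data)  # b'Bj\xc3\xb6rnb\xc3\xa4rssn\xc3\xa5r'

   # Decode: bytes -> text characters, using the same encoding as when encoding.
   assert data.decode('utf-8') == text
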

See https://en.wikipedia.org/wiki/Character_encoding for more information.

Notable Encodings
^^^^^^^^^^^^^^^^^

Here are some encodings that we offer as choices in Sympathy.

- `UTF-8 `_, supports essentially all
  written languages. Widely used on the web and **strongly recommended** when you
  have the freedom to choose. Capable of encoding all valid Unicode characters.
- `UTF-16 `_, supports essentially all
  written languages. There are variants depending on `byte order (endianness)
  `_: UTF-16-LE, UTF-16-BE, and UTF-16,
  which uses a `byte order mark (BOM)
  `_ to determine whether to use LE or
  BE. UTF-16, which is used internally by Microsoft Windows, has now generally
  been superseded by UTF-8. Capable of encoding all valid Unicode characters.
  **Use only when required!**
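
The size difference and the byte order mark can be seen directly in Python.
This sketch encodes the same text with UTF-8 and the UTF-16 variants listed
above. Python writes plain ``utf-16`` in the platform's native byte order, so
only the presence of a BOM is checked, not which one it is:

.. code-block:: python

   import codecs

   text = 'Björnbärssnår'  # 13 characters

   utf8 = text.encode('utf-8')          # ASCII letters take 1 byte, å/ä/ö take 2
   utf16_le = text.encode('utf-16-le')  # 2 bytes per character here, no BOM
   utf16_be = text.encode('utf-16-be')  # same length, opposite byte order
   utf16 = text.encode('utf-16')        # BOM first, then native byte order

   print(len(utf8), len(utf16_le), len(utf16))  # 16 26 28

   # The first two bytes of the plain 'utf-16' output are the BOM.
   assert utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
   # Decoding with 'utf-16' reads the BOM and strips it from the result.
   assert utf16.decode('utf-16') == text
   # The explicit-endianness variants differ only in byte order.
   assert utf16_le[:2] == b'B\x00' and utf16_be[:2] == b'\x00B'
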

Older encodings are not recommended, but could be needed when working with
existing files and applications. **Use only when required!**

- `US-ASCII `_, supports American English.
- `ISO 8859-1 (Latin-1) `_, supports
  Western European languages. Superset of US-ASCII.
- `ISO 8859-15 (Latin-9) `_,
  supports Western European languages. Similar to ISO 8859-1, but replaces some
  less common symbols, for example to introduce the euro sign.
- `Windows-1252 `_, supports Western
  European languages. Superset of ISO 8859-1 in terms of printable
  characters. Used in the legacy components of Microsoft Windows for English and
  many European languages.
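
To see how these single-byte encodings differ, a small Python sketch can encode
the euro sign, one of the characters that sets them apart. All codec names used
are standard Python names:

.. code-block:: python

   # The euro sign exists in Latin-9 and Windows-1252, at different byte values.
   euro_latin9 = '€'.encode('iso-8859-15')   # b'\xa4'
   euro_cp1252 = '€'.encode('cp1252')        # b'\x80'
   print(euro_latin9, euro_cp1252)

   # Latin-1 has no euro sign, so encoding it fails.
   try:
       '€'.encode('iso-8859-1')
   except UnicodeEncodeError as exc:
       print('ISO 8859-1:', exc.reason)

   # The same byte, 0xA4, is the currency sign in Latin-1 but the euro sign in Latin-9.
   print(b'\xa4'.decode('iso-8859-1'), b'\xa4'.decode('iso-8859-15'))

   # Plain ASCII text is encoded to identical bytes by all of these encodings.
   sample = 'Blackberry thicket'
   ascii_variants = {sample.encode(enc)
                     for enc in ('ascii', 'iso-8859-1', 'iso-8859-15', 'cp1252')}
   assert len(ascii_variants) == 1
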

If you type the name of another encoding by hand, use one of the `Codec` names
listed at https://docs.python.org/3/library/codecs.html#standard-encodings.
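
Python accepts several aliases for each codec, which helps when typing a name
by hand. A minimal sketch for checking a typed name with the standard
``codecs.lookup`` function and seeing which codec it resolves to
(``canonical_encoding`` is a hypothetical helper name):

.. code-block:: python

   import codecs


   def canonical_encoding(name):
       """Return Python's canonical codec name for *name*, or None if unknown."""
       try:
           return codecs.lookup(name).name
       except LookupError:
           # Raised for names that Python does not recognize at all.
           return None


   for name in ('UTF8', 'latin-1', 'windows-1252', 'not-an-encoding'):
       print(name, '->', canonical_encoding(name))
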

Windows code pages
^^^^^^^^^^^^^^^^^^

Older applications and file formats, especially ones for Windows, sometimes use
`Windows code page `_
encodings. These are identified by `code page identifiers
`_. Python
supports a subset of these, but uses different identifiers. Many are prefixed by
cp, for example cp1252 (same as Windows-1252 mentioned earlier). These encodings
have been superseded by Unicode encodings such as UTF-8, but can still be found
in files today.
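
A Python sketch of why the exact code page matters. It assumes a hypothetical
file written by an old DOS program using code page 850 (``cp850`` in Python),
which places å, ä and ö at different byte values than Windows-1252:

.. code-block:: python

   text = 'Björnbärssnår'

   # Bytes as a DOS program using code page 850 would have written them.
   dos_bytes = text.encode('cp850')
   print(dos_bytes)  # b'Bj\x94rnb\x84rssn\x86r'

   # Reading them with the intended code page restores the text...
   assert dos_bytes.decode('cp850') == text

   # ...while reading them as Windows-1252 turns the letters into punctuation.
   print(dos_bytes.decode('cp1252'))
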

Mojibake
^^^^^^^^

Mojibake is garbled text that results from decoding with an unintended
character encoding, which makes characters appear as unrelated ones.

Swedish example: Björnbärssnår (Blackberry thicket)

======== ========== =================
Encode   Decode     Result
======== ========== =================
UTF-8    UTF-8      Björnbärssnår
UTF-8    ISO-8859-1 BjÃ¶rnbÃ¤rssnÃ¥r
UTF-8    ISO-8859-2 BjĂśrnbĂ¤rssnĂĽr
UTF-8    UTF-16-LE  橂뛃湲썢犤獳썮犥
UTF-8    UTF-16-BE  䉪쎶牮拃ꑲ獳滃ꕲ
======== ========== =================

As seen, a mismatched encoding can result in anything from misrepresented
special characters to a completely wrong result. The result can even be correct
for some words. Both encoding and decoding can also fail if there is no possible
translation, depending on the combination of characters (encode) or bytes
(decode).

See https://en.wikipedia.org/wiki/Mojibake for more information.
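
The table can be reproduced in Python, which is also a handy way to test a
guess about which encodings were mixed up. The repair step at the end only
works for the lossless ISO-8859-1 case shown here; it is a sketch, not a
general fix for mojibake:

.. code-block:: python

   original = 'Björnbärssnår'
   data = original.encode('utf-8')  # the "Encode" column: always UTF-8

   # The "Decode" column: decode the same bytes with each encoding in the table.
   rows = {decoding: data.decode(decoding)
           for decoding in ('utf-8', 'iso-8859-1', 'iso-8859-2',
                            'utf-16-le', 'utf-16-be')}
   for decoding, result in rows.items():
       print(f'{decoding:<10} {result}')

   # ISO-8859-1 maps every possible byte to a character, so this particular
   # mojibake is reversible: re-encode to get the original bytes back, then
   # decode them with the encoding that was actually used.
   repaired = rows['iso-8859-1'].encode('iso-8859-1').decode('utf-8')
   assert repaired == original
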

Python
^^^^^^

In Python, encoding and decoding are performed with two methods:
``str.encode`` and ``bytes.decode``. Names of the available encodings can be
found in the documentation for the `codecs
`_ module.

Encoding text with an encoding that cannot represent all of its characters
raises a ``UnicodeEncodeError``. An encoding name that Python does not
recognize raises a ``LookupError`` instead.

.. code-block:: python

   >>> 'Björnbärssnår'.encode('ascii')
   Traceback (most recent call last):
     ...
   UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 2: ordinal not in range(128)

Decoding bytes that are not valid in the chosen encoding raises a
``UnicodeDecodeError``.

.. code-block:: python

   >>> encoded = 'Björnbärssnår'.encode('iso-8859-1')
   >>> encoded
   b'Bj\xf6rnb\xe4rssn\xe5r'
   >>> encoded.decode('iso-8859-1')
   'Björnbärssnår'
   >>> encoded.decode('utf-8')
   Traceback (most recent call last):
     ...
   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 2: invalid start byte
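
When the encoding is not known in advance, one common heuristic is to try a
short list of candidates in order. A minimal sketch, with
``decode_with_fallback`` as a hypothetical helper: a successful decode does not
prove that the encoding was the intended one, it only shows that the bytes are
valid in it.

.. code-block:: python

   def decode_with_fallback(data, encodings=('utf-8', 'cp1252', 'iso-8859-1')):
       """Try each encoding in order; return (text, encoding) for the first
       that succeeds. 'iso-8859-1' is last on purpose: it maps every byte to a
       character, so it never fails, but its result may be mojibake."""
       for encoding in encodings:
           try:
               return data.decode(encoding), encoding
           except UnicodeDecodeError as exc:
               # exc.start and exc.end locate the offending bytes, exc.reason says why.
               print(f'{encoding}: {exc.reason} at byte {exc.start}')
       raise ValueError('none of the encodings could decode the data')


   text, used = decode_with_fallback('Björnbärssnår'.encode('iso-8859-1'))
   print(used, text)

UTF-8 is tried first because text that happens to be valid UTF-8 is rarely
anything else, whereas Windows-1252 leaves a few byte values undefined, so ISO
8859-1 serves as the final catch-all.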

Often, the right way to deal with these exceptions is simply to choose the
intended encoding. When the exact encoding is unknown, or if the data is
corrupt, Python offers the ``errors`` parameter for ``encode`` and ``decode``,
which can substitute or ignore characters that cannot be translated.

.. code-block:: python

   >>> encoded = 'Björnbärssnår'.encode('iso-8859-1')
   >>> encoded
   b'Bj\xf6rnb\xe4rssn\xe5r'
   >>> encoded.decode('iso-8859-1')
   'Björnbärssnår'
   >>> encoded.decode('utf-8', errors='replace')
   'Bj�rnb�rssn�r'

Here, ``errors='replace'`` substitutes � for bytes that cannot be decoded,
instead of raising a ``UnicodeDecodeError``. For more options, see
https://docs.python.org/3/howto/unicode.html.
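
To compare a few of Python's built-in error handlers, this sketch applies them
to the Swedish example, both when encoding to ASCII and when decoding ISO-8859-1
bytes as UTF-8:

.. code-block:: python

   text = 'Björnbärssnår'
   latin1 = text.encode('iso-8859-1')

   # Encoding (text -> bytes) with an encoding that lacks å, ä and ö.
   encoded = {handler: text.encode('ascii', errors=handler)
              for handler in ('replace', 'ignore', 'backslashreplace',
                              'xmlcharrefreplace')}
   # 'replace' -> b'Bj?rnb?rssn?r', 'xmlcharrefreplace' -> b'Bj&#246;rnb&#228;rssn&#229;r'

   # Decoding (bytes -> text) bytes that are not valid UTF-8.
   decoded = {handler: latin1.decode('utf-8', errors=handler)
              for handler in ('replace', 'ignore', 'backslashreplace')}
   # 'ignore' -> 'Bjrnbrssnr', 'backslashreplace' -> 'Bj\\xf6rnb\\xe4rssn\\xe5r'

   for handler, result in {**encoded, **decoded}.items():
       print(f'{handler:<18} {result!r}')

Note that ``ignore`` silently loses data, while ``backslashreplace`` keeps the
original byte values visible, which can make it easier to identify the
intended encoding afterwards.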

.. _appendix_typed_text:

Input typed values as text
--------------------------

Some nodes allow you to enter text that is used to produce a typed value. Which
type is produced could depend, for example, on the types of the columns used in
the operation. The text needs to use a format that is understood by the
functions that read the type in question.

If the type is text, any input will do. For other types, see the following
examples:

:bool: True, False, true, false, 1, 0
:integer: 0, 1, 2, ...
:float: 0, 0.0, 1, 1.1, ...
:text: Anything goes here!
:datetime: 1970-01-01T00:00:00.000000,
           1970-01-01 00:00:00.000000,
           1970-01-01 00:00:00.00,
           1970-01-01
:timedelta: 1 days,
            2 d,
            44.333 seconds,
            2 days 2 h 44 seconds
:complex: 1.1 + 2j
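
The functions Sympathy uses for reading these types are not described here, so
the following is only a hypothetical, standard-library sketch of a parser that
accepts the example formats above. All names (``parse_bool``,
``parse_datetime``, ``parse_timedelta``, ``parse_complex``, ``PARSERS``) and the
accepted timedelta unit spellings are assumptions; Sympathy itself may accept
more or fewer variants.

.. code-block:: python

   import re
   from datetime import datetime, timedelta

   # Hypothetical sketch: formats mirror the examples above, not Sympathy's own code.
   DATETIME_FORMATS = ('%Y-%m-%dT%H:%M:%S.%f', '%Y-%m-%d %H:%M:%S.%f', '%Y-%m-%d')

   # Assumed unit spellings for timedelta text, mapped to timedelta() keywords.
   TIMEDELTA_UNITS = {
       'days': 'days', 'day': 'days', 'd': 'days',
       'hours': 'hours', 'hour': 'hours', 'h': 'hours',
       'minutes': 'minutes', 'minute': 'minutes', 'min': 'minutes',
       'seconds': 'seconds', 'second': 'seconds', 's': 'seconds',
   }
   # One "<number> <unit>" part, such as "2 days" or "44.333 seconds".
   TIMEDELTA_PART = re.compile(r'(\d+(?:\.\d*)?)\s*([a-z]+)\s*')


   def parse_bool(text):
       value = text.strip().lower()
       if value in ('true', '1'):
           return True
       if value in ('false', '0'):
           return False
       raise ValueError(f'not a bool: {text!r}')


   def parse_datetime(text):
       # strptime's %f accepts 1 to 6 digits, so ".00" and ".000000" both work.
       for fmt in DATETIME_FORMATS:
           try:
               return datetime.strptime(text.strip(), fmt)
           except ValueError:
               continue
       raise ValueError(f'not a datetime: {text!r}')


   def parse_timedelta(text):
       text = text.strip().lower()
       if not text:
           raise ValueError('empty timedelta')
       total, pos = timedelta(), 0
       # Consume "<number> <unit>" parts one by one and add them up.
       while pos < len(text):
           match = TIMEDELTA_PART.match(text, pos)
           if match is None or match.group(2) not in TIMEDELTA_UNITS:
               raise ValueError(f'not a timedelta: {text!r}')
           unit = TIMEDELTA_UNITS[match.group(2)]
           total += timedelta(**{unit: float(match.group(1))})
           pos = match.end()
       return total


   def parse_complex(text):
       # complex() rejects spaces around the sign, as in "1.1 + 2j", so drop them.
       return complex(text.replace(' ', ''))


   PARSERS = {
       'bool': parse_bool,
       'integer': int,
       'float': float,
       'text': str,
       'datetime': parse_datetime,
       'timedelta': parse_timedelta,
       'complex': parse_complex,
   }

   for type_name, text in [('bool', 'false'), ('integer', '2'), ('float', '1.1'),
                           ('datetime', '1970-01-01 00:00:00.00'),
                           ('timedelta', '2 days 2 h 44 seconds'),
                           ('complex', '1.1 + 2j')]:
       print(type_name, repr(PARSERS[type_name](text)))
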

.. _appendix_cli:

All command line options
------------------------

Top-level
^^^^^^^^^

``python -m sympathy --help``

.. code-block:: bash

   usage: sympathy [-h]
                   {gui,cli,viewer,install,uninstall,tests,clear,launch}
                   ...

   Sympathy for Data

   optional arguments:
     -h, --help            show this help message and exit
     --version             show Sympathy for Data version and exit

   Commands:
     {gui,cli,viewer,install,uninstall,tests,clear,launch}
                           Command
       gui                 run Sympathy in GUI mode
       cli                 run Sympathy in CLI mode
       viewer              run the viewer for sydata files.
       install             install Sympathy (start menu, file associations,
                           documentation)
       uninstall           uninstall Sympathy (start menu, file associations)
       tests               run the test suite
       clear               cleanup temporary files
       launch              internal use only

Gui and Cli
^^^^^^^^^^^

The options for the gui and cli commands are similar.

``python -m sympathy gui --help``

.. code-block:: bash

   usage: __main__.py gui [-h] [--exit-after-exception {0,1}]
                          [-L LOGGER [LEVEL ...]]
                          [--num-worker-processes NUM_WORKER_PROCESSES]
                          [-I INIFILE] [--nocapture]
                          [filename]

   positional arguments:
     filename              file containing workflow.

   optional arguments:
     -h, --help            show this help message and exit
     --exit-after-exception {0,1}
                           exit after uncaught exception occurs in a signal handler
     -L LOGGER [LEVEL ...],
     --loglevel LOGGER [LEVEL ...]
                           A logger configuration with a logger name and a level
                           (e.g. -L app.stats warning). This argument can be
                           repeated.
     --num-worker-processes NUM_WORKER_PROCESSES
                           number of python worker processes (0) use system number
                           of CPUs
     -I INIFILE,
     --inifile INIFILE
                           settings ini-file to use instead of the default
     --environment-credentials PREFIX
                           read credential secrets from environment
                           variables starting with PREFIX that are encoded as
                           json lists, with json dictionary values e.g,
                           PREFIX["secret","foo"]={"secret":"bar"}.
     --nocapture           disable capturing of node output and send it directly to
                           stdout/stderr.
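
The ``--environment-credentials`` description above gives one example
variable. Because its name contains brackets and quotes, it can be easier to
set from a script than from an interactive shell. A sketch of assembling such a
variable in Python follows. The prefix ``MYCREDS`` and the helper name
``credential_variable`` are hypothetical, and what the elements of the JSON list
(here ``"secret"`` and ``"foo"``) mean is not described in the help text.
Compact JSON separators are used so that the result matches the help example
character for character; whether Sympathy also accepts other JSON spacing is
not stated.

.. code-block:: python

   import json
   import os

   PREFIX = 'MYCREDS'  # hypothetical prefix, passed as --environment-credentials MYCREDS


   def credential_variable(prefix, key, value):
       """Return (name, value) for one variable in the format from the help text:
       the name is the prefix followed by a JSON list, the value a JSON dictionary."""
       compact = (',', ':')  # no spaces, as in PREFIX["secret","foo"]={"secret":"bar"}
       return prefix + json.dumps(key, separators=compact), json.dumps(value, separators=compact)


   name, value = credential_variable(PREFIX, ['secret', 'foo'], {'secret': 'bar'})
   print(f'{name}={value}')  # MYCREDS["secret","foo"]={"secret":"bar"}

   # Environment for a child process, for example one started with subprocess.run(..., env=env).
   env = os.environ.copy()
   env[name] = value
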

.. code-block:: bash

   usage: launch.py gui [-h] [--exit-after-exception {0,1}] [-v]
                        [-L {0,1,2,3,4,5}] [-N {0,1,2,3,4,5}]
                        [--num-worker-processes NUM_WORKER_PROCESSES]
                        [-C CONFIGFILE [CONFIGFILE ...]] [-I INIFILE]
                        [--nocapture]
                        [filename]

   positional arguments:
     filename              file containing workflow.

   optional arguments:
     -h, --help            show this help message and exit
     --exit-after-exception {0,1}, --exit_after_exception {0,1}
                           exit after uncaught exception occurs in a signal
                           handler
     -L {0,1,2,3,4,5}, --loglevel {0,1,2,3,4,5}
                           (0) disable logging, (5) enable all logging
     -N {0,1,2,3,4,5}, --node-loglevel {0,1,2,3,4,5}, --node_loglevel {0,1,2,3,4,5}
                           (0) disable logging, (5) enable all logging
     --num-worker-processes NUM_WORKER_PROCESSES, --num_worker_processes NUM_WORKER_PROCESSES
                           number of python worker processes (0) use system
                           number of CPUs
     -C CONFIGFILE [CONFIGFILE ...], --configfile CONFIGFILE [CONFIGFILE ...]
                           workflow configuration file, used to change parameters
                           and an optional outfile for the modified workflow
     -I INIFILE, --inifile INIFILE
                           settings ini-file to use instead of the default
     --environment-credentials PREFIX
                           read credential secrets from environment
                           variables starting with PREFIX that are encoded as
                           json lists, with json dictionary values e.g,
                           PREFIX["secret","foo"]={"secret":"bar"}.
     --nocapture           disable capturing of node output and send it directly
                           to stdout/stderr.
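
When running workflows from scripts, the command line can be assembled from the
options listed above. A hypothetical helper, ``build_cli_command``, is sketched
here. It assumes that the cli command accepts the options of the first help
output (including ``-L LOGGER [LEVEL ...]``), and ``workflow.syx`` is only a
placeholder file name.

.. code-block:: python

   import sys


   def build_cli_command(workflow, loggers=(), num_workers=None, inifile=None,
                         nocapture=False):
       """Return the argument list for ``python -m sympathy cli``.

       loggers is a sequence of (logger_name, level) pairs; each pair becomes
       one repeated -L option, as in ``-L app.stats warning``.
       """
       # The workflow file goes first: -L takes a variable number of values, so a
       # file name placed directly after it could be read as one more level.
       command = [sys.executable, '-m', 'sympathy', 'cli', workflow]
       for logger, level in loggers:
           command += ['-L', logger, level]
       if num_workers is not None:
           command += ['--num-worker-processes', str(num_workers)]
       if inifile is not None:
           command += ['-I', inifile]
       if nocapture:
           command.append('--nocapture')
       return command


   command = build_cli_command('workflow.syx', loggers=[('app.stats', 'warning')],
                               num_workers=4)
   print(' '.join(command[1:]))
   # To run it (requires Sympathy for Data to be installed), pass the list to
   # subprocess.run(command, check=True) after importing subprocess.

Returning a list instead of a single string avoids shell quoting problems with
file names that contain spaces.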

Viewer
^^^^^^

``python -m sympathy viewer --help``

.. code-block:: bash

   usage: sympathy viewer [-h] [filename]

   positional arguments:
     filename    sydata file

   optional arguments:
     -h, --help  show this help message and exit

Install
^^^^^^^

``python -m sympathy install --help``

.. code-block:: bash

   usage: sympathy install [-h] [--generate-all] [--compile] [--compile-all]
                           [--register] [--set-preference OPT-NAME OPT-VALUE]
                           [--all]

   optional arguments:
     -h, --help            show this help message and exit
     --generate-all        generate parser files
     --compile             compile sympathy
     --compile-all         compile all site-package files
     --register            register desktop application and create shortcuts
     --set-preference OPT-NAME OPT-VALUE
                           set value of setting
     --all                 perform full installation, includes all options if
                           enabled or by default if no other options are provided

Uninstall
^^^^^^^^^

``python -m sympathy uninstall --help``

.. code-block:: bash

   usage: sympathy uninstall [-h]

   optional arguments:
     -h, --help  show this help message and exit

Clear
^^^^^

``python -m sympathy clear --help``

.. code-block:: bash

   usage: sympathy clear [-h] [--caches] [--sessions]

   optional arguments:
     -h, --help  show this help message and exit
     --caches    Clear caches for Sympathy.
     --sessions  Clear sessions for Sympathy.