.. This file is part of Sympathy for Data.
.. Copyright (c) 2020 Combine Control Systems AB
..
.. Sympathy for Data is free software: you can redistribute it and/or modify
.. it under the terms of the GNU General Public License as published by
.. the Free Software Foundation, version 3 of the License.
..
.. Sympathy for Data is distributed in the hope that it will be useful,
.. but WITHOUT ANY WARRANTY; without even the implied warranty of
.. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
.. GNU General Public License for more details.
..
.. You should have received a copy of the GNU General Public License
.. along with Sympathy for Data.  If not, see <http://www.gnu.org/licenses/>.

Appendix
========


.. _appendix_encoding_text:

Encoding text
-------------

Some Sympathy nodes allow you to choose an encoding, especially ones that read
or write files or communicate over a network. This section is a short
introduction to encodings to help you choose.

A character encoding determines the translation between text characters and
bytes, such as the bytes stored in a file. Each encoding uses a different
translation scheme and can support different languages.

- Encode: text characters -> bytes
- Decode: bytes -> text characters

To recreate the original text, decode with the same encoding that was used to
encode the data.

See https://en.wikipedia.org/wiki/Character_encoding for more information.


Notable Encodings
^^^^^^^^^^^^^^^^^

Here are some encodings that we offer as choices in Sympathy.

Recommended encoding:

- `UTF-8 <https://en.wikipedia.org/wiki/UTF-8>`_, supports essentially all
  written languages. Widely used on the web and **strongly recommended** when you
  have the freedom to choose. Capable of encoding all valid Unicode characters.

Other encodings:

These are not recommended but could be needed when working with existing
files and applications. **Use only when required!**

- `UTF-16 <https://en.wikipedia.org/wiki/UTF-16>`_, supports essentially all
  written languages. There are variations depending on `byte order (endianness)
  <https://en.wikipedia.org/wiki/Endianness>`_: UTF-16-LE, UTF-16-BE, and UTF-16,
  which uses a `byte order mark (BOM)
  <https://en.wikipedia.org/wiki/Byte_order_mark>`_ to determine whether to use
  LE or BE. UTF-16 is used internally by Microsoft Windows but is now generally
  superseded by UTF-8. Capable of encoding all valid Unicode characters.

- `US-ASCII <https://en.wikipedia.org/wiki/ASCII>`_, supports American English.

- `ISO 8859-1 (Latin-1) <https://en.wikipedia.org/wiki/ISO/IEC_8859-1>`_, supports
  Western European languages, superset of US-ASCII.

- `ISO 8859-15 (Latin-9) <https://en.wikipedia.org/wiki/ISO/IEC_8859-15>`_,
  supports Western European languages, similar to ISO 8859-1 but replaces some
  less common symbols, introducing the euro sign.

- `Windows code page <https://en.wikipedia.org/wiki/Windows_code_page>`_
  encodings are sometimes used by older applications and file formats,
  especially ones for Windows. Each can be identified by a `code page identifier
  <https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers>`_.
  Superseded by Unicode encodings such as UTF-8, but can still be found in
  files today.

- `Windows-1252 <https://en.wikipedia.org/wiki/Windows-1252>`_, example of a
  Windows code page which supports Western European languages. Superset of ISO
  8859-1 in terms of printable characters. Used in the legacy components of
  Microsoft Windows for English and many European languages.

For other encodings (if you type the name by hand), use the `Codec` names from
https://docs.python.org/3/library/codecs.html#standard-encodings.
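
To make the differences between the encodings above concrete, the following
sketch encodes a single character with several of them using Python's built-in
codecs. The euro sign is chosen only as an illustration, since it is one of the
characters that the listed encodings treat differently.

.. code-block:: python

   import codecs

   euro = '€'

   # The same character becomes different bytes in different encodings.
   as_utf8 = euro.encode('utf-8')          # three bytes: E2 82 AC
   as_latin9 = euro.encode('iso-8859-15')  # one byte: A4
   as_cp1252 = euro.encode('cp1252')       # one byte: 80
   as_utf16_le = euro.encode('utf-16-le')  # two bytes: AC 20
   as_utf16_be = euro.encode('utf-16-be')  # two bytes: 20 AC

   # Plain 'utf-16' prepends a byte order mark (BOM) that records the byte
   # order, which is the native order of the machine doing the encoding.
   as_utf16 = euro.encode('utf-16')
   starts_with_bom = as_utf16.startswith(codecs.BOM_UTF16)

   # ISO 8859-1 has no euro sign, so encoding fails.
   try:
       euro.encode('iso-8859-1')
       latin1_supports_euro = True
   except UnicodeEncodeError:
       latin1_supports_euro = False

   print(as_utf8, as_latin9, as_cp1252, starts_with_bom, latin1_supports_euro)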


Choosing an encoding
^^^^^^^^^^^^^^^^^^^^

If you are responsible for both encoding and decoding, you should probably just
use UTF-8 as the character encoding at both ends.

If you receive data files, you need to use the same encoding that was used to
encode them. If you don't know what encoding was used, you can try the ones
listed above to identify the correct one.
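
Trying several encodings by hand can be sketched in Python as follows. The
helper ``try_decodings`` and the sample data are hypothetical and only serve as
an illustration; they are not part of Sympathy.

.. code-block:: python

   def try_decodings(data, encodings):
       """Return (encoding, text) pairs for every encoding that decodes data."""
       results = []
       for encoding in encodings:
           try:
               results.append((encoding, data.decode(encoding)))
           except UnicodeDecodeError:
               # The bytes are not valid in this encoding; try the next one.
               continue
       return results


   # Sample bytes, standing in for the content of a file of unknown encoding.
   data = 'Björnbärssnår'.encode('iso-8859-1')

   candidates = ['utf-8', 'ascii', 'iso-8859-1', 'iso-8859-15']
   results = try_decodings(data, candidates)
   for encoding, text in results:
       print(encoding, text)

A successful decode does not prove that the encoding is the right one. ISO
8859-1, for example, accepts any sequence of bytes, so the decoded text still
has to be inspected. In this sample both ISO 8859 variants succeed and happen
to give the same text.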

If you produce data files you need to communicate the character encoding to the
consumers of those data files.

There are also applications and libraries that use heuristics to try to
automatically identify character encodings, but these are prone to failure in
many cases and are not generally recommended.

Mismatched encodings
^^^^^^^^^^^^^^^^^^^^

Decoding using a different character encoding than the one used for encoding
may result in garbled text, making some characters appear as unrelated ones.

Example with the Swedish word "Björnbärssnår" (Blackberry thicket):

========  ==========  =================
Encode    Decode      Result
========  ==========  =================
UTF-8     UTF-8       Björnbärssnår
UTF-8     ISO-8859-1  BjÃ¶rnbÃ¤rssnÃ¥r
UTF-8     ISO-8859-2  BjĂśrnbĂ¤rssnĂĽr
UTF-8     UTF-16-LE   橂뛃湲썢犤獳썮犥
UTF-8     UTF-16-BE   䉪쎶牮拃ꑲ獳滃ꕲ
========  ==========  =================

As seen, mismatched encodings can result in anything from misrepresented special
characters to a result that is completely off. The result can also be correct
for some words in a larger text and incorrect for others. Both encode and decode
can also fail completely if there is no possible translation, depending on the
combination of characters (encode) or bytes (decode).
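
The second row of the table can be reproduced with a few lines of Python. The
repair step at the end is an addition to what the table shows. It is only a
sketch and assumes that the garbled text has not been modified or truncated
since it was decoded.

.. code-block:: python

   original = 'Björnbärssnår'
   data = original.encode('utf-8')

   # Decode with the wrong encoding: each non-ASCII character turns into two.
   garbled = data.decode('iso-8859-1')
   print(garbled)  # BjÃ¶rnbÃ¤rssnÃ¥r

   # ISO 8859-1 maps every byte to exactly one character, so the mistake can
   # be undone by encoding back to the same bytes and decoding as UTF-8.
   repaired = garbled.encode('iso-8859-1').decode('utf-8')
   print(repaired)  # Björnbärssnår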

See https://en.wikipedia.org/wiki/Mojibake for more information.


Encodings in Python
^^^^^^^^^^^^^^^^^^^

Encoding and decoding in Python are performed using two methods: str.encode and
bytes.decode. Names for available encodings can be found in the documentation
for the `codecs <https://docs.python.org/3/library/codecs.html>`_ module.
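
As a small sketch of how encoding names are resolved, ``codecs.lookup`` can be
used to check whether a name is known and which codec it refers to. The names
tried here are only examples.

.. code-block:: python

   import codecs

   # Different spellings of an encoding name resolve to the same codec.
   names = ['utf8', 'UTF-8', 'latin-1', 'windows-1252']
   canonical = {name: codecs.lookup(name).name for name in names}
   print(canonical)

   # An unknown name raises LookupError.
   try:
       codecs.lookup('no-such-encoding')
       known = True
   except LookupError:
       known = False
   print(known)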

Encoding text that contains characters the chosen encoding cannot represent
results in a UnicodeEncodeError.

.. code-block:: python

   >>> 'Björnbärssnår'.encode('ascii')
   Traceback (most recent call last):
   ...
   UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 2: ordinal not in range(128)


Decoding bytes that are not valid in the chosen encoding results in a
UnicodeDecodeError.

.. code-block:: python

   >>> encoded = 'Björnbärssnår'.encode('iso-8859-1')
   >>> encoded
   b'Bj\xf6rnb\xe4rssn\xe5r'
   >>> encoded.decode('iso-8859-1')
   'Björnbärssnår'
   >>> encoded.decode('utf-8')
   Traceback (most recent call last):
   ...
   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 2: invalid start byte


Often, the right way to deal with these exceptions is simply to choose the
intended encoding. When the exact encoding is unknown or the data is somehow
corrupt, Python offers the `errors` parameter for `encode` and `decode`, which
can substitute or ignore unsupported symbols.


.. code-block:: python

   >>> encoded = 'Björnbärssnår'.encode('iso-8859-1')
   >>> encoded
   b'Bj\xf6rnb\xe4rssn\xe5r'
   >>> encoded.decode('iso-8859-1')
   'Björnbärssnår'
   >>> encoded.decode('utf-8', errors='replace')
   'Bj�rnb�rssn�r'

Here, `errors='replace'` substitutes � for bytes that cannot be decoded instead
of raising a UnicodeDecodeError. For more options, see
https://docs.python.org/3/howto/unicode.html.
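
A few of the other standard error handlers are sketched below, for both
decoding and encoding. Which one is appropriate depends on the application;
note that ``'ignore'`` silently discards data.

.. code-block:: python

   text = 'Björnbärssnår'
   encoded = text.encode('iso-8859-1')

   # Decoding: drop or escape the bytes that are not valid UTF-8.
   ignored = encoded.decode('utf-8', errors='ignore')
   escaped = encoded.decode('utf-8', errors='backslashreplace')

   # Encoding: substitute the characters that ASCII cannot represent.
   replaced = text.encode('ascii', errors='replace')
   as_xml = text.encode('ascii', errors='xmlcharrefreplace')

   print(ignored)   # Bjrnbrssnr
   print(escaped)   # Bj\xf6rnb\xe4rssn\xe5r
   print(replaced)  # b'Bj?rnb?rssn?r'
   print(as_xml)    # b'Bj&#246;rnb&#228;rssn&#229;r'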


.. _appendix_typed_text:

Input typed values as text
--------------------------

Some nodes allow you to enter text that is converted to a typed value. The type
could depend, for example, on the type of the columns used in the operation.
The text needs to use a format that is understood by the functions that read
the type in question.

If the type is text, any input will do, but for other types see the following
examples:

    :bool: True, False, true, false, 1, 0
    :integer: 0, 1, 2, ...
    :float: 0, 0.0, 1, 1.1, ...
    :text: Anything goes here!
    :datetime: 1970-01-01T00:00:00.000000,
               1970-01-01 00:00:00.000000,
               1970-01-01 00:00:00.00,
               1970-01-01
    :timedelta: 1 days,
                2 d,
                44.333 seconds,
                2 days 2 h 44 seconds
    :complex: 1.1 + 2j


.. _appendix_cli:

All command line options
------------------------


Top-level
^^^^^^^^^

``python -m sympathy --help``

.. code-block:: bash

   usage: sympathy [-h]
                      {gui,cli,viewer,install,uninstall,tests,clear,launch}
                      ...

   Sympathy for Data

   optional arguments:
     -h, --help            show this help message and exit
     --version             show Sympathy for Data version and exit

   Commands:
     {gui,cli,viewer,install,uninstall,tests,clear,launch}
                           Command
       gui                 run Sympathy in GUI mode
       cli                 run Sympathy in CLI mode
       viewer              run the viewer for sydata files.
       install             install Sympathy (start menu, file associations,
                           documentation)
       uninstall           uninstall Sympathy (start menu, file associations)
       tests               run the test suite
       clear               cleanup temporary files
       launch              internal use only


Gui and Cli
^^^^^^^^^^^

The options for the gui and cli commands are similar.

``python -m sympathy gui --help``

.. code-block:: bash

    usage: __main__.py gui [-h] [--exit-after-exception {0,1}]
                           [-L LOGGER [LEVEL ...]]
                           [--num-worker-processes NUM_WORKER_PROCESSES]
                           [-I INIFILE] [--nocapture]
                           [filename]

    positional arguments:
    filename              file containing workflow.

    optional arguments:
    -h, --help            show this help message and exit
    --exit-after-exception {0,1}
                          exit after uncaught exception occurs in a signal handler
    -L LOGGER [LEVEL ...],
    --loglevel LOGGER [LEVEL ...]
                          A logger configuration with a logger name and a level
                          (e.g. -L app.stats warning). This argument can be
                          repeated.
    --num-worker-processes NUM_WORKER_PROCESSES
                          number of python worker processes (0) use system number
                          of CPUs
    -I INIFILE,
    --inifile INIFILE
                          settings ini-file to use instead of the default
    --environment-credentials PREFIX
                          read credential secrets from environment
                          variables starting with PREFIX that are encoded as
                          json lists, with json dictionary values e.g,
                          PREFIX["secret","foo"]={"secret":"bar"}.
    --nocapture           disable capturing of node output and send it directly to
                          stdout/stderr.


.. code-block:: bash

   usage: launch.py gui [-h] [--exit-after-exception {0,1}] [-v]
                        [-L {0,1,2,3,4,5}] [-N {0,1,2,3,4,5}]
                        [--num-worker-processes NUM_WORKER_PROCESSES]
                        [-C CONFIGFILE [CONFIGFILE ...]] [-I INIFILE]
                        [--nocapture]
                        [filename]

   positional arguments:
     filename              file containing workflow.

   optional arguments:
     -h, --help            show this help message and exit
     --exit-after-exception {0,1}, --exit_after_exception {0,1}
                           exit after uncaught exception occurs in a signal
                           handler
     -L {0,1,2,3,4,5}, --loglevel {0,1,2,3,4,5}
                           (0) disable logging, (5) enable all logging
     -N {0,1,2,3,4,5}, --node-loglevel {0,1,2,3,4,5}, --node_loglevel {0,1,2,3,4,5}
                           (0) disable logging, (5) enable all logging
     --num-worker-processes NUM_WORKER_PROCESSES, --num_worker_processes NUM_WORKER_PROCESSES
                           number of python worker processes (0) use system
                           number of CPUs
     -C CONFIGFILE [CONFIGFILE ...], --configfile CONFIGFILE [CONFIGFILE ...]
                           workflow configuration file, used to change parameters
                           and an optional outfile for the modified workflow
     -I INIFILE, --inifile INIFILE
                           settings ini-file to use instead of the default
     --environment-credentials PREFIX
                           read credential secrets from environment
                           variables starting with PREFIX that are encoded as
                           json lists, with json dictionary values e.g,
                           PREFIX["secret","foo"]={"secret":"bar"}.
     --nocapture           disable capturing of node output and send it directly
                           to stdout/stderr.
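
The environment variable format described for ``--environment-credentials``
can be sketched in Python as follows. The prefix ``SY_CRED`` is a hypothetical
choice, and the compact JSON spelling is an assumption taken from the example
in the help text; the help text does not say whether other spellings are
accepted.

.. code-block:: python

   import json
   import os

   prefix = 'SY_CRED'          # hypothetical prefix, chosen for illustration
   key = ['secret', 'foo']     # JSON list that follows the prefix in the name
   value = {'secret': 'bar'}   # JSON dictionary stored as the variable's value

   # Compact separators reproduce the spelling used in the help text.
   name = prefix + json.dumps(key, separators=(',', ':'))
   payload = json.dumps(value, separators=(',', ':'))

   # Build an environment mapping for a child process without changing the
   # environment of the current process.
   env = dict(os.environ)
   env[name] = payload

   print(f'{name}={payload}')  # SY_CRED["secret","foo"]={"secret":"bar"}

The ``env`` mapping could then be passed to a process that starts Sympathy with
``--environment-credentials SY_CRED``.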


Viewer
^^^^^^

``python -m sympathy viewer --help``

.. code-block:: bash

   usage: sympathy viewer [-h] [filename]

   positional arguments:
     filename    sydata file

   optional arguments:
     -h, --help  show this help message and exit

Install
^^^^^^^

``python -m sympathy install --help``

.. code-block:: bash

   usage: sympathy install [-h] [--generate-all] [--compile] [--compile-all]
                           [--register] [--set-preference OPT-NAME OPT-VALUE]
                           [--all]

   optional arguments:
     -h, --help            show this help message and exit
     --generate-all        generate parser files
     --compile             compile sympathy
     --compile-all         compile all site-package files
     --register            register desktop application and create shortcuts
     --set-preference OPT-NAME OPT-VALUE
                           set value of setting
     --all                 perform full installation, includes all options if
                           enabled or by default if no other options are provided

Uninstall
^^^^^^^^^

``python -m sympathy uninstall --help``

.. code-block:: bash

   usage: sympathy uninstall [-h]

   optional arguments:
     -h, --help  show this help message and exit


Clear
^^^^^


``python -m sympathy clear --help``

.. code-block:: bash

   usage: sympathy clear [-h] [--caches] [--sessions]

   optional arguments:
     -h, --help  show this help message and exit
     --caches    Clear caches for Sympathy.
     --sessions  Clear sessions for Sympathy.