Table API

API for working with the Table type.

Import this module like this:

from sympathy.api import table

A Table consists of columns, where each column has a name and a data type. All columns in the Table must always be of the same length.

Any node port with the Table type represents an object of this class.

Accessing the data

There are multiple APIs in the Table for adding and reading columns. The simplest one is to use indexing with column names:

>>> import numpy as np
>>> from sympathy.api import table
>>> mytable = table.File()
>>> mytable['foo'] = np.array([1, 2, 3])
>>> print(mytable['foo'])
[1 2 3]

It is also possible to convert between a Table and a pandas DataFrame (to_dataframe(), from_dataframe()), a numpy recarray (to_recarray(), from_recarray()), or a generator/list of rows (to_rows(), from_rows()).

The size of the table can easily be found with the methods number_of_rows() and number_of_columns(). The column names are available via the method column_names().
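
The three methods above can be combined into a small helper. This is only a sketch: summarize is a hypothetical name that is not part of the Sympathy API, and tbl is assumed to be a table.File (or any object offering the same three methods).

```python
def summarize(tbl):
    """Return the size and column names of a table-like object.

    `tbl` is assumed to provide number_of_rows(), number_of_columns()
    and column_names(), as table.File is documented to do.
    """
    return {
        'rows': tbl.number_of_rows(),
        'columns': tbl.number_of_columns(),
        'names': list(tbl.column_names()),
    }
```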

Column and Table attributes

Both the Table itself and any of its columns can have attributes attached to them. Attributes can be either scalar values or one-dimensional numpy arrays, but not masked arrays. If the attribute value is bytes (str in python 2) the value must contain only ASCII characters.

There is currently no support for storing datetimes or timedeltas in attributes. As a workaround, you can convert the datetimes or timedeltas to either strings or floats and store those instead.
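
One possible way to apply this workaround uses only the Python standard library. The helper names below are hypothetical and not part of Sympathy, and the choice of ISO 8601 strings and seconds as floats is an assumption; any string or float encoding that you can decode again would work equally well.

```python
from datetime import datetime, timedelta, timezone


def datetime_to_attr(value, as_float=False):
    """Encode a datetime as an ISO 8601 string or a POSIX timestamp float.

    Note: for naive datetimes, timestamp() interprets the value as local
    time, so timezone-aware values are the safer choice.
    """
    if as_float:
        return value.timestamp()
    return value.isoformat()


def datetime_from_attr(value):
    """Decode a value previously produced by datetime_to_attr."""
    if isinstance(value, float):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    return datetime.fromisoformat(value)


def timedelta_to_attr(value):
    """Encode a timedelta as a float number of seconds."""
    return value.total_seconds()


def timedelta_from_attr(seconds):
    """Decode a value previously produced by timedelta_to_attr."""
    return timedelta(seconds=seconds)
```

The encoded values are plain strings or floats, so they can be passed in the attribute dictionaries accepted by set_table_attributes() or set_column_attributes().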

Name restrictions

Column names, attribute names and table names can be almost any unicode strings. A column name must not be the empty string or a single period (.). An attribute name must not be the empty string. The names of table attributes must also not be of the format __table_*__ since this is reserved for storing attributes internal to the Sympathy platform.

If any of these names are set using bytes (str in python 2), the name must contain only ASCII characters.
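
The restrictions above can be expressed as a few predicate functions. This is an illustrative sketch, not Sympathy's own validation code: the function names are hypothetical, and reading __table_*__ as a shell-style wildcard pattern is an assumption.

```python
import fnmatch


def is_valid_column_name(name):
    """Column names must not be empty or a single period."""
    return name not in ('', '.')


def is_valid_attribute_name(name):
    """Attribute names must not be empty."""
    return name != ''


def is_valid_table_attribute_name(name):
    """Table attribute names must also avoid the reserved __table_*__ form."""
    return (is_valid_attribute_name(name)
            and not fnmatch.fnmatchcase(name, '__table_*__'))


def is_ascii_bytes(name):
    """Return True if a bytes name contains only ASCII characters."""
    try:
        name.decode('ascii')
    except UnicodeDecodeError:
        return False
    return True
```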

Class table.File

class sympathy.typeutils.table.File(mode=None, **kwargs)[source]

A Table consists of columns, where each column has a name and a data type. All columns in the Table must always be of the same length.

__contains__(key)[source]

Return True if table contains a column named key.

Equivalent to has_column().

__deepcopy__(memo=None)[source]

Return new TypeAlias that does not share references with self. Must be re-implemented by subclasses that define their own storage fields.

__getitem__(index)[source]

Return type: table.File

Return a new table.File object with a subset of the table data.

This method fully supports both one- and two-dimensional single indices and slices.

Examples:

>>> from sympathy.api import table
>>> mytable = table.File.from_rows(
...     ['a', 'b', 'c'],
...     [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> mytable.to_dataframe()
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
>>> mytable[1].to_dataframe()
   a  b  c
0  4  5  6
>>> mytable[:,1].to_dataframe()
   b
0  2
1  5
2  8
>>> mytable[1,1].to_dataframe()
   b
0  5
>>> mytable[:2,:2].to_dataframe()
   a  b
0  1  2
1  4  5
>>> mytable[::2,::2].to_dataframe()
   a  c
0  1  3
1  7  9
>>> mytable[::-1,:].to_dataframe()
   a  b  c
0  7  8  9
1  4  5  6
2  1  2  3

If the key (index) is a string, it is assumed to be a column name and that column array will be returned.

__init__(mode=None, **kwargs)[source]

Fileobj is a file owned by self, and it should be closed by self. Data is a borrowed file, and it shall not be closed by self. Filename is used to construct a new fileobj. Mode and scheme are used together with filename to construct the filename. Import_links is only usable together with filename and enables links to the file source to be written.

Fileobj, data and filename are mutually exclusive.

__setitem__(index, other_table)[source]

Update the values at index with the values from other_table.

This method fully supports both one- and two-dimensional single indices and slices, but the dimensions of the slice must be the same as the dimensions of other_table.

If the key (index) is a string, it is assumed to be a column name and the value (other_table) argument an array.

attr(name)[source]

Get the table attribute with the given name.

attrs

Return dictionary of attributes for table.

New in version 1.3.4.

clear()[source]

Clear the table. All columns and attributes will be removed.

col(name)[source]

Get a Column object for the column with the given name.

New in version 1.3.4.

cols()[source]

Get a list of all columns as Column objects.

New in version 1.3.4.

column_names()[source]

Return a list with the names of the table columns.

column_type(column)[source]

Return the dtype of column named column.

static from_dataframe(dataframe)[source]

Return a new table.File with data from pandas dataframe dataframe.

static from_matrix(column_names, matrix)[source]

Return a new table.File with data from numpy matrix matrix. column_names should be a list of strings which are used to name the resulting columns.

static from_recarray(recarray)[source]

Return a new table.File with data from numpy.recarray object recarray.

static from_rows(column_names, rows)[source]

Return new table.File with data from iterable rows. column_names should be a list of strings which are used to name the resulting columns.

get_attributes()[source]

Get all table attributes and all column attributes.

Returns a tuple where the first element contains all the table attributes and the second element contains all the column attributes.

get_column_attributes(column_name)[source]

Return dictionary of attributes for column_name.

get_column_to_array(column_name, index=None, kind='numpy')[source]

Return named column as an array.

Return type is numpy.array when kind is ‘numpy’ (by default) and dask.array.Array when kind is ‘dask’.

Dask arrays can be used to reduce memory use in locked subflows by handling data more lazily.

get_column_to_series(column_name)[source]

Return named column as pandas series.

get_name()[source]

Return table name or None if name is not set.

get_table_attributes()[source]

Return dictionary of attributes for table.

has_column(key)[source]

Return True if table contains a column named key.

New in version 1.1.3.
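
A typical use of has_column() is to guard a lookup of an optional column. The sketch below assumes tbl is a table.File; column_or_default is a hypothetical helper name, not part of the API.

```python
def column_or_default(tbl, name, default=None):
    """Return the array for column `name`, or `default` if it is missing."""
    # has_column(name) is documented as equivalent to `name in tbl`.
    if tbl.has_column(name):
        # Indexing with a string returns that column's array.
        return tbl[name]
    return default
```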

hjoin(other_table, mask=False, rename=False)[source]

Add the columns from other_table.

Analogous to update().

classmethod icon()[source]

Return full path to svg icon.

is_empty()[source]

Returns True if the table is empty.

names(kind=None, **kwargs)[source]

The names that can be automatically adjusted from a table.

kind should be one of ‘cols’ (all column names), ‘attrs’ (all table attribute names), or ‘name’ (the table name).

number_of_columns()[source]

Return the number of columns in the table.

number_of_rows()[source]

Return the number of rows in the table.

set_attributes(attributes)[source]

Set table attributes and column attributes at the same time.

Input should be a tuple of dictionaries where the first element of the tuple contains the table attributes and the second element contains the column attributes.
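
Because get_attributes() returns the same tuple shape that set_attributes() accepts, the two can be paired to carry all attributes from one table to another. This is a minimal sketch: copy_all_attributes is a hypothetical helper, both arguments are assumed to be table.File objects, and how column attributes are treated for columns missing from the target is not covered here.

```python
def copy_all_attributes(source_table, target_table):
    """Copy table and column attributes from one table to another."""
    # First element: table attributes. Second element: column attributes.
    table_attrs, column_attrs = source_table.get_attributes()
    target_table.set_attributes((table_attrs, column_attrs))
```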

set_column_attributes(column_name, attributes)[source]

Set dictionary of scalar attributes for column_name.

Attribute values can be any numbers or strings.

set_column_from_array(column_name, array, attributes=None)[source]

Write numpy array to column named by column_name. If the column already exists it will be replaced.

set_column_from_series(series)[source]

Write pandas series to column named by series.name. If the column already exists it will be replaced.

set_name(name)[source]

Set table name. Use None to unset the name.

set_table_attributes(attributes)[source]

Set table attributes to those in dictionary attributes.

Attribute values can be any numbers or strings. Replaces any old table attributes.
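
Since the old table attributes are replaced, adding a few attributes while keeping the existing ones requires reading them first. A possible sketch, assuming tbl is a table.File (merge_table_attributes is a hypothetical helper name):

```python
def merge_table_attributes(tbl, new_attributes):
    """Add or overwrite table attributes while keeping the existing ones."""
    # Copy the current attributes so the returned dictionary is not mutated.
    merged = dict(tbl.get_table_attributes())
    merged.update(new_attributes)
    # set_table_attributes replaces everything, so pass the merged result.
    tbl.set_table_attributes(merged)
    return merged
```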

Example:

>>> from sympathy.api import table
>>> mytable = table.File()
>>> mytable.set_table_attributes(
...     {'Thou shall count to': 3,
...      'Ingredients': 'Spam'})

source(other_table, shallow=False)[source]

Update self with the data from other_table, without keeping the old state. When shallow is False (default), self should be updated with a deepcopy of other_table.

self and other_table must be of the exact same type.

to_dataframe()[source]

Return pandas DataFrame object with all columns in table.

to_matrix()[source]

Return numpy matrix with all the columns in the table.

to_recarray()[source]

Return numpy.recarray object with the table content or None if there are no columns.

to_rows()[source]

Return a generator over the table’s rows.

Each row will be represented as a tuple of values.
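
Row tuples are often easier to work with when paired with the column names. The sketch below assumes that the values in each tuple follow the order of column_names(), which the text above does not state explicitly; rows_as_dicts is a hypothetical helper and tbl is assumed to be a table.File.

```python
def rows_as_dicts(tbl):
    """Yield each row of the table as a {column name: value} dictionary."""
    names = tbl.column_names()
    for row in tbl.to_rows():
        # Assumption: tuple values are ordered like column_names().
        yield dict(zip(names, row))
```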

types(kind=None, **kwargs)[source]

Return types associated with names.

update(other_table)[source]

Updates the columns in the table with columns from other_table, keeping the old ones.

If a column exists in both tables the one from other_table is used. Creates links where possible.

update_column(column_name, other_table, other_name=None)[source]

Updates a column from a column in another table.

The column other_name from other_table will be copied into column_name. If column_name already exists it will be replaced.

When other_name is not used, then column_name will be used instead.
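
When only some columns should be copied, update_column() can be called in a loop. This is a sketch under stated assumptions: copy_columns and its prefix parameter are hypothetical, and both tables are assumed to be table.File objects.

```python
def copy_columns(target, other_table, names, prefix=''):
    """Copy the named columns from other_table into target.

    Each column is stored in target under prefix + name; existing
    columns with that name are replaced, as update_column documents.
    """
    for name in names:
        target.update_column(prefix + name, other_table, name)
```

For example, copy_columns(result, measurements, ['speed', 'time'], prefix='raw_') would create the columns raw_speed and raw_time in result.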

version()[source]

Return the version as a string. This is useful when loading existing files from disk.

New in version 1.2.5.

classmethod viewer()[source]

Return viewer class, which must be a subclass of sympathy.api.typeutil.ViewerBase.

vjoin(other_tables, input_index='', output_index='', fill=True, minimum_increment=1)[source]

Add the rows from the other_tables at the end of this table.

Parameters:
  • other_tables ([table]) –
  • input_index (six.text_type) – Column name for specified index column (deprecated).
  • output_index (six.text_type) – Column name for output index column generated.
  • fill (bool or None) – When True, attempt to fill with NaN or a zero-like value. When False, discard columns not present in all other_tables. When None, mask output.
  • minimum_increment (int) – Index increment added for empty tables.

Return type: table

vsplit(output_list, input_index, remove_fill)[source]

Split the current table to a list of tables by rows.

Class table.Column

class sympathy.typeutils.table.Column(name, parent_data)[source]

The Column class provides a read-only interface to a column in a Table.

attr(name)[source]

Return the value of the column attribute name.

attrs

A dictionary of all column attributes of this column.

data

The data of the column as a numpy array. Equivalent to calling File.get_column_to_array().

name

The name of the column.