Table API
API for working with the Table type.
Import this module like this:
from sympathy.api import table
A Table with columns, where each column has a name and a data type. All columns in the Table must always be of the same length.
Any node port with the Table type represents an object of this class.
Accessing the data
There are multiple APIs in the Table for adding and reading columns. The simplest one is to use indexing with column names:
>>> import numpy as np
>>> from sympathy.api import table
>>> mytable = table.File()
>>> mytable['foo'] = np.array([1, 2, 3])
>>> print(mytable['foo'])
[1 2 3]
It is also possible to convert between a Table and a pandas DataFrame (to_dataframe(), from_dataframe()), a numpy recarray (to_recarray(), from_recarray()), or a generator/list of rows (to_rows(), from_rows()).
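As a rough illustration of the row and recarray forms mentioned above, the following NumPy-only sketch builds the same data both ways. It does not call Sympathy itself; the column names and values are made up, and the comments only point at the documented from_rows() and from_recarray() methods.

```python
import numpy as np

# Hypothetical example data: two columns, three rows.
column_names = ['a', 'b']
rows = [(1, 1.5), (2, 2.5), (3, 3.5)]

# A numpy recarray has one named field per column.
recarray = np.rec.fromrecords(rows, names=column_names)

# Converting the recarray back gives a list of row tuples again.
rows_again = recarray.tolist()

# With Sympathy installed, these two forms are what
# table.File.from_rows(column_names, rows) and
# table.File.from_recarray(recarray) take as input.
```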
The size of the table can easily be found with the methods number_of_rows() and number_of_columns(). The column names are available via the method column_names().
Column and Table attributes
Both the Table itself and any of its columns can have attributes attached to them. Attributes can be either scalar values or one-dimensional numpy arrays, but not masked arrays. If the attribute value is bytes (str in Python 2), the value must contain only ASCII characters.
There is currently no support for storing datetimes or timedeltas in attributes. As a workaround, you can convert the datetimes or timedeltas to either strings or floats and store those instead.
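The workaround can be sketched with plain NumPy. This is a minimal example under assumptions, not part of the Sympathy API: the attribute names recorded_at and duration_s are hypothetical, and seconds are an arbitrary choice of unit.

```python
import numpy as np

# Values that cannot be stored directly as attributes.
timestamp = np.datetime64('2021-03-04T05:06:07')
duration = np.timedelta64(90, 's')

# Option 1: store the datetime as an ISO 8601 string.
timestamp_str = str(timestamp)

# Option 2: store the timedelta as a float number of seconds.
duration_s = float(duration / np.timedelta64(1, 's'))

# Hypothetical attribute names. A dictionary like this could then be
# passed to set_table_attributes().
attributes = {'recorded_at': timestamp_str, 'duration_s': duration_s}

# Converting back after the attributes have been read again.
restored_timestamp = np.datetime64(attributes['recorded_at'])
restored_duration = np.timedelta64(int(attributes['duration_s']), 's')
```

Storing a string keeps the value human-readable, while a float is easier to compute with. Either way the unit or format has to be agreed on by whoever reads the attribute back.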
Name restrictions
Column names, attribute names and table names can be almost any unicode strings. For column names, an empty string or a single period (.) is not allowed. For attribute names, only the empty string is not allowed. The names of table attributes must also not be of the format __table_*__, since this is reserved for storing attributes internal to the Sympathy platform.
If any of these names are set using bytes (str in Python 2), the name must contain only ASCII characters.
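The rules above could be checked before setting a name with helpers along these lines. The function names are hypothetical and are not provided by Sympathy; the sketch also assumes that __table_*__ is meant as a glob pattern, i.e. "__table_" followed by anything and ending in "__".

```python
import re

# Assumption: the reserved format __table_*__ is read as a glob pattern.
_RESERVED_TABLE_ATTR = re.compile(r'^__table_.*__$', re.DOTALL)


def _is_ascii_if_bytes(name):
    """Names given as bytes must contain only ASCII characters."""
    if isinstance(name, bytes):
        return all(byte < 128 for byte in name)
    return True


def is_valid_column_name(name):
    """Column names may not be empty or a single period."""
    return _is_ascii_if_bytes(name) and name not in ('', '.', b'', b'.')


def is_valid_column_attribute_name(name):
    """Attribute names may not be empty."""
    return _is_ascii_if_bytes(name) and len(name) > 0


def is_valid_table_attribute_name(name):
    """Table attribute names must also avoid the reserved format."""
    if not is_valid_column_attribute_name(name):
        return False
    text = name.decode('ascii') if isinstance(name, bytes) else name
    return _RESERVED_TABLE_ATTR.match(text) is None
```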
Class table.File
- class sympathy.typeutils.table.File(mode=None, **kwargs)
  A Table with columns, where each column has a name and a data type. All columns in the Table must always be of the same length.
- __contains__(key)
  Return True if the table contains a column named key. Equivalent to has_column().
- __deepcopy__(memo=None)
  Return new TypeAlias object that does not share references with self.
  Must be re-implemented by subclasses that introduce additional fields to ensure that the fields are copied to the returned object.
- __getitem__(index)
  Return a new table.File object with a subset of the table data.
  This method fully supports both one- and two-dimensional single indices and slices.
  Examples:
  >>> from sympathy.api import table
  >>> mytable = table.File.from_rows(
  ...     ['a', 'b', 'c'],
  ...     [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
  >>> mytable.to_dataframe()
     a  b  c
  0  1  2  3
  1  4  5  6
  2  7  8  9
  >>> mytable[1].to_dataframe()
     a  b  c
  0  4  5  6
  >>> mytable[:,1].to_dataframe()
     b
  0  2
  1  5
  2  8
  >>> mytable[1,1].to_dataframe()
     b
  0  5
  >>> mytable[:2,:2].to_dataframe()
     a  b
  0  1  2
  1  4  5
  >>> mytable[::2,::2].to_dataframe()
     a  c
  0  1  3
  1  7  9
  >>> mytable[::-1,:].to_dataframe()
     a  b  c
  0  7  8  9
  1  4  5  6
  2  1  2  3
  If the key (index) is a string, it is assumed to be a column name and that column array will be returned.
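One way to read the indexing examples for __getitem__ is that a single integer index behaves like a length-one slice, so the result always stays two-dimensional. The NumPy sketch below models only that reading; table_like_getitem and as_slice are hypothetical helpers and not how Sympathy implements indexing.

```python
import numpy as np

# The same values as in the examples: rows of columns a, b, c.
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])


def as_slice(index):
    """Turn a single integer index into a length-one slice."""
    if isinstance(index, int):
        # "or None" keeps index -1 from producing an empty slice.
        return slice(index, index + 1 or None)
    return index


def table_like_getitem(array, row_index, col_index=slice(None)):
    """Index rows and columns while keeping the result two-dimensional."""
    return array[as_slice(row_index), as_slice(col_index)]


second_row = table_like_getitem(data, 1)
single_cell = table_like_getitem(data, 1, 1)
every_other = table_like_getitem(data, slice(None, None, 2), slice(None, None, 2))
```

Here single_cell is a 1x1 array rather than a scalar, which mirrors mytable[1,1] returning a table with one row and one column.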
- __init__(mode=None, **kwargs)
  Fileobj is a file owned by self and should be closed by self. Data is a borrowed file and shall not be closed by self. Filename is used to construct a new fileobj. Mode and scheme are used together with filename to construct the filename. Import_links is only usable together with filename and enables links to the file source to be written.
  Fileobj, data and filename are mutually exclusive.
- __setitem__(index, other_table)
  Update the values at index with the values from other_table.
  This method fully supports both one- and two-dimensional single indices and slices, but the dimensions of the slice must be the same as the dimensions of other_table.
  If the key (index) is a string, it is assumed to be a column name, and the value (other_table) is assumed to be an array.
- property attrs
  Return dictionary of attributes for table.
  New in version 1.3.4.
- static from_dataframe(dataframe)
  Return a new table.File with data from pandas DataFrame dataframe.
- static from_matrix(column_names, matrix)
  Return a new table.File with data from numpy matrix matrix. column_names should be a list of strings which are used to name the resulting columns.
- static from_recarray(recarray)
  Return a new table.File with data from numpy.recarray object recarray.
- static from_rows(column_names, rows, column_types=None)
  Return a new table.File with data from iterable rows and the specified column names.
  Parameters:
  - column_names ([str]) – Used to name the resulting columns.
  - rows (iterable) – Zero or more rows of cell values. The number of values should be the same for each row in the data.
  - column_types ([str or np.dtype], optional) – Used to specify types for the resulting columns. Required for columns that would contain only None values, which are otherwise ignored.
  The lengths of column_names and column_types (when provided) should match the number of cell values for each row.
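The length rule for from_rows can be checked up front with a small helper such as the one below. This is only a sketch: check_from_rows_arguments is a hypothetical function, not part of Sympathy, and the NumPy-style dtype strings used for column_types are an assumption about what the [str or np.dtype] parameter accepts.

```python
def check_from_rows_arguments(column_names, rows, column_types=None):
    """Hypothetical pre-check mirroring the documented length rule."""
    width = len(column_names)
    if column_types is not None and len(column_types) != width:
        raise ValueError('column_types must have one entry per column name')
    checked = []
    for number, row in enumerate(rows):
        row = tuple(row)
        if len(row) != width:
            raise ValueError(
                'row {} has {} values, expected {}'.format(
                    number, len(row), width))
        checked.append(row)
    return checked


column_names = ['id', 'comment']
rows = [[1, None], [2, None]]
# The 'comment' column holds only None values, so a type is given for it.
# Assumption: NumPy-style dtype strings are accepted here.
column_types = ['int64', 'U10']

checked_rows = check_from_rows_arguments(column_names, rows, column_types)
# With Sympathy installed, the arguments could then be passed on as
# table.File.from_rows(column_names, checked_rows, column_types).
```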
- get_attributes()
  Get all table attributes and all column attributes.
  Returns a tuple where the first element contains all the table attributes and the second element contains all the column attributes.
- get_column_to_array(column_name, index=None, kind='numpy')
  Return named column as an array.
  The return type is numpy.array when kind is ‘numpy’ (the default) and dask.array.Array when kind is ‘dask’.
  Dask arrays can be used to reduce memory use in locked subflows by handling data more lazily.
- hjoin(other_table, mask=False, rename=False)
  Add the columns from other_table.
  Analogous to update().
- names(kind=None, fields=None, **kwargs)
  The names that can be automatically adjusted from a table.
  kind should be one of ‘cols’ (all column names), ‘attrs’ (all table attribute names), or ‘name’ (the table name).
- set_attributes(attributes)
  Set table attributes and column attributes at the same time.
  Input should be a tuple of dictionaries where the first element of the tuple contains the table attributes and the second element contains the column attributes.
- set_column_attributes(column_name, attributes)
  Set dictionary of scalar attributes for column_name.
  Attribute values can be any numbers or strings.
- set_column_from_array(column_name, array, attributes=None)
  Write numpy array to column named by column_name. If the column already exists it will be replaced.
- set_column_from_series(series)
  Write pandas series to column named by series.name. If the column already exists it will be replaced.
- set_table_attributes(attributes)
  Set table attributes to those in dictionary attributes.
  Attribute values can be any numbers or strings. Replaces any old table attributes.
  Example:
  >>> from sympathy.api import table
  >>> mytable = table.File()
  >>> mytable.set_table_attributes(
  ...     {'Thou shall count to': 3,
  ...      'Ingredients': 'Spam'})
- source(other_table, shallow=False)
  Update self with the data from other, discarding any previous state in self.
  Parameters:
  - other (type of self) – Object used as the source for (to update) self.
  - shallow (bool) – When shallow is True, a deepcopy of other will be avoided to improve performance. shallow=True must only be used in operations that do not modify other. When shallow is False, the result should be similar to performing the operation with shallow=True on a deepcopy of other, so that no modifications of either self or other, after the source operation, can affect the other object.
- to_csv(filename, header=True, encoding='UTF-8', delimiter=';', quotechar='"')
  Save/Export table to filename.
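To give a feel for what the default delimiter and quotechar arguments of to_csv mean for the file contents, here is a sketch using Python's standard csv module. It is a stand-in under assumptions: rows_to_csv_text is a hypothetical helper, and the quoting policy and line endings that Sympathy actually uses are not stated in this documentation.

```python
import csv
import io


def rows_to_csv_text(column_names, rows, header=True,
                     delimiter=';', quotechar='"'):
    """Write rows as CSV text using the same defaults as to_csv."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer, delimiter=delimiter, quotechar=quotechar,
        lineterminator='\n')
    if header:
        writer.writerow(column_names)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical data; the value 'a;b' contains the delimiter and is
# therefore wrapped in the quote character by the csv module.
text = rows_to_csv_text(
    ['name', 'note'], [('spam', 'a;b'), ('eggs', 'plain')])
text_without_header = rows_to_csv_text(
    ['name', 'note'], [('spam', 'a;b')], header=False)
```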
- to_recarray()
  Return numpy.recarray object with the table content or None if there are no columns.
- to_rows()
  Return a generator over the table’s rows.
  Each row will be represented as a tuple of values.
- update(other_table)
  Update the columns in the table with columns from other_table, keeping the old ones.
  If a column exists in both tables the one from other_table is used. Creates links where possible.
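The column-level effect of update resembles dict.update on a mapping from column names to arrays. The sketch below uses plain dictionaries as stand-ins for tables, so it says nothing about links or attributes, and the resulting column order is a property of Python dictionaries rather than a documented property of Sympathy.

```python
import numpy as np

# Plain dicts of arrays stand in for the two tables.
current = {'a': np.array([1, 2, 3]), 'b': np.array([4, 5, 6])}
other = {'b': np.array([7, 8, 9]), 'c': np.array([10, 11, 12])}

# Old columns are kept, and columns present in both come from other.
merged = dict(current)
merged.update(other)
```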
- update_column(column_name, other_table, other_name=None)
  Update a column from a column in another table.
  The column other_name from other_table will be copied into column_name. If column_name already exists it will be replaced.
  When other_name is not given, column_name will be used instead.
- version()
  Return the version as a string. This is useful when loading existing files from disk.
  New in version 1.2.5.
- classmethod viewer()
  Return viewer class, which must be a subclass of sympathy.api.typeutil.ViewerBase.
- vjoin(other_tables, input_index='', output_index='', fill=True, minimum_increment=1)
  Add the rows from the other_tables at the end of this table.
  Parameters:
  - other_tables ([table])
  - input_index (str) – Column name for specified index column (deprecated).
  - output_index (str) – Column name for the generated output index column.
  - fill (bool or None) – When True, attempt to fill with NaN or a zero-like value. When False, discard columns not present in all other_tables. When None, mask output.
  - minimum_increment (int) – Index increment added for empty tables.
  Return type: table
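The difference between fill=True and fill=False in vjoin can be modelled with dictionaries of float arrays. The function vjoin_sketch below is hypothetical and covers only those two cases for float columns; index columns, fill=None (masking), zero-like fill values for other types, and minimum_increment are left out.

```python
import numpy as np


def _row_count(table):
    """Number of rows in a dict-of-arrays stand-in for a table."""
    for column in table.values():
        return len(column)
    return 0


def vjoin_sketch(tables, fill=True):
    """Stack a non-empty list of dict-of-arrays 'tables' row-wise."""
    if fill:
        # Keep every column seen in any table.
        names = []
        for table in tables:
            for name in table:
                if name not in names:
                    names.append(name)
    else:
        # Keep only columns that are present in all tables.
        names = [name for name in tables[0]
                 if all(name in table for table in tables)]

    result = {}
    for name in names:
        parts = []
        for table in tables:
            if name in table:
                parts.append(np.asarray(table[name], dtype=float))
            else:
                # Missing column: fill the rows of this table with NaN.
                parts.append(np.full(_row_count(table), np.nan))
        result[name] = np.concatenate(parts)
    return result


first = {'x': np.array([1.0, 2.0]), 'y': np.array([10.0, 20.0])}
second = {'x': np.array([3.0])}

filled = vjoin_sketch([first, second], fill=True)
common = vjoin_sketch([first, second], fill=False)
```

With fill=True the y column gains a NaN for the row that came from the second table. With fill=False only x survives, because y is not present in every input.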
Class table.Column
- class sympathy.typeutils.table.Column(name, parent_data)
  The Column class provides a read-only interface to a column in a Table.
- property attrs
  A dictionary of all column attributes of this column.
- property data
  The data of the column as a numpy array. Equivalent to calling File.get_column_to_array().
- property name
  The name of the column.
-
property