specless.dataset

Data and Dataset classes

Data Class

A Data object is essentially a table of rows and named columns. You can query its size:

>>> from specless.typing import Data
>>> demonstration = Data([['a', 1], ['b', 4], ['c', 6]], columns=['symbol', 'timestamp'])
>>> n_elements = demonstration.size  # Number of elements in this object

If it is a TimedTraceData object, it holds trace and timestamp data, each accessible by column name:

>>> symbols = demonstration["symbol"]            # or demonstration.symbol
...                                              # Returns a Series object
>>> timestamps = demonstration["timestamp"]      # or demonstration.timestamp
...                                              # Returns a Series object

You can also convert it into a list of lists:

>>> demonstration.values.tolist()                # Returns a list of lists
[['a', 1], ['b', 4], ['c', 6]]

You can sort the data, either returning a sorted copy or sorting in place:

>>> sorted_demonstration = demonstration.sort_values(by="timestamp")
>>> demonstration.sort_values(by=["timestamp", "symbol"], inplace=True)
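
The two calls above differ in whether a new object is returned or the existing one is modified. As a rough analogy, assuming Data follows the usual pandas convention that an in-place sort returns None, the same distinction can be sketched on plain Python lists. This snippet does not use specless, and the helper name `by_timestamp_then_symbol` is made up for illustration.

```python
# Plain-Python illustration only; it does not use specless.
rows = [["c", 6], ["a", 1], ["d", 4], ["b", 4]]


def by_timestamp_then_symbol(row):
    """Sort key: timestamp first, symbol as the tie-breaker."""
    symbol, timestamp = row
    return (timestamp, symbol)


# Analogous to sort_values(by=...): builds a new sorted list.
sorted_rows = sorted(rows, key=by_timestamp_then_symbol)

# Analogous to sort_values(..., inplace=True): reorders `rows` itself.
# list.sort() returns None, so `result` holds nothing useful.
result = rows.sort(key=by_timestamp_then_symbol)
```

The practical consequence is that the result of an in-place sort should not be assigned to a variable and used as data.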

Dataset Class

A Dataset object holds a collection of Data objects. A single Data object (a demonstration/trace) can be accessed by index:

>>> import specless as sl
>>> demonstrations = [
...     [["e1",1], ["e2",2], ["e3",3], ["e4",4], ["e5",5]],  # trace 1
...     [["e1",1], ["e4",3], ["e2",5], ["e3",7], ["e5",9]],  # trace 2
...     [["e1",2], ["e2",4], ["e4",6], ["e3",8], ["e5",10]], # trace 3
... ]
>>> demonstrations = sl.ArrayDataset(demonstrations, columns=["symbol", "timestamp"])
>>> demonstration = demonstrations[0]

You can also convert the whole dataset into nested lists, optionally selecting a single column with key:

>>> demonstrations.tolist()
[[['e1', 1], ['e2', 2], ['e3', 3], ['e4', 4], ['e5', 5]], [['e1', 1], ['e4', 3], ['e2', 5], ['e3', 7], ['e5', 9]], [['e1', 2], ['e2', 4], ['e4', 6], ['e3', 8], ['e5', 10]]]
>>> demonstrations.tolist(key="symbol")
[['e1', 'e2', 'e3', 'e4', 'e5'], ['e1', 'e4', 'e2', 'e3', 'e5'], ['e1', 'e2', 'e4', 'e3', 'e5']]
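
To make the shape of these two results concrete, here is a minimal plain-Python sketch of the same column selection on nested lists. It does not use specless; `column_per_trace` is a hypothetical helper, not part of the library.

```python
# Plain-Python illustration only; it does not use specless.
columns = ["symbol", "timestamp"]
traces = [
    [["e1", 1], ["e2", 2], ["e3", 3], ["e4", 4], ["e5", 5]],
    [["e1", 1], ["e4", 3], ["e2", 5], ["e3", 7], ["e5", 9]],
    [["e1", 2], ["e2", 4], ["e4", 6], ["e3", 8], ["e5", 10]],
]


def column_per_trace(traces, columns, key):
    """Hypothetical helper: pick one named column out of every trace."""
    index = columns.index(key)
    return [[row[index] for row in trace] for trace in traces]


symbols = column_per_trace(traces, columns, "symbol")
timestamps = column_per_trace(traces, columns, "timestamp")
```

Each inner list corresponds to one trace, so the outer list always has as many entries as there are traces in the dataset.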

You can sort every Data object in the dataset in one batch:

>>> f = lambda data: data.sort_values(by=["timestamp", "symbol"], inplace=True)
>>> demonstrations.apply(f)
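
Because the function above sorts in place and returns nothing, the batch call is presumably used for its side effect on each trace. A minimal plain-Python sketch of that pattern follows. It does not use specless, and `apply_to_all` and `sort_in_place` are hypothetical names; how `apply` is actually implemented in the library is not shown in this document.

```python
# Plain-Python illustration only; it does not use specless.
traces = [
    [["e2", 5], ["e1", 1], ["e4", 3]],
    [["e3", 8], ["e4", 6], ["e1", 2]],
]


def apply_to_all(traces, func):
    """Hypothetical stand-in for a batch apply: call func on each trace."""
    for trace in traces:
        func(trace)


def sort_in_place(trace):
    """Sort one trace by timestamp, then symbol, modifying it in place."""
    trace.sort(key=lambda row: (row[1], row[0]))


apply_to_all(traces, sort_in_place)
```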

Classes

ArrayDataset

Dataset class that contains a list of data.

BaseDataset

Base dataset class.

CSVDataset

Reads a list of csv files and turns them into a dataset.
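
The idea of one trace per CSV file can be sketched with the standard library alone. This is an illustration under assumptions, not the CSVDataset implementation: the file layout (a header row with `symbol` and `timestamp` columns), the file names, and the helper `read_traces` are all hypothetical.

```python
import csv
import os
import tempfile


def read_traces(paths):
    """Hypothetical helper: read one trace of [symbol, timestamp] rows per CSV file."""
    traces = []
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            traces.append([[row["symbol"], int(row["timestamp"])] for row in reader])
    return traces


# Write two small example files, then read them back as a list of traces.
example_rows = [[("e1", 1), ("e2", 2)], [("e1", 2), ("e2", 4)]]
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, rows in enumerate(example_rows):
        path = os.path.join(tmp, f"trace_{i}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["symbol", "timestamp"])  # assumed header
            writer.writerows(rows)
        paths.append(path)
    traces = read_traces(paths)
```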

PathToFileDataset

Dataset class that contains a path to a file.