Concepts
Learn about the core concepts of Synnax.
Synnax is a streaming time-series database designed to acquire, store, and transmit real-time data from hardware systems. It scales horizontally, and can be deployed on edge services for data acquisition or in cloud environments for high-performance analysis.
Synnax inherits a hybrid pedigree from hardware data acquisition (DAQ) systems and cloud-native, horizontally scalable databases. This page introduces the core concepts needed to effectively integrate a Synnax cluster into your systems.
Distribution Components
Nodes
A node is an individual, running instance of the Synnax executable. The host machine can be an edge device, VM, container, or bare metal server. The only requirement is that it can store data on disk and has an address reachable by other nodes in the cluster.
Clusters
Nodes communicate with each other to form a cluster. The nodes in a cluster collaborate to read, write, and exchange data. Nodes expose the cluster as a monolothic data space, meaning that a user can query a single node for the entire cluster’s data without being aware of where it is actually stored.
Data Components
Now that we’ve covered the basic distributed systems terminology, we’re ready to step into the data components in a cluster.
To illustrate how these components work together, we’ll use the example of a cyclist who takes several two-hour rides over the course of a month. During each ride, they use a speedometer to record their instantaneous speed once per second.
Samples
A sample is a strongly typed value recorded at a specific point in time. The readings
from our cyclicts’s speedometer are reported as float32
values in kilometers per hour.
Channels
A channel is a logical collection of samples emitted by or representing the values of a single source (typically a sensor or actuator). We can store the speedometer readings across all the cyclists’s rides in a single channel titled “speed-gps”. We can also create channels that store post-processed results or simulated values. For example, we can record target speeds for each ride in a “speed-target” channel, and then write the difference between our target and actual readings in a “speed-diff” channel. As long as the samples are time-ordered, do not have duplicates (i.e. no two samples have the same timestamp), and have a consistent data type, they can be contained in a channel.
Channels can also be used to stream samples in real-time. This is useful for live plotting, control sequences, and post-processing applications.
Ranges
A range (short for “time range”) is a user-defined region of a channel’s data. Ranges are purely for categorization and do not affect the structure of a channel’s data. Ranges can also overlap with or contain other ranges. Ranges are typically used to indicate important events or categorize long periods of time.
After each ride, we can identify periods of interest, such as hills or descents, and define ranges to mark them as relevant for analysis. If our cyclist is training for a century race, we can also wrap all of their rides in a ‘century training’ range to keep them nicely categorized.
Series
While ranges virtually separate areas of related data, series are used to hold the actual samples. They can be compared to arrays or lists in most programming languages. Series are strongly typed and hold their values in time-order. When writing data to the cluster, a user must provide a frame containing an series of samples for each channel they wish to write to. When reading data, a caller typically receives frames containing series of samples for each channel across the requested period of time.
Frames
A frame is a collection of related series. These series form a table-like structure
comparable to a pandas DataFrame
in Python or a data.frame
in R. Each column holds
one or more series. Frames are the fundamental unit of data transfer within a cluster,
and are used for reads, writes, and streaming.
Operation Components
The operation components are key interfaces that allows users to access and modify the samples in a cluster.
Writers
Writers are used to write samples to a cluster. They can be used to write static data in large batches or stream data in real-time. A writer can be opened on multiple channels, where each frame contains series with samples for each channel. Writers support atomic transactions, meaning that all samples in a frame are written to the cluster or none are. This is particularly useful when reading in data from large files.
Writers also support dynamic control handoff, which is when multiple writers can be opened on the same channel, but only a subset (typically one) of the writers is actually allowed to write to the channel at any given time. This is useful for transitioning control between manual operators and automated systems.
Iterators / Readers
The primary method for reading data from a cluster is through an iterator. Iterators read data in a streaming fashion, allowing users to efficiently query and process large quantities of data. They can be opened on one or more channels to read historical data across a specific range of time. Iterators are sometimes called “readers”.
Streamers
Streamers are used to stream data in real-time. They can be thought of as a ‘subscriber’ in a traditional publish-subscribe system. Like iterators, streamers can be opened on one or more channels to receive data as it is being written. Streamers are useful for live plotting, control, and real-time post processing.
Deletes
Deletes are used to remove samples from a cluster.
Synnax as a Spreadsheet
A Synnax cluster’s data can be though of as a very large, distributed spreadsheet. Each channel is a column and each row contains several samples.
time | speed-gps | speed-target | speed-diff |
---|---|---|---|
1677282520236429056 | 3.0 | 7.0 | 4.0 |
1678282520236429056 | 12.1 | 7.0 | -5.1 |
1679282520236429056 | 28.2 | 7.0 | -21.2 |
1680282520236429056 | 15.3 | 7.0 | -8.3 |
1681282520236429056 | 22.4 | 7.0 | -15.4 |
1682282520236429056 | 11.5 | 7.0 | -4.5 |
1697282520236429056 | 3.0 | 7.0 | 4.0 |
1698282520236429056 | 3.1 | 7.0 | 3.9 |
1699282520236429056 | 9.6 | 7.0 | -2.6 |
1700282520236429056 | 18.7 | 7.0 | -11.7 |
1701282520236429056 | 13.8 | 7.0 | -6.8 |
1717282520236429056 | 3.0 | 7.0 | 4.0 |
1718282520236429056 | 19.1 | 7.0 | -12.1 |
1719282520236429056 | 27.2 | 7.0 | -20.1 |
1720282520236429056 | 15.3 | 7.0 | -8.3 |
This table describes the data layout for our cyclist’s rides. Each individual channel, such as speed-gps or speed-target, is a series. The collection of series indexed to one time series is a frame.