etable: DataTable / DataFrame structure in Go
etable (or eTable) provides a DataTable / DataFrame structure in Go (golang), similar to pandas and xarray in Python, and Apache Arrow Table, using etensor n-dimensional columns aligned by a common outermost row dimension.
The e-name derives from the emergent neural network simulation framework, but e is also extra-dimensional, extended, electric, easy-to-use: all good stuff :)
See examples/dataproc for a full demo of how to use this system for data analysis. It parallels the pandas example in Python Data Science, so you can see directly how that example translates into this framework.
See Wiki for how-to documentation, etc.
As a general convention, it is safest, clearest, and quite fast to access columns by name instead of index (there is a map that caches the column indexes), so the base access method names generally take a column name argument, and those that take a column index have an Idx suffix. In addition, we adopt the GoKi Naming Convention of using the Try suffix for versions that return an error. This is a bit painful for the writer of these methods but very convenient for the users.
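The naming convention above can be sketched with a small stand-in type. This is not the etable API: the Table type and the method names Cell, CellIdx, and CellTry below are hypothetical, chosen only to illustrate name-based access backed by a cached index map, the Idx suffix for index-based access, and the Try suffix for the error-returning variant.

```go
package main

import "fmt"

// Table is a hypothetical stand-in, NOT the real etable.Table.
type Table struct {
	Names  []string       // column names, in column order
	Cols   [][]float64    // one slice per column, all the same length
	nameIx map[string]int // cached name -> column index lookup
}

// NewTable builds an empty table and fills the name cache.
func NewTable(names ...string) *Table {
	tb := &Table{Names: names, Cols: make([][]float64, len(names)), nameIx: map[string]int{}}
	for i, n := range names {
		tb.nameIx[n] = i
	}
	return tb
}

// AddRow appends one value per column (assumes len(vals) == number of columns).
func (tb *Table) AddRow(vals ...float64) {
	for i := range tb.Cols {
		tb.Cols[i] = append(tb.Cols[i], vals[i])
	}
}

// CellIdx is the index-based variant: note the Idx suffix.
func (tb *Table) CellIdx(col, row int) float64 { return tb.Cols[col][row] }

// CellTry is the name-based variant that reports problems as an error.
func (tb *Table) CellTry(name string, row int) (float64, error) {
	col, ok := tb.nameIx[name]
	if !ok {
		return 0, fmt.Errorf("column %q not found", name)
	}
	if row < 0 || row >= len(tb.Cols[col]) {
		return 0, fmt.Errorf("row %d out of range", row)
	}
	return tb.Cols[col][row], nil
}

// Cell is the base name-based variant; it returns 0 if the lookup fails.
func (tb *Table) Cell(name string, row int) float64 {
	v, _ := tb.CellTry(name, row)
	return v
}

func main() {
	tb := NewTable("x", "y")
	tb.AddRow(1, 10)
	tb.AddRow(2, 20)
	fmt.Println(tb.Cell("y", 1))    // access by column name
	fmt.Println(tb.CellIdx(0, 0))   // access by column index
	_, err := tb.CellTry("z", 0)    // unknown column yields an error
	fmt.Println(err)
}
```

The base method stays short at the call site, while the Try variant is there when the caller wants to handle a missing column explicitly.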
The following packages are included:
bitslice is a Go slice of bytes ([]byte) that has methods for setting individual bits, as if it were a slice of bools, while being 8x more memory efficient. This is used for encoding null entries in etensor, and as a Tensor of bool / bits there as well, and is generally very useful for binary (boolean) data.
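The packed-bit idea behind bitslice can be sketched as follows. The Bits type and its NewBits, Set, and Get names are assumptions made for this illustration and are not the package's actual API; the point is only that 8 boolean values fit in each byte.

```go
package main

import "fmt"

// Bits is a hypothetical packed bit slice, NOT the real bitslice type.
type Bits struct {
	n    int    // number of addressable bits
	data []byte // packed storage, 8 bits per byte
}

// NewBits allocates enough bytes to hold n bits.
func NewBits(n int) *Bits {
	return &Bits{n: n, data: make([]byte, (n+7)/8)}
}

// Len returns the number of bits.
func (b *Bits) Len() int { return b.n }

// Set turns bit i on or off.
func (b *Bits) Set(i int, v bool) {
	if v {
		b.data[i/8] |= 1 << uint(i%8)
	} else {
		b.data[i/8] &^= 1 << uint(i%8)
	}
}

// Get reports whether bit i is on.
func (b *Bits) Get(i int) bool {
	return b.data[i/8]&(1<<uint(i%8)) != 0
}

func main() {
	b := NewBits(10)
	b.Set(3, true)
	b.Set(9, true)
	fmt.Println(b.Get(3), b.Get(4), b.Get(9))
	fmt.Println("bytes used:", len(b.data), "for", b.Len(), "bits")
}
```

A []bool of the same length would use one byte per entry, which is where the 8x saving comes from.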
etensor is a Tensor (n-dimensional array) object. etensor.Tensor is an interface that applies to many different type-specific instances, such as etensor.Float32. A tensor is just an etensor.Shape plus a slice holding the specific data type. Our tensor is based directly on the Apache Arrow project’s tensor, and it fully interoperates with it. Arrow tensors are designed to be read-only, and we needed some extra support to make our etable.Table work well, so we had to roll our own. Our tensors also interoperate fully with Gonum’s 2D-specific Matrix type for the 2D case.
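As a rough illustration of "a shape plus a slice", the sketch below stores a float32 tensor as a flat slice and converts an n-dimensional index to a flat offset. The Tensor type here is a hypothetical stand-in, and the row-major layout is an assumption of this sketch rather than a statement about how etensor.Shape computes offsets.

```go
package main

import "fmt"

// Tensor is a hypothetical float32 tensor: a shape plus a flat slice.
// It is NOT the real etensor.Float32.
type Tensor struct {
	Shape  []int     // size of each dimension, outermost first
	Values []float32 // flat storage, row-major (assumed for this sketch)
}

// NewTensor allocates storage for the product of all dimension sizes.
func NewTensor(shape ...int) *Tensor {
	n := 1
	for _, d := range shape {
		n *= d
	}
	return &Tensor{Shape: shape, Values: make([]float32, n)}
}

// Offset converts an n-dimensional index into a flat, row-major offset.
func (ts *Tensor) Offset(ix ...int) int {
	off := 0
	for d, i := range ix {
		off = off*ts.Shape[d] + i
	}
	return off
}

// Set stores v at the given n-dimensional index.
func (ts *Tensor) Set(v float32, ix ...int) { ts.Values[ts.Offset(ix...)] = v }

// At returns the value at the given n-dimensional index.
func (ts *Tensor) At(ix ...int) float32 { return ts.Values[ts.Offset(ix...)] }

func main() {
	ts := NewTensor(2, 3, 4) // 2 rows, each holding a 3x4 cell
	ts.Set(5, 1, 0, 0)
	fmt.Println(len(ts.Values), ts.Offset(1, 0, 0), ts.At(1, 0, 0))
}
```

With the row dimension outermost, all values for one row are contiguous in the flat slice, which is what lets columns of different cell shapes line up by row.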
etable.Table is the DataTable / DataFrame object, which is useful for many different data analysis and database functions, and also for holding patterns to present to a neural network, logs of output from the models, etc. An etable.Table is just a slice of etensor.Tensor columns that are all aligned along the outermost row dimension. Index-based indirection, which is essential for efficient Sort, Filter, etc., is provided by the etable.IdxView type, which is an indexed view into a Table. All data processing operations are defined on the IdxView.
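The index-based indirection described for IdxView can be sketched on a single column. The type below reuses the name IdxView for readability but is a hypothetical, simplified stand-in with assumed method names: sorting and filtering only rearrange or drop entries in a list of row indexes, and the underlying data is never moved.

```go
package main

import (
	"fmt"
	"sort"
)

// IdxView is a hypothetical, single-column stand-in for an indexed view.
// It is NOT the real etable.IdxView.
type IdxView struct {
	Col  []float64 // underlying column data, never reordered
	Idxs []int     // row indexes in view order
}

// NewIdxView starts with all rows in their natural order.
func NewIdxView(col []float64) *IdxView {
	ix := make([]int, len(col))
	for i := range ix {
		ix[i] = i
	}
	return &IdxView{Col: col, Idxs: ix}
}

// Sort orders the indexes by ascending column value.
func (v *IdxView) Sort() {
	sort.SliceStable(v.Idxs, func(a, b int) bool {
		return v.Col[v.Idxs[a]] < v.Col[v.Idxs[b]]
	})
}

// Filter keeps only the rows whose value passes the keep function.
func (v *IdxView) Filter(keep func(val float64) bool) {
	out := v.Idxs[:0]
	for _, i := range v.Idxs {
		if keep(v.Col[i]) {
			out = append(out, i)
		}
	}
	v.Idxs = out
}

// Values returns the column values in view order.
func (v *IdxView) Values() []float64 {
	vals := make([]float64, len(v.Idxs))
	for n, i := range v.Idxs {
		vals[n] = v.Col[i]
	}
	return vals
}

func main() {
	col := []float64{3, 1, 2, 5}
	v := NewIdxView(col)
	v.Filter(func(val float64) bool { return val < 5 })
	v.Sort()
	fmt.Println(v.Idxs, v.Values(), col)
}
```

Because only small integer indexes move, the cost of a sort or filter does not depend on how large each row's cell is.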
etview provides an interactive tabular, spreadsheet-style GUI using GoGi for viewing and editing table and tensor data. etview.TensorGrid also provides a colored grid display of higher-dimensional tensor data.
agg provides standard aggregation functions (Std, etc.) operating over etable.IdxView views of Table data. It also defines standard AggFunc functions such as SumFunc, which can be used for Agg functions on either a Tensor or IdxView.
tsragg provides the same agg functions as in agg, but operating on all the values in a given Tensor. Because of the indexed, row-based nature of tensors in a Table, these are not the same as the agg functions.
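The difference between aggregating over an indexed view and aggregating over all values of a tensor can be seen in a small sketch. MeanAll and MeanIdx are hypothetical helper names for this illustration only; they are not functions from agg or tsragg.

```go
package main

import "fmt"

// MeanAll averages every value in the slice (the "all values" case).
func MeanAll(vals []float64) float64 {
	if len(vals) == 0 {
		return 0
	}
	sum := 0.0
	for _, v := range vals {
		sum += v
	}
	return sum / float64(len(vals))
}

// MeanIdx averages only the rows listed in idxs (the "indexed view" case).
func MeanIdx(vals []float64, idxs []int) float64 {
	if len(idxs) == 0 {
		return 0
	}
	sum := 0.0
	for _, i := range idxs {
		sum += vals[i]
	}
	return sum / float64(len(idxs))
}

func main() {
	vals := []float64{1, 2, 3, 10}
	view := []int{0, 1, 2} // a filtered view that excludes the last row
	fmt.Println(MeanAll(vals), MeanIdx(vals, view))
}
```

Once a view has been filtered or split, the two results diverge, which is why the same statistic needs both forms.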
split supports splitting a Table into any number of indexed sub-views and aggregating over those (i.e., pivot tables), grouping, summarizing data, etc.
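Splitting into indexed sub-views and aggregating over each one can be sketched as a plain group-by. SplitBy and MeanOver are hypothetical names for this example and do not correspond to the split package's API; each group is just a list of row indexes into the unchanged data.

```go
package main

import (
	"fmt"
	"sort"
)

// SplitBy groups row indexes by the value of a key column.
// Each group is an indexed sub-view: a list of rows, not a copy of the data.
func SplitBy(keys []string) map[string][]int {
	groups := make(map[string][]int)
	for row, k := range keys {
		groups[k] = append(groups[k], row)
	}
	return groups
}

// MeanOver averages vals over the given rows only.
func MeanOver(vals []float64, rows []int) float64 {
	if len(rows) == 0 {
		return 0
	}
	sum := 0.0
	for _, r := range rows {
		sum += vals[r]
	}
	return sum / float64(len(rows))
}

func main() {
	keys := []string{"a", "b", "a", "b", "a"}
	vals := []float64{1, 10, 3, 20, 5}

	groups := SplitBy(keys)

	// Sort the group names so the output order is deterministic.
	names := make([]string, 0, len(groups))
	for k := range groups {
		names = append(names, k)
	}
	sort.Strings(names)

	for _, k := range names {
		fmt.Printf("%s: rows=%v mean=%g\n", k, groups[k], MeanOver(vals, groups[k]))
	}
}
```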
metric provides similarity / distance metrics such as Correlation that operate on slices of floating-point values.
simat provides similarity / distance matrix computation methods operating on Tensor or Table data. The SimMat type holds the resulting matrix and labels for the rows and columns, and has a special view in etview for visualizing labeled similarity matrices.
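A labeled distance matrix of the kind described for simat can be sketched as below. LabeledMat and DistMatrix are hypothetical names, not the simat API, and Euclidean distance is used here only as an assumed example metric.

```go
package main

import (
	"fmt"
	"math"
)

// LabeledMat is a hypothetical stand-in for a labeled similarity matrix.
// It is NOT the real simat.SimMat.
type LabeledMat struct {
	Labels []string    // one label per row (and per column)
	Mat    [][]float64 // square, symmetric distance matrix
}

// Euclidean returns the Euclidean distance between equal-length vectors.
func Euclidean(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// DistMatrix computes all pairwise row distances, filling both triangles.
func DistMatrix(labels []string, rows [][]float64) *LabeledMat {
	n := len(rows)
	m := make([][]float64, n)
	for i := range m {
		m[i] = make([]float64, n)
	}
	for i := 0; i < n; i++ {
		for j := i + 1; j < n; j++ {
			d := Euclidean(rows[i], rows[j])
			m[i][j] = d
			m[j][i] = d
		}
	}
	return &LabeledMat{Labels: labels, Mat: m}
}

func main() {
	rows := [][]float64{{0, 0}, {3, 4}, {0, 0}}
	sm := DistMatrix([]string{"A", "B", "C"}, rows)
	for i, lbl := range sm.Labels {
		fmt.Println(lbl, sm.Mat[i])
	}
}
```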
pca provides principal-components-analysis (PCA) and covariance matrix computation functions.
clust provides standard agglomerative hierarchical clustering, including the ability to plot results in an eplot.
minmax is home of the basic Min / Max range struct, and norm has lots of good functions for computing standard norms and normalizing vectors.