Autopilot’s data handling system was revamped in v0.5.0 and is now based on pydantic
models and a series of interfaces that let us write data from the same abstract structures to several formats: initially
pytables and hdf5, with the groundwork laid for exporting to other formats.
A brief narrative overview is given here; more detailed documentation can be found in the relevant module documentation.
Autopilot’s models are built from pydantic models.
The autopilot.root module defines some of Autopilot’s basic metaclasses, one of which is Autopilot_Type.
The data.modeling module extends Autopilot_Type into several abstract modeling classes used for different
types of data:
modeling.base.Data- Containers for data, also used to specify how data should be handled and typed. Its subtypes indicate different classes of data that have different means of storage and representation depending on the interface.
modeling.base.Table- Tabular data, which specifies that there should be multiple values for each of the fields defined, and in particular an equal number of values for each field. This is used for most data collected, as most data can be framed in a tabular format.
modeling.base.Node- Abstract specifications for hierarchical data interfaces: a Node is a particular element in a tree- or network-like system, and a Group is a collection of Nodes. Some transitional work is still being done to generalize Autopilot’s former data structures from H5F-specific groups and nodes, so for the moment some parallel functionality remains elsewhere in the codebase.
modeling.base.Schema- Specifications for the organization of other data structures: data that isn’t expected to ever be instantiated in its described form, but that serves as scaffolding for combining other data structures. Some transitional work is also being done here, eventually moving the Subject Schema to an abstract form (Subject_Schema) rather than one tied to HDF5.
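The tabular constraint described above (every field holds the same number of values) can be sketched as follows. This is a minimal illustration rather than Autopilot's implementation: it uses standard-library dataclasses as a stand-in for pydantic so that it runs without dependencies, and the names `Table`, `TrialData`, `trial_num`, `response_ms`, and `n_rows` are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Table:
    """Hypothetical stand-in for an abstract tabular model (not Autopilot's class)."""

    def __post_init__(self) -> None:
        # Every declared field is treated as a column; all columns must
        # contain the same number of values.
        lengths = {f.name: len(getattr(self, f.name)) for f in fields(self)}
        if len(set(lengths.values())) > 1:
            raise ValueError(f"columns must have equal length, got {lengths}")

    def n_rows(self) -> int:
        # Number of rows is the length of any column (0 if no columns exist).
        cols = fields(self)
        return len(getattr(self, cols[0].name)) if cols else 0


@dataclass
class TrialData(Table):
    """Hypothetical concrete table with two illustrative columns."""

    trial_num: List[int]
    response_ms: List[float]


if __name__ == "__main__":
    data = TrialData(trial_num=[1, 2, 3], response_ms=[10.0, 12.5, 9.0])
    print(data.n_rows())
```

Placing the length check on the abstract base means that any concrete table built on it inherits the constraint without repeating it.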
Specific models are then built out of the basic modeling components! This will serve as the point where data models can be added or modified by plugins (stay tuned).
Each of the modules contains several classes that are used together in some particular context:
models.subject- Schemas that define how the multiple models that go into a subject are combined and structured on disk
models.researcher- Stubs for researcher information that will be used in future versions for giving explicit credit for data gathered by a particular researcher or research group…
interfaces - Bridging to Multiple Representations
Interfaces define mappings between basic python types and the classes in data.modeling.
This set of classes is still growing, and we’re still exploring the best strategy for making generalizable interfaces between very different formats. In general, though, each interface consists of mappings between types and some means of converting between the particular data structures of one format and another.
Interface_Map- A specific declaration that one type is equivalent to another, with some optional conversion or parameterization
Interface- A stub for a future class that will handle conversion of the basic modeling components. For this first pass we have just applied the mapsets directly to certain subtypes of modeling objects.
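The idea of a map, a declaration that one type is equivalent to another with an optional conversion, can be sketched roughly as below. This is an assumption-laden illustration, not the actual Interface_Map API: `TypeMap`, `resolve`, `EXAMPLE_MAPS`, and the target names are hypothetical, and the target format is represented only by strings.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple, Type


@dataclass(frozen=True)
class TypeMap:
    """Hypothetical map declaring that `source` is equivalent to `target`."""

    source: Type
    target: str  # illustrative name of the type in the target format
    convert: Optional[Callable[[Any], Any]] = None  # optional value conversion


def resolve(value: Any, maps: Sequence[TypeMap]) -> Tuple[str, Any]:
    """Return the target type name and the (possibly converted) value."""
    for m in maps:
        # Exact type match, so that bool is not silently treated as int.
        if type(value) is m.source:
            converted = m.convert(value) if m.convert else value
            return m.target, converted
    raise TypeError(f"no map declared for {type(value).__name__}")


# Illustrative mapset; the target names are placeholders, not a real schema.
EXAMPLE_MAPS = (
    TypeMap(bool, "BoolCol"),
    TypeMap(int, "Int64Col"),
    TypeMap(float, "Float64Col"),
    TypeMap(str, "StringCol", convert=lambda s: s.encode("utf-8")),
)

if __name__ == "__main__":
    print(resolve(3, EXAMPLE_MAPS))
    print(resolve("subject_1", EXAMPLE_MAPS))
```

Keeping the maps as plain data makes it possible to define one mapset per target format and reuse the same lookup logic for each.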
The only interface that is actively used within Autopilot is the one for
tables, but we have started interfaces for other formats, including
datajoint (using a parallel project, datajoint-babel).
These are provisional and very incomplete, but it is possible to generate a datajoint schema from any
table, and there are mappings and conversions for their different representations of types.
Our goal for future versions is to generalize data interfaces to the point where a similar API can be shared across them, so a subject’s data can be stored in HDF5 or in a datajoint database equivalently.
Subject is the main object that most people will use to interact with their data, and it is used throughout Autopilot to keep track of individual subjects’ data, the protocols they are run on, changes in code version over time, etc.
See the main data.subject module page for further information.
This too is just a stub, but we will be moving more of our models to specific SI units where appropriate, rather
than using generic ints with human-readable descriptions of whether a value is in mL or L, ms or seconds, etc.
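To illustrate why explicit unit types can be preferable to bare ints, here is a minimal sketch. It does not reflect Autopilot's units module: `Milliseconds`, `Milliliters`, and `valve_open_seconds` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Milliseconds:
    """Hypothetical duration type; the unit lives in the type, not a docstring."""

    value: float

    def to_seconds(self) -> float:
        return self.value / 1000.0


@dataclass(frozen=True)
class Milliliters:
    """Hypothetical volume type."""

    value: float

    def to_liters(self) -> float:
        return self.value / 1000.0


def valve_open_seconds(duration: Milliseconds) -> float:
    """Illustrative consumer that rejects values with the wrong unit type."""
    if not isinstance(duration, Milliseconds):
        raise TypeError(f"expected Milliseconds, got {type(duration).__name__}")
    return duration.to_seconds()


if __name__ == "__main__":
    print(valve_open_seconds(Milliseconds(250)))
```

With a bare int, passing a volume where a duration is expected would go unnoticed; with distinct types the mistake surfaces at the call site.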