Feature store

A ByteHub feature store comprises:

  • A database, which stores metadata about features, such as their names and descriptions;

  • One or more namespaces, under which features are defined;

  • A data storage location for each namespace, where the actual feature values are saved; and

  • A timeseries for each feature.
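The relationships between these components can be pictured as a small data model. This is a minimal sketch under assumed names: `FeatureStore`, `Namespace`, `Feature`, `create_namespace` and `add_feature` are hypothetical classes and methods chosen for illustration, not ByteHub's actual API. It only models the metadata side; it does not read or write any timeseries data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Feature:
    """Metadata for one feature, as held in the metadata database (hypothetical)."""
    name: str
    description: str = ""


@dataclass
class Namespace:
    """A group of features that share one data storage location (hypothetical)."""
    name: str
    url: str  # storage location for this namespace's feature values
    features: dict[str, Feature] = field(default_factory=dict)

    def add_feature(self, name: str, description: str = "") -> Feature:
        # Feature names only need to be unique within their namespace.
        if name in self.features:
            raise ValueError(f"feature {name!r} already exists in {self.name!r}")
        feature = Feature(name, description)
        self.features[name] = feature
        return feature


@dataclass
class FeatureStore:
    """Top-level container standing in for the metadata database (hypothetical)."""
    namespaces: dict[str, Namespace] = field(default_factory=dict)

    def create_namespace(self, name: str, url: str) -> Namespace:
        if name in self.namespaces:
            raise ValueError(f"namespace {name!r} already exists")
        namespace = Namespace(name, url)
        self.namespaces[name] = namespace
        return namespace


if __name__ == "__main__":
    store = FeatureStore()
    store.create_namespace("dev", "/tmp/featurestore/dev")
    prod = store.create_namespace("prod", "s3://example-bucket/prod")
    prod.add_feature("my-feature", "An example feature")
    print(sorted(store.namespaces))
```

Keeping features inside their namespace means the same feature name can exist in several namespaces (for example in both `dev` and `prod`) without clashing.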

Namespaces can be used to separate features into groups. For example, you might want to separate features used during development, testing and production, or you might choose to create namespaces for different projects or teams.

We store timeseries data in Parquet format at each storage location. The location can be either a directory on a local filesystem or a remote cloud storage bucket such as AWS S3 or Azure Blob Storage.
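Because a storage location can be local or remote, it is convenient to identify it with a path or URL. The sketch below assumes fsspec-style URL prefixes (`s3://`, `az://`, `abfs://`, `gs://`) for cloud buckets and plain paths or `file://` URLs for local storage; the function name `describe_storage` and the scheme table are hypothetical and not part of ByteHub.

```python
from urllib.parse import urlparse

# Assumed mapping of URL schemes to cloud providers (fsspec-style prefixes).
REMOTE_SCHEMES = {
    "s3": "AWS S3",
    "az": "Azure Blob Storage",
    "abfs": "Azure Blob Storage",
    "gs": "Google Cloud Storage",
}


def describe_storage(url: str) -> str:
    """Say whether a (hypothetical) storage URL points at local or cloud storage."""
    scheme = urlparse(url).scheme.lower()
    # Plain paths have no scheme; a one-letter scheme is a Windows drive letter.
    if scheme in ("", "file") or len(scheme) == 1:
        return "local filesystem"
    if scheme in REMOTE_SCHEMES:
        return REMOTE_SCHEMES[scheme]
    raise ValueError(f"unsupported storage scheme: {scheme!r}")
```

For example, `describe_storage("/data/featurestore/dev")` returns `"local filesystem"`, while `describe_storage("s3://example-bucket/prod")` returns `"AWS S3"`.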

We usually refer to features using the format {namespace}/{name}. For example, a feature called my-feature defined inside the prod namespace is referred to as prod/my-feature. Think of this as being like GitHub organisation and repository names, as in organisation/repository.
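Splitting and rebuilding these names is simple string handling. This is a minimal sketch that assumes neither part may be empty and that the name itself contains no further `/`; `FeatureName` and `parse_feature_name` are hypothetical helpers, not ByteHub functions.

```python
from typing import NamedTuple


class FeatureName(NamedTuple):
    """A feature reference split into namespace and name (hypothetical helper)."""
    namespace: str
    name: str

    def __str__(self) -> str:
        # Rebuild the "{namespace}/{name}" form.
        return f"{self.namespace}/{self.name}"


def parse_feature_name(full_name: str) -> FeatureName:
    """Split a '{namespace}/{name}' string into its two parts."""
    namespace, sep, name = full_name.partition("/")
    # Reject missing separators, empty parts, and extra path segments.
    if not sep or not namespace or not name or "/" in name:
        raise ValueError(f"expected '{{namespace}}/{{name}}', got {full_name!r}")
    return FeatureName(namespace, name)
```

For example, `parse_feature_name("prod/my-feature")` gives `FeatureName(namespace='prod', name='my-feature')`, and `str()` on the result turns it back into `prod/my-feature`.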