Quick-start guide
Getting started
This guide is available as a Colab notebook.
Install ByteHub into your Python 3 environment using pip:
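```bash
pip install bytehub
```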
Then launch a Python session and run the following to create a new feature store:
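A minimal sketch, following the usual `import bytehub as bh` convention:

```python
import bytehub as bh

# Create a feature store backed by the default local SQLite database
fs = bh.FeatureStore()
```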
By default, ByteHub creates a local SQLite database called `bytehub.db`. Passing a SQLAlchemy connection string instead allows you to connect to a remote database that hosts the feature store.
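For example, a remote Postgres-backed store might be created as follows; the host, credentials, and database name here are placeholders:

```python
# Sketch: point the feature store at a remote database via a SQLAlchemy URL
fs = bh.FeatureStore('postgresql://user:password@dbhost:5432/bytehub')
```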
Next, create a namespace called `tutorial` to store some features in. Edit the `url` field to specify a local file-storage location that you would like to use. Feature values will be saved within this folder in Parquet format.
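A sketch using `create_namespace`; the folder path and description are illustrative:

```python
fs.create_namespace(
    'tutorial',
    url='/tmp/featurestore/tutorial',  # local folder to hold feature values
    description='Tutorial datasets',
)
```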
Now create a new feature inside this namespace:
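For example, to create the `tutorial/number` feature used below:

```python
fs.create_feature('tutorial/number', description='Timeseries of numbers')
```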
We can now generate a Pandas dataframe with `time` and `value` columns to store:
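A sketch with an illustrative daily series covering 2020:

```python
import pandas as pd

# Build a daily timeseries of integers for 2020
dts = pd.date_range('2020-01-01', '2020-12-31')
df = pd.DataFrame({'time': dts, 'value': range(len(dts))})

# Store it against the feature we just created
fs.save_dataframe(df, 'tutorial/number')
```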
Now for some feature engineering. Suppose we want to create another feature called `tutorial/squared` that contains the square of every value in `tutorial/number`. To do this, define a transform as follows:
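A sketch using the `transform` decorator:

```python
@fs.transform('tutorial/squared', from_features=['tutorial/number'])
def squared(df):
    # df contains the timeseries values of every feature in from_features;
    # return a series/dataframe aligned to the same timestamps
    return df ** 2
```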
The transform receives a dataframe containing everything in `from_features`, and should return a series or dataframe of transformed timeseries values. We can now look at some of our timeseries data using:
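For example (the date arguments here are illustrative):

```python
# Fetch raw and transformed values for a sample date range
df = fs.load_dataframe(
    ['tutorial/number', 'tutorial/squared'],
    from_date='2020-01-01',
    to_date='2020-01-07',
)
print(df)
```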
Using the `load_dataframe` method, we can easily join, resample, and filter the features. For example, to get a monthly timeseries for 2020 we could run:
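A sketch, assuming `load_dataframe` accepts a Pandas-style `freq` argument for resampling:

```python
# Monthly values for both features across 2020
df = fs.load_dataframe(
    ['tutorial/number', 'tutorial/squared'],
    from_date='2020-01-01',
    to_date='2020-12-31',
    freq='1M',  # resample the daily data to monthly values
)
```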
Up until now, all of our data has been stored locally at the `url` we set when creating the namespace. To store data on a cloud storage service, we need to create another namespace:
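A sketch; the namespace name, bucket URL, and credentials are placeholders:

```python
fs.create_namespace(
    'cloud',
    url='s3://your-bucket/featurestore',  # placeholder S3 bucket
    description='Cloud tutorial datasets',
    storage_options={'key': 'YOUR_ACCESS_KEY', 'secret': 'YOUR_SECRET_KEY'},
)
```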
See the Dask documentation for information on URL formats and the different `storage_options` that you may want to set.
The simplest way to move our existing data to cloud storage is with the `clone_feature` method:
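A sketch, assuming `clone_feature` takes the new feature name plus a `from_name` argument:

```python
# Copy both features into the S3-backed namespace
fs.clone_feature('cloud/number', from_name='tutorial/number')
fs.clone_feature('cloud/squared', from_name='tutorial/squared')
```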
Now both of the features we created are available in the new, S3-backed namespace. View all of the features that we created in the tutorial by running:
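For example:

```python
# Returns a dataframe describing every registered feature
fs.list_features()
```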