Bitcoin Price Data

This tutorial demonstrates how to populate a feature store with a timeseries of bitcoin price data, and then compute some transformations on it.

Run this tutorial as a Colab notebook.

As with the quick-start guide, start by creating a blank feature store and namespace.

import bytehub as bh
fs = bh.FeatureStore()

fs.create_namespace(
    'tutorial', url='/tmp/featurestore/tutorial', description='Tutorial datasets'
)

Now download some data from the CoinDesk API and save it to the feature store.

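import datetime

import pandas as pd
import requests

from_date = '2017-01-01'
to_date = datetime.datetime.now().strftime('%Y-%m-%d')  # today's date

COINDESK_URL = '...'  # CoinDesk historical close-price endpoint (fill in)
response = requests.get(COINDESK_URL, params={'start': from_date, 'end': to_date})
response.raise_for_status()

bpi = response.json().get('bpi')  # mapping of 'YYYY-MM-DD' -> closing price
df_close = pd.DataFrame(
    {
        'time': pd.to_datetime(list(bpi.keys())),
        'value': list(bpi.values()),
    }
)

fs.create_feature('tutorial/bitcoin.close', partition='year')  # Data is partitioned by year on disk
fs.save_dataframe(df_close, 'tutorial/bitcoin.close')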
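To check the parsing step without calling the API, the sketch below converts a payload into the `time`/`value` columns used above. It assumes the response has the shape `{"bpi": {"YYYY-MM-DD": price, ...}}`, which is what the `.get('bpi')` call implies; the prices are made up, and `bpi_to_frame` is a hypothetical helper name.

```python
import pandas as pd

# Made-up sample shaped like the assumed response: {"bpi": {date: close}}
sample_json = {
    "bpi": {
        "2017-01-01": 1000.0,
        "2017-01-02": 1020.5,
        "2017-01-03": 1010.25,
    }
}


def bpi_to_frame(payload: dict) -> pd.DataFrame:
    """Hypothetical helper: turn the 'bpi' date -> price mapping into time/value columns."""
    bpi = payload.get("bpi", {})
    return pd.DataFrame(
        {
            "time": pd.to_datetime(list(bpi.keys())),  # ISO date strings -> timestamps
            "value": list(bpi.values()),               # closing prices
        }
    )


df = bpi_to_frame(sample_json)
print(df)
```

Converting `bpi.values()` to a list keeps the column construction explicit rather than relying on pandas to accept a `dict_values` view.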

When creating the tutorial/bitcoin.close feature, we specified partition="year". ByteHub lets you choose either year or date partitioning, which determines how the saved data is split into separate folders on disk. Choose date if you are working with very high-resolution data, e.g. updated every second; otherwise choose year, which creates fewer files and gives better performance.
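The trade-off between the two schemes comes down to how many folders a given time range produces. The following is an illustration only, not ByteHub's storage code: it labels each row of a hypothetical daily series with the folder it would fall into, assuming one folder per calendar year or per calendar date.

```python
import pandas as pd

# Hypothetical daily series spanning a year boundary (7 rows)
times = pd.date_range("2019-12-30", "2020-01-05", freq="D")
df = pd.DataFrame({"time": times, "value": range(len(times))})


def partition_keys(frame: pd.DataFrame, partition: str) -> pd.Series:
    """Label each row with the folder it would land in (illustrative only)."""
    if partition == "year":
        return frame["time"].dt.year.astype(str)
    if partition == "date":
        return frame["time"].dt.strftime("%Y-%m-%d")
    raise ValueError(f"unknown partition: {partition}")


for scheme in ("year", "date"):
    keys = partition_keys(df, scheme)
    print(scheme, "->", keys.nunique(), "folders")
```

Here year partitioning yields 2 folders and date partitioning yields 7. For several years of daily bitcoin prices, the gap grows to a handful of folders versus thousands, which is why year is the better default for this dataset.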

We can now query and resample this data using load_dataframe:

df_weekly = fs.load_dataframe(
    'tutorial/bitcoin.close',
    from_date='2020-01-01', to_date='2020-12-31', freq='1W',  # resample to weekly
)
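For comparison, a weekly resample can be reproduced in plain pandas. ByteHub's own aggregation rule is not stated here, so keeping the last observation in each week (`.last()`) is an assumption, and the daily prices below are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes for 2020 (made-up numbers, not real bitcoin prices)
idx = pd.date_range("2020-01-01", "2020-12-31", freq="D")
daily = pd.Series(np.linspace(7000.0, 29000.0, len(idx)), index=idx, name="tutorial/bitcoin.close")

# Weekly bins ending on Sunday; keep the last observation in each week
weekly = daily.resample("W").last()
print(weekly.head())
```

Because 2020-01-01 is a Wednesday, the first bin is labelled 2020-01-05 and contains only five days; the final bin is labelled 2021-01-03, giving 53 weekly rows in total.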

Now create transform features that compute exponentially weighted moving averages (EWMAs) of the bitcoin price with half-lives of 15 and 30 days, along with a momentum indicator defined as the difference between the two.
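For reference, pandas defines `ewm(halflife=h)` through a smoothing factor alpha, and with its default `adjust=True` each EWMA is a normalised weighted sum of past closes. Writing x_t for the close on day t, y_t^(h) for the EWMA with half-life h, and m_t for the momentum:

```latex
\alpha = 1 - \exp\!\left(-\frac{\ln 2}{h}\right), \qquad
y_t^{(h)} = \frac{\sum_{i=0}^{t} (1-\alpha)^{i}\, x_{t-i}}{\sum_{i=0}^{t} (1-\alpha)^{i}}, \qquad
m_t = y_t^{(15)} - y_t^{(30)}
```

Since 1 - alpha = 2^(-1/h), the weight on a past price halves every h observations, so the 15-day average reacts faster than the 30-day one.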

@fs.transform('tutorial/bitcoin.ewma.15', from_features=['tutorial/bitcoin.close'])
def ewma_15(df):
    # EWMA of the closing price with a 15-observation half-life
    return df.ewm(halflife=15).mean()

@fs.transform('tutorial/bitcoin.ewma.30', from_features=['tutorial/bitcoin.close'])
def ewma_30(df):
    # EWMA of the closing price with a 30-observation half-life
    return df.ewm(halflife=30).mean()

@fs.transform('tutorial/bitcoin.momentum', from_features=['tutorial/bitcoin.ewma.15', 'tutorial/bitcoin.ewma.30'])
def momentum(df):
    # Fast EWMA minus slow EWMA: positive when the short-term trend is above the long-term one
    return df['tutorial/bitcoin.ewma.15'] - df['tutorial/bitcoin.ewma.30']
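The transforms above are thin wrappers around pandas, so their arithmetic can be checked outside ByteHub. This sketch runs on made-up prices and assumes, as the momentum transform suggests, that each transform receives a DataFrame whose columns are named after its input features. It also verifies `ewm(halflife=...)` against an explicit recursive form of the adjusted EWMA; `manual_ewma` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes (made-up numbers) under the feature's column name
idx = pd.date_range("2020-01-01", periods=120, freq="D")
rng = np.random.default_rng(0)
close = pd.DataFrame(
    {"tutorial/bitcoin.close": 8000 + rng.normal(0, 100, len(idx)).cumsum()},
    index=idx,
)


def manual_ewma(x: np.ndarray, halflife: float) -> np.ndarray:
    """Adjusted EWMA with weights (1 - alpha)**i, where alpha = 1 - exp(-ln 2 / halflife)."""
    decay = np.exp(-np.log(2.0) / halflife)  # equals 1 - alpha
    out = np.empty_like(x, dtype=float)
    num = 0.0  # running sum of decay**i * x[t - i]
    den = 0.0  # running sum of decay**i
    for t, value in enumerate(x):
        num = value + decay * num
        den = 1.0 + decay * den
        out[t] = num / den
    return out


col = "tutorial/bitcoin.close"
ewma_fast = close.ewm(halflife=15).mean()
ewma_slow = close.ewm(halflife=30).mean()
momentum = ewma_fast[col] - ewma_slow[col]

x = close[col].to_numpy()
assert np.allclose(manual_ewma(x, 15), ewma_fast[col].to_numpy())
print(momentum.tail())
```

Both averages start at the first close, so the momentum is zero on day one and only diverges as the two decay rates pull the averages apart.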

These transformed features can now be queried with load_dataframe, just like tutorial/bitcoin.close. They are calculated on the fly, so they reflect any changes to the underlying bitcoin price data.
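To see why on-the-fly transforms stay in sync with their inputs, consider a toy store that saves only raw data and runs registered transforms at query time. `TinyStore` is entirely hypothetical and is not how ByteHub is implemented; it only mirrors the decorator pattern used above.

```python
from typing import Callable, Dict, List

import pandas as pd


class TinyStore:
    """Hypothetical in-memory store: raw features are saved, transforms run at query time."""

    def __init__(self) -> None:
        self.raw: Dict[str, pd.Series] = {}
        self.transforms: Dict[str, tuple] = {}

    def save(self, name: str, series: pd.Series) -> None:
        self.raw[name] = series

    def transform(self, name: str, from_features: List[str]) -> Callable:
        def register(func: Callable) -> Callable:
            self.transforms[name] = (func, from_features)
            return func
        return register

    def load(self, name: str) -> pd.Series:
        if name in self.raw:
            return self.raw[name]
        func, inputs = self.transforms[name]
        # Resolve inputs recursively, so chained transforms are also recomputed
        df = pd.DataFrame({dep: self.load(dep) for dep in inputs})
        return func(df)


store = TinyStore()
idx = pd.date_range("2021-01-01", periods=3, freq="D")
store.save("a", pd.Series([1.0, 2.0, 3.0], index=idx))


@store.transform("a.doubled", from_features=["a"])
def doubled(df):
    return df["a"] * 2


print(store.load("a.doubled").tolist())  # [2.0, 4.0, 6.0]
store.save("a", pd.Series([10.0, 20.0, 30.0], index=idx))
print(store.load("a.doubled").tolist())  # [20.0, 40.0, 60.0]
```

Nothing derived is cached, so overwriting the raw series changes every transform downstream of it on the next query. The cost is that transforms are recomputed on each load.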
