Bitcoin Price Data
This tutorial demonstrates how to populate a feature store with a timeseries of bitcoin price data, and then compute some transformations on it.
import bytehub as bh
import pandas as pd
import requests

fs = bh.FeatureStore()
fs.create_namespace(
    'tutorial', url='/tmp/featurestore/tutorial', description='Tutorial datasets'
)
Now download some data from the CoinDesk API and save it to the feature store.
from_date = '2017-01-01'
to_date = pd.Timestamp.now().strftime('%Y-%m-%d')
response = requests.get(
    'https://api.coindesk.com/v1/bpi/historical/close.json',
    params={'start': from_date, 'end': to_date}
)
response.raise_for_status()
bpi = response.json()['bpi']  # Mapping of date string -> closing price
df_close = pd.DataFrame(
    {
        'time': pd.to_datetime(list(bpi.keys())),
        'value': list(bpi.values())
    }
).set_index('time')
fs.create_feature('tutorial/bitcoin.close', partition='year')  # Data is partitioned by year on disk
fs.save_dataframe(df_close, 'tutorial/bitcoin.close')
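To make the shape of df_close concrete, here is a small offline sketch that builds the same kind of DataFrame from a hand-written payload. The helper name bpi_to_frame and the sample prices are made up for illustration; only the 'bpi' key and the date-to-price layout are taken from the code above.

```python
import pandas as pd


def bpi_to_frame(payload):
    """Convert a {'bpi': {date_string: price}} payload into a time-indexed DataFrame."""
    bpi = payload['bpi']
    return pd.DataFrame(
        {
            'time': pd.to_datetime(list(bpi.keys())),
            'value': list(bpi.values()),
        }
    ).set_index('time')


# Hypothetical sample payload; the prices are invented, not real market data
sample_payload = {
    'bpi': {'2017-01-01': 1000.0, '2017-01-02': 1010.5, '2017-01-03': 1005.25}
}
df_sample = bpi_to_frame(sample_payload)
print(df_sample)
```

The result has a datetime index named time and a single value column, which is the layout passed to save_dataframe in this tutorial.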
When creating the tutorial/bitcoin.close feature, we specified partition="year". ByteHub lets you choose between year and date partitioning, which determines how the saved data is split into separate folders on disk. Choose date if you are working with data that has a very high time resolution, e.g. updated every second; otherwise choose year, as this will create fewer files and result in better performance.
We can now query and resample this data using load_dataframe:
df_weekly = fs.load_dataframe(
    ['tutorial/bitcoin.close'],
    from_date='2020-01-01', to_date='2020-12-31',
    freq='1W'
)
print(df_weekly.head())
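As a rough illustration of what resampling daily data onto a weekly grid looks like, the sketch below uses plain pandas on synthetic data. It assumes a last-value-per-week aggregation, which may not match how ByteHub resamples internally when freq='1W' is passed; the values are invented.

```python
import pandas as pd

# Synthetic daily series for January 2020: values 0.0, 1.0, ..., 30.0 (not real prices)
index = pd.date_range('2020-01-01', '2020-01-31', freq='D', name='time')
daily = pd.DataFrame({'value': [float(i) for i in range(len(index))]}, index=index)

# Weekly bins end on Sunday by default; keep the last observation in each bin.
# Assumption: last-value aggregation, chosen only for this illustration.
weekly = daily.resample('1W').last()
print(weekly)
```

Each row of weekly is labelled with the Sunday that closes its bin, so 31 daily rows collapse into 5 weekly rows here.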
Now create transform features to compute the exponentially-weighted moving averages of the bitcoin price over different time windows, along with a momentum indicator.
@fs.transform('tutorial/bitcoin.ewma.15', from_features=['tutorial/bitcoin.close'])
def ewma_15(df):
    return df.ewm(halflife=15).mean()

@fs.transform('tutorial/bitcoin.ewma.30', from_features=['tutorial/bitcoin.close'])
def ewma_30(df):
    return df.ewm(halflife=30).mean()

@fs.transform('tutorial/bitcoin.momentum', from_features=['tutorial/bitcoin.ewma.15', 'tutorial/bitcoin.ewma.30'])
def momentum(df):
    # Difference between the faster and slower moving averages
    return df['tutorial/bitcoin.ewma.15'] - df['tutorial/bitcoin.ewma.30']
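To see what these transforms compute without a feature store, the following sketch applies the same pandas operations to a synthetic, steadily rising series. The column name close and the data are assumptions made for the example; only the ewm(halflife=...) calls and the subtraction mirror the transforms above.

```python
import numpy as np
import pandas as pd

# Synthetic, steadily rising daily "price" series (not real market data)
index = pd.date_range('2020-01-01', periods=60, freq='D')
prices = pd.DataFrame({'close': np.linspace(100.0, 159.0, 60)}, index=index)

# Shorter half-life reacts faster to recent prices than the longer one
ewma_fast = prices['close'].ewm(halflife=15).mean()
ewma_slow = prices['close'].ewm(halflife=30).mean()

# Momentum is the gap between the fast and slow averages
momentum = ewma_fast - ewma_slow
print(momentum.tail())
```

On a rising series the faster average sits above the slower one, so the momentum is positive; on a falling series the sign flips.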
These new, transformed features are now available to query with the load_dataframe method. They are calculated on the fly, and therefore reflect any changes to the underlying bitcoin price data.