Saving/appending data

To add data to an existing feature, we use the fs.save_dataframe function. Data is always appended, so new records can be added without disturbing the existing timeseries values.
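As a minimal sketch, assuming a feature named 'tutorial/rawdata.carbon' already exists and that the feature store connection below matches your own setup (the connection line is illustrative, not part of this tutorial):

import pandas as pd
import bytehub as bh

# Connect to the feature store; adjust for your own deployment
fs = bh.FeatureStore()

# save_dataframe expects a 'time' column alongside the values to store
df = pd.DataFrame({
    'time': pd.to_datetime(['2020-01-01 00:00', '2020-01-01 00:30']),
    'value': [250, 255],
})

# Appends these two rows to whatever is already stored on the feature
fs.save_dataframe(df, 'tutorial/rawdata.carbon')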

The code for this tutorial is available as a Colab notebook.

For example, continuing with the carbon intensity data from the previous tutorial, we can write some code to fetch chunks of data and append them to the existing feature.

import requests
import pandas as pd

def fetch_carbon_intensity(from_date, to_date):
    # Query the Carbon Intensity API for the given time window
    response = requests.get(
        f'https://api.carbonintensity.org.uk/intensity/{from_date.strftime("%Y-%m-%dT%H:%MZ")}/{to_date.strftime("%Y-%m-%dT%H:%MZ")}'
    )
    response.raise_for_status()
    data = response.json()['data']

    # Build a dataframe with the 'time' and 'value' columns expected by the feature store
    return pd.DataFrame(
        {
            'time': pd.to_datetime([row['from'] for row in data]).tz_localize(None),
            # The API returns intensity as a dict, so extract the actual reading
            'value': [row['intensity']['actual'] for row in data],
        }
    )

# List of time chunks for which we want to import data
dts = pd.date_range('2020-01-01', '2021-03-01', freq='10D')
for from_date, to_date in zip(dts[:-1], dts[1:]):
    # Query data for this ten-day chunk
    chunk = fetch_carbon_intensity(from_date, to_date)
    # Append it to the feature store
    fs.save_dataframe(chunk, 'tutorial/rawdata.carbon')

Now when we call fs.load_dataframe, we retrieve everything we have saved to the feature store as one consolidated dataframe.

all_data = fs.load_dataframe('tutorial/rawdata.carbon')
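If you only need part of the series, load_dataframe can also restrict the date range. The from_date and to_date arguments below are an assumption based on ByteHub's API; verify the exact parameter names against the API reference.

# Assumption: load_dataframe accepts from_date/to_date to limit the query
january = fs.load_dataframe(
    'tutorial/rawdata.carbon',
    from_date='2020-01-01',
    to_date='2020-01-31',
)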

If you append timeseries values with timestamps that match existing records in the feature store, the new values will supersede the old data. Behind the scenes, ByteHub keeps both old and new versions of the data and these can be queried using the time travel feature. Detailed information on time travel will follow in another tutorial.
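As a quick sketch of this upsert behaviour, reusing the feature from above (the corrected value is purely illustrative):

# Re-save a row whose timestamp already exists in the feature
correction = pd.DataFrame({
    'time': [pd.Timestamp('2020-01-01 00:00')],
    'value': [260],
})
fs.save_dataframe(correction, 'tutorial/rawdata.carbon')

# Subsequent loads now return 260 for this timestamp;
# the old value remains accessible via time travel
all_data = fs.load_dataframe('tutorial/rawdata.carbon')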
