Using Cloud Storage
The simplest way to use ByteHub is to store feature data locally. However, cloud storage providers such as AWS S3, Azure Blob Storage and GCP Cloud Storage are also supported, allowing large datasets to be stored and shared.
ByteHub uses Dask to store data. To use a cloud storage service, create a namespace with `url` and `storage_options` configured for your provider. For example, using AWS S3:

```python
# Assumes `fs` is an existing ByteHub FeatureStore, and that
# aws_access_key_id / aws_secret_access_key hold your AWS credentials.
fs.create_namespace(
    's3-demo',
    url='s3://my-bucket-name/demo',  # bucket and folder for this namespace's data
    description='S3 tutorial',
    storage_options={
        'key': aws_access_key_id,
        'secret': aws_secret_access_key,
        'use_ssl': True,
    },
)
```
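Hardcoding credentials in notebooks or scripts makes them easy to leak. One option, sketched below, is to read them from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables and build the `storage_options` dictionary from those. The helper `s3_storage_options` is hypothetical and not part of ByteHub; it only assembles the same `key`, `secret` and `use_ssl` entries used in the example above.

```python
import os


def s3_storage_options(use_ssl=True):
    """Hypothetical helper: build S3 storage_options from AWS environment variables."""
    try:
        key = os.environ['AWS_ACCESS_KEY_ID']
        secret = os.environ['AWS_SECRET_ACCESS_KEY']
    except KeyError as missing:
        # Fail early with a clear message instead of a confusing auth error later
        raise RuntimeError(f'AWS credential not set: {missing.args[0]}') from None
    return {'key': key, 'secret': secret, 'use_ssl': use_ssl}
```

The result can then be passed as `storage_options=s3_storage_options()` in the `fs.create_namespace` call, so the credentials never appear in the code itself.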
The Dask remote storage documentation details the configuration required for different cloud providers. In summary:
| Cloud | URL format | Storage options |
| --- | --- | --- |
| AWS | `s3://{bucket_name}/{folder_name}` | `key`, `secret` |
| Azure | `abfs://{container_name}/{folder_name}` | `account_name`, `account_key` |
| GCP | `gcs://{bucket_name}/{folder_name}` | `token` |
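The table can be encoded directly as data, which makes it harder to mix up URL schemes and option names when switching providers. The sketch below uses a hypothetical helper, `namespace_args`, that is not part of ByteHub: it builds the `url` from the provider's scheme and checks that the required `storage_options` keys from the table are present. Any extra options, such as `use_ssl`, pass through unchanged.

```python
# URL scheme and required storage_options keys for each provider, per the table above
PROVIDERS = {
    'aws': ('s3', ('key', 'secret')),
    'azure': ('abfs', ('account_name', 'account_key')),
    'gcp': ('gcs', ('token',)),
}


def namespace_args(provider, bucket, folder, **storage_options):
    """Hypothetical helper: return url/storage_options kwargs for fs.create_namespace."""
    try:
        scheme, required = PROVIDERS[provider.lower()]
    except KeyError:
        raise ValueError(f'unknown provider: {provider!r}') from None
    # Report every missing credential at once rather than one at a time
    missing = [k for k in required if k not in storage_options]
    if missing:
        raise ValueError(f'{provider} storage_options missing: {", ".join(missing)}')
    return {
        'url': f'{scheme}://{bucket}/{folder}',
        'storage_options': storage_options,
    }
```

For example, `fs.create_namespace('azure-demo', description='Azure tutorial', **namespace_args('azure', 'my-container', 'demo', account_name=..., account_key=...))` would create a namespace at `abfs://my-container/demo`.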