Task I/O Targets
How is task data saved and loaded?
Task data is saved in a file, a database table or memory (cache). You control how task output is saved by choosing the right parent class for the task. In the example below, data is saved as parquet and loaded as a pandas DataFrame because the parent class is TaskPqPandas. The type of Python object you want to save determines how you can save it.
class YourTask(d6tflow.tasks.TaskPqPandas):
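The idea that the parent class, not the task body, decides the storage format can be illustrated without d6tflow. The plain-Python sketch below assumes two hypothetical base classes, `JsonTask` and `PickleTask`, with hypothetical `save()`/`load()` methods. It is not d6tflow's implementation, only the same inheritance pattern.

```python
import json
import pickle
from pathlib import Path


class BaseTask:
    """Hypothetical base class: format-specific subclasses fill in save/load."""

    extension = ""  # file suffix, set by each format-specific subclass

    def __init__(self, directory):
        # One output file per task class, named after the class
        self.path = Path(directory) / f"{type(self).__name__}{self.extension}"

    def run(self):
        raise NotImplementedError  # user tasks compute and save their data here

    def save(self, obj):
        raise NotImplementedError

    def load(self):
        raise NotImplementedError


class JsonTask(BaseTask):
    extension = ".json"

    def save(self, obj):
        self.path.write_text(json.dumps(obj))

    def load(self):
        return json.loads(self.path.read_text())


class PickleTask(BaseTask):
    extension = ".pkl"

    def save(self, obj):
        self.path.write_bytes(pickle.dumps(obj))

    def load(self):
        return pickle.loads(self.path.read_bytes())


def make_config():
    return {"learning_rate": 0.1, "epochs": 5}


# Identical task logic; only the parent class, and therefore the format, differs
class ConfigAsJson(JsonTask):
    def run(self):
        self.save(make_config())


class ConfigAsPickle(PickleTask):
    def run(self):
        self.save(make_config())
```

Swapping `JsonTask` for `PickleTask` changes only how the dict is stored; `run()` stays the same, in the same way that switching a real task from one d6tflow parent class to another changes its output format.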
Task Output Location
By default, file-based task output is saved in data/. You can customize where task output is saved:
d6tflow.set_dir('../data')
Core task targets (Pandas)
What kind of object you want to save determines which Task class you need to use.
- pandas
  - d6tflow.tasks.TaskPqPandas: save to parquet, load as pandas
  - d6tflow.tasks.TaskCachePandas: save to memory, load as pandas
  - d6tflow.tasks.TaskCSVPandas: save to CSV, load as pandas
  - d6tflow.tasks.TaskExcelPandas: save to Excel, load as pandas
  - d6tflow.tasks.TaskSQLPandas: save to SQL, load as pandas (premium, see below)
- dicts
  - d6tflow.tasks.TaskJson: save to JSON, load as Python dict
  - d6tflow.tasks.TaskPickle: save to pickle, load as Python dict
  - NB: don't save a dict of pandas DataFrames as a pickle; instead save them as multiple outputs, see “save more than one output” in Tasks
- any Python object (e.g. trained models)
  - d6tflow.tasks.TaskPickle: save to pickle, load as the original Python object
  - d6tflow.tasks.TaskCache: save to memory, load as Python object
- dask, SQL, pyspark: premium features, see below
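As a quick reference, the choices in the list above can be condensed into a small decision helper. `suggest_task_class` is hypothetical: it only returns a class name as a string and does not import or call d6tflow. Defaulting DataFrames to parquet is an assumption; any of the pandas classes in the list would work.

```python
import json


def _is_pandas(obj, name):
    # Checks the type by module and name so the sketch runs without pandas installed
    cls = type(obj)
    return cls.__module__.startswith("pandas") and cls.__name__ == name


def suggest_task_class(obj):
    """Hypothetical rule of thumb: map an object to a core task class name."""
    if _is_pandas(obj, "DataFrame"):
        return "d6tflow.tasks.TaskPqPandas"  # assumption: prefer parquet for DataFrames
    if isinstance(obj, dict):
        if any(_is_pandas(v, "DataFrame") for v in obj.values()):
            # Per the NB: don't pickle a dict of DataFrames, use multiple outputs
            raise ValueError("save each DataFrame as a separate output")
        try:
            json.dumps(obj)  # only JSON-serializable dicts can go to TaskJson
            return "d6tflow.tasks.TaskJson"
        except (TypeError, ValueError):
            return "d6tflow.tasks.TaskPickle"  # dict holding non-JSON values
    return "d6tflow.tasks.TaskPickle"  # any other object, e.g. a trained model
```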
Community Targets
Writing Your Own Targets
Writing your own target is often relatively simple, since you mostly need to implement the load() and save() functions. For more advanced cases you also need to implement the exists() and invalidate() functions. Check the source code for details or raise an issue.
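A target built around load(), save(), exists() and invalidate() can be sketched as a standalone class. This is a plain-Python illustration assuming a JSON file backend; the class name `JsonFileTarget` is hypothetical and it does not inherit from d6tflow's own target classes, so check the source for the real base class and method signatures before adapting it.

```python
import json
from pathlib import Path


class JsonFileTarget:
    """Hypothetical standalone target; not a subclass of d6tflow's real base classes."""

    def __init__(self, path):
        self.path = Path(path)

    def save(self, obj):
        # Create parent folders so the target can write into a fresh output directory
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(obj))
        return self.path

    def load(self):
        return json.loads(self.path.read_text())

    def exists(self):
        # A scheduler can check this to decide whether the producing task must run
        return self.path.exists()

    def invalidate(self):
        # Deleting the output marks it as missing, so the task will be rerun
        if self.path.exists():
            self.path.unlink()
```

Keeping exists() and invalidate() file-based means "is this output done?" is answered by the storage itself rather than by separate bookkeeping.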