* Added a `spaceflights` starter alias for generating a project: `kedro new --starter spaceflights`.

## Bug fixes and other changes
* Fixed `TypeError` when converting dict inputs to a node made from a wrapped `partial` function.
* `PartitionedDataSet` improvements.
* Improved handling of non-ASCII word characters in dataset names: a dataset named `jalapeño` will be accessible as `DataCatalog.datasets.jalapeño` rather than `DataCatalog.datasets.jalape__o`.
* Fixed `kedro install` for an Anaconda environment defined in `environment.yml`.
* Projects generated with older Kedro versions no longer need to update `.kedro.yml` to use `kedro lint` and `kedro jupyter notebook convert`.
* Fixed a bug when saving a `TensorFlowModelDataset` in the HDF5 format with versioning enabled.
* Added the missing `run_result` argument in the `after_pipeline_run` Hooks spec.
* Fixed a bug in the IPython startup script; to apply the fix in an existing project, update your `.ipython/profile_default/startup/00-kedro-init.py` file.

## Thanks for supporting contributions
Deepyaman Datta, Bhavya Merchant, Lovkush Agarwal, Varun Krishna S, Sebastian Bertoli, noklam, Daniel Petti, Waylon Walker
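To illustrate the dataset-name handling fix above, here is a minimal sketch (the dataset name and data are made up):

```python
from kedro.io import DataCatalog, MemoryDataSet

catalog = DataCatalog({"jalapeño": MemoryDataSet([1, 2, 3])})

# Non-ASCII word characters are now preserved in the attribute name,
# instead of being replaced with double underscores.
print(catalog.datasets.jalapeño.load())  # [1, 2, 3]
```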
# Release 0.16.5
## Bug fixes and other changes
* Fixed a bug for using `ParallelRunner` on Windows.
* Modified `GBQTableDataSet` to load customised results using customised queries from Google Big Query tables.

## Thanks for supporting contributions
Ajay Bisht, Vijay Sajjanar, Deepyaman Datta, Sebastian Bertoli, Shahil Mawjee, Louis Guitton, Emanuel Ferm
# Release 0.16.3
## Major features and improvements
* Added the following new datasets (a short usage sketch for `tensorflow.TensorFlowModelDataset` appears after the migration guide below).

| Type | Description | Location |
| --- | --- | --- |
| `pandas.AppendableExcelDataSet` | Works with Excel files opened in append mode | `kedro.extras.datasets.pandas` |
| `tensorflow.TensorFlowModelDataset` | Works with TensorFlow models using TensorFlow 2.X | `kedro.extras.datasets.tensorflow` |
| `holoviews.HoloviewsWriter` | Works with Holoviews objects (saves as image file) | `kedro.extras.datasets.holoviews` |
* `kedro install` will now compile project dependencies (by running `kedro build-reqs` behind the scenes) before the installation if the `src/requirements.in` file doesn't exist.
* Added `only_nodes_with_namespace` in the `Pipeline` class to filter only nodes with a specified namespace.
* Added a `kedro pipeline delete` command to help delete unwanted or unused pipelines (it won't remove references to the pipeline in your `create_pipelines()` code).
* Added a `kedro pipeline package` command to help package up a modular pipeline. It will bundle up the pipeline source code, tests, and parameters configuration into a `.whl` file.

## Bug fixes and other changes
* Improvements to the `DataCatalog`:
  - Improved the `DataCatalog.list()` method.
  - Non-word characters in dataset names are now replaced with `__` in `DataCatalog.datasets`, for ease of access to transcoded datasets.
* Improvements to `spark.SparkHiveDataSet` and `spark.SparkDataSet`.
* Added support for saving a `pyarrow` table with `pandas.ParquetDataSet`.
* Improvements to the `kedro build-reqs` CLI command:
  - `kedro build-reqs` is now called with the `-q` option and will no longer print out compiled requirements to the console for security reasons.
  - Extra CLI options passed to the `kedro build-reqs` command are now forwarded to the underlying `pip-compile` call (e.g. `kedro build-reqs --generate-hashes`).
* Improvements to the `kedro jupyter` CLI command:
  - Improved the error message shown when running `kedro jupyter notebook`, `kedro jupyter lab` or `kedro ipython` with Jupyter/IPython dependencies not being installed.
  - Fixed the `%run_viz` line magic for showing kedro viz inside a Jupyter notebook. For the fix to be applied on an existing Kedro project, please see the migration guide below.
* Added the missing `pillow.ImageDataSet` entry to the documentation.

## Migration guide from Kedro 0.16.2 to 0.16.3
### Migration for the `%run_viz` line magic in an existing project
Even though this release ships a fix for projects generated with `kedro==0.16.2`, after upgrading you will still need to make a change in your existing project if it was generated with `kedro>=0.16.0,<=0.16.1` for the fix to take effect. Specifically, please replace the content of your project's IPython init script, located at `.ipython/profile_default/startup/00-kedro-init.py`, with the content of the same script from a newly generated project. You will also need `kedro-viz>=3.3.1`.
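As a quick illustration of the new `tensorflow.TensorFlowModelDataset` listed in the table above, here is a minimal sketch; the file path and the tiny Keras model are made up, and TensorFlow 2.X is assumed to be installed:

```python
import tensorflow as tf
from kedro.extras.datasets.tensorflow import TensorFlowModelDataset

# Persist a Keras model through the dataset, then load it back.
data_set = TensorFlowModelDataset(filepath="data/06_models/demo_model")
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
data_set.save(model)
reloaded = data_set.load()
```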
## Thanks for supporting contributions
Miguel Rodriguez Gutierrez, Joel Schwarzmann, w0rdsm1th, Deepyaman Datta, Tam-Sanh Nguyen, Marcus Gawronsky
# Release 0.16.1

## Bug fixes and other changes
* Fixed deprecation warnings from `kedro.cli` and `kedro.context` when running `kedro jupyter notebook`.
* Fixed a bug where `catalog` and `context` were not available in Jupyter Lab and Notebook.
* Fixed a bug where `kedro build-reqs` would fail if you didn't have your project dependencies installed.

# Release 0.16.0

## Major features and improvements
### CLI
* Added new CLI commands:
  - `kedro catalog list` to list datasets in your catalog
  - `kedro pipeline list` to list pipelines
  - `kedro pipeline describe` to describe a specific pipeline
  - `kedro pipeline create` to create a modular pipeline
* Improved error handling when making a typo on the CLI: possible commands you meant to type are now suggested, git-style.

### Framework
* All modules in `kedro.cli` and `kedro.context` have been moved into `kedro.framework.cli` and `kedro.framework.context` respectively. `kedro.cli` and `kedro.context` will be removed in future releases.
* Added `Hooks`, which is a new mechanism for extending Kedro (see the sketch below).
* Fixed `load_context` changing the user's current working directory.
* Allowed the source directory to be configurable in `.kedro.yml`.
* Added the ability to specify nested parameter values inside your node inputs, e.g. `node(func, "params:a.b", None)`.
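Here is a minimal sketch of the `Hooks` mechanism; the hook class and the printed message are made up for illustration:

```python
from kedro.framework.hooks import hook_impl

class LoggingHooks:
    @hook_impl
    def after_catalog_created(self, catalog):
        # Runs once the DataCatalog has been assembled for a run.
        print(f"Catalog created with entries: {catalog.list()}")
```

In a 0.16-era project, an instance would typically be registered via the `hooks` attribute of your `ProjectContext`, e.g. `hooks = (LoggingHooks(),)`.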
### DataSets
* Added the following new datasets.

| Type | Description | Location |
| --- | --- | --- |
| `pillow.ImageDataSet` | Work with image files using `Pillow` | `kedro.extras.datasets.pillow` |
| `geopandas.GeoJSONDataSet` | Work with geospatial data using `GeoPandas` | `kedro.extras.datasets.geopandas` |
| `api.APIDataSet` | Work with data from HTTP(S) API requests | `kedro.extras.datasets.api` |
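For instance, here is a minimal sketch of `api.APIDataSet`; the endpoint URL and query parameters are placeholders:

```python
from kedro.extras.datasets.api import APIDataSet

data_set = APIDataSet(
    url="https://example.com/api/cars",  # placeholder endpoint
    params={"format": "json"},
)
response = data_set.load()  # a requests.Response object
print(response.json())
```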
* Added `joblib` backend support to `pickle.PickleDataSet`.
* Added versioning support to the `MatplotlibWriter` dataset.
* Added the ability to install dependencies for a given dataset with more granular package requirements, e.g. `pip install "kedro[pandas.ParquetDataSet]"`.
* Added the ability to pass arguments, e.g. `encoding` or `compression`, for the `fsspec.spec.AbstractFileSystem.open()` calls when loading/saving a dataset. See Example 3 under docs.

### Other
* Added a `namespace` property on `Node`, related to the modular pipeline where the node belongs.
* Added an option to enable asynchronous loading of inputs and saving of outputs via the `SequentialRunner(is_async=True)` and `ParallelRunner(is_async=True)` classes.
* Added a `MemoryProfiler` transformer.
* Added support for `pandas>=1.0`.
* Enabled Python 3.8 compatibility; note that a Spark workflow may be unreliable for this Python version as `pyspark` is not fully-compatible with 3.8 yet.

## Bug fixes and other changes
* `CONTRIBUTING.md` - added Developer Workflow.
* Added the missing `_exists` method to the `MyOwnDataSet` example in `04_user_guide/08_advanced_io`.
* Fixed a bug where `PartitionedDataSet` and `IncrementalDataSet` were not working with the `s3a` or `s3n` protocol.
* Improvements to `pandas.ParquetDataSet`.
* Replaced `functools.lru_cache` with `cachetools.cachedmethod` in `PartitionedDataSet` and `IncrementalDataSet` for per-instance cache invalidation.
* Fixed `SparkDataSet` when running on Databricks.
* Fixed a bug in `SparkDataSet` not allowing for loading data from DBFS in a Windows machine using Databricks-connect.
* Improved the `DataSetNotFoundError` message to suggest possible dataset names the user meant to type.
* Added `make test-no-spark` to run tests without a Spark installation.
* Added an option to lint the project without applying the formatting changes (`kedro lint --check-only`).

## Breaking changes to the API
### Datasets
* Deleted obsolete datasets from `kedro.io`.
* Deleted the `kedro.contrib` and `extras` folders.
* Deleted the obsolete `CSVBlobDataSet` and `JSONBlobDataSet` dataset types.
* Made the `invalidate_cache` method on datasets private.
* `get_last_load_version` and `get_last_save_version` methods are no longer available on `AbstractDataSet`.
* `get_last_load_version` and `get_last_save_version` have been renamed to `resolve_load_version` and `resolve_save_version` on `AbstractVersionedDataSet`, the results of which are cached.
* The `release()` method on datasets extending `AbstractVersionedDataSet` clears the cached load and save version. All custom datasets must call `super()._release()` inside `_release()`.
* `TextDataSet` no longer has `load_args` and `save_args`. These can instead be specified under `open_args_load` or `open_args_save` in `fs_args` (see the sketch at the end of this section).
* In `PartitionedDataSet` and `IncrementalDataSet`, the method `invalidate_cache` was made private: `_invalidate_caches`.

### Other
* Removed `KEDRO_ENV_VAR` from `kedro.context` to speed up the CLI run time.
* `Pipeline.name` has been removed in favour of `Pipeline.tag()`.
* Dropped `Pipeline.transform()` in favour of the `kedro.pipeline.modular_pipeline.pipeline()` helper function.
* Made the constant `PARAMETER_KEYWORDS` private, and moved it from `kedro.pipeline.pipeline` to `kedro.pipeline.modular_pipeline`.
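A minimal sketch of the new `fs_args` layout for `TextDataSet`; the file path and encoding are placeholders:

```python
from kedro.extras.datasets.text import TextDataSet

# Open-call options now live under open_args_load / open_args_save
# inside fs_args, instead of load_args / save_args.
data_set = TextDataSet(
    filepath="data/01_raw/notes.txt",
    fs_args={
        "open_args_load": {"mode": "r", "encoding": "utf-8"},
        "open_args_save": {"mode": "w", "encoding": "utf-8"},
    },
)
```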
## Migration guide from Kedro 0.15.* to 0.16.*
### Migration for the `DataCatalog`
Since all the datasets (from `kedro.io` and `kedro.contrib.io`) were moved to `kedro/extras/datasets`, you must update the type of all datasets in the `<project>/conf/base/catalog.yml` file.
Here is how it should be changed: `type: <SomeDataSet>` -> `type: <subfolder of kedro/extras/datasets>.<SomeDataSet>` (e.g. `type: CSVDataSet` -> `type: pandas.CSVDataSet`).
In addition, all the specific datasets like `CSVLocalDataSet`, `CSVS3DataSet` etc. were deprecated. Instead, you must use generalised datasets like `CSVDataSet`, e.g. `type: CSVS3DataSet` -> `type: pandas.CSVDataSet`.
> Note: No changes required if you are using your custom dataset.
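Expressed in Python rather than YAML, a minimal sketch of the rename (the dataset name and file path are made up):

```python
from kedro.io import DataCatalog

catalog = DataCatalog.from_config(
    {
        "cars": {
            "type": "pandas.CSVDataSet",  # previously "CSVDataSet" or "CSVS3DataSet"
            "filepath": "data/01_raw/cars.csv",
        }
    }
)
```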
`Pipeline.transform()` has been dropped in favour of the `pipeline()` helper function. The following changes apply:
* It is imported with `from kedro.pipeline import pipeline`.
* The `prefix` argument has been renamed to `namespace`.
* `datasets` has been broken down into more granular arguments:
  - `inputs`: independent inputs to the pipeline
  - `outputs`: any output created in the pipeline, whether an intermediary dataset or a leaf output
  - `parameters`: `params:...` or `parameters`

As an example, code that used to look like this with `Pipeline.transform()`:

```python
result = my_pipeline.transform(
    datasets={"input": "new_input", "output": "new_output", "params:x": "params:y"},
    prefix="pre",
)
```

when used with the new `pipeline()` helper, becomes:

```python
from kedro.pipeline import pipeline

result = pipeline(
    my_pipeline,
    inputs={"input": "new_input"},
    outputs={"output": "new_output"},
    parameters={"params:x": "params:y"},
    namespace="pre",
)
```
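To see what the `namespace` does to datasets that are not explicitly remapped, here is a small self-contained sketch; the node function and dataset names are made up:

```python
from kedro.pipeline import Pipeline, node, pipeline

def double(x):
    return x * 2

base = Pipeline([node(double, "input", "output", name="double")])
renamed = pipeline(base, inputs={"input": "new_input"}, namespace="pre")

# "input" is remapped to "new_input"; "output" was not remapped,
# so it is prefixed with the namespace and becomes "pre.output".
print(renamed.describe())
```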
Since some modules were moved to other locations, you need to update import paths appropriately.
You can find the list of moved files in the `0.15.6` release notes under the section titled `Files with a new location`.

> Note: If you haven't made significant changes to your `kedro_cli.py`, it may be easier to simply copy the updated `kedro_cli.py` and `.ipython/profile_default/startup/00-kedro-init.py` from GitHub or a newly generated project into your old project.

We have removed `KEDRO_ENV_VAR` from `kedro.context`. To get your existing project template working, you'll need to remove all instances of `KEDRO_ENV_VAR` from your project template:
* In `kedro_cli.py` and `.ipython/profile_default/startup/00-kedro-init.py`, change `from kedro.context import KEDRO_ENV_VAR, load_context` to `from kedro.framework.context import load_context`.
* Remove the `envvar=KEDRO_ENV_VAR` line from the click options in `run`, `jupyter_notebook` and `jupyter_lab` in `kedro_cli.py`.
* Replace `KEDRO_ENV_VAR` with `"KEDRO_ENV"` in `_build_jupyter_env`.
* Replace `context = load_context(path, env=os.getenv(KEDRO_ENV_VAR))` with `context = load_context(path)` in `.ipython/profile_default/startup/00-kedro-init.py` (see the sketch below).

### Migration for `kedro build-reqs`
We have upgraded `pip-tools`, which is used by the `kedro build-reqs` command, to 5.x. This `pip-tools` version requires `pip>=20.0`. To upgrade `pip`, please refer to their documentation.
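For the last item in the list above, the `00-kedro-init.py` edit looks like this minimal sketch; `path` stands in for whatever project path your init script already computes:

```python
from pathlib import Path

# Previously:
#   from kedro.context import KEDRO_ENV_VAR, load_context
#   context = load_context(path, env=os.getenv(KEDRO_ENV_VAR))
# Now the environment is no longer read from KEDRO_ENV_VAR:
from kedro.framework.context import load_context

path = Path.cwd()  # project root, as computed in the generated init script
context = load_context(path)
```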
## Thanks for supporting contributions
@foolsgold, Mani Sarkar, Priyanka Shanbhag, Luis Blanche, Deepyaman Datta, Antony Milne, Panos Psimatikas, Tam-Sanh Nguyen, Tomasz Kaczmarczyk, Kody Fischer, Waylon Walker
# Release 0.15.9
## Bug fixes and other changes
* Added the additional libraries to our `requirements.txt` so the `pandas.CSVDataSet` class works out of the box with `pip install kedro`.
* Added `pandas` to our `extra_requires` in `setup.py`.
* Improved the error message when dependencies of a `DataSet` class are missing.