File: `config.yaml`

Contains settings for the project's pipeline, container image builder and deployment. The pipeline-related settings can be overridden during development (with the `kfc` command). Parameters can be overridden with `kfc`'s `--set` flag and/or a separate override file (e.g. `kfc build_run --config-override override.yaml`). The reasoning behind overrides is to allow easy experimentation, e.g. executing the pipeline with different input parameters that point at a smaller development dataset.
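For example, a development override file could redirect a pipeline parameter at a smaller dataset. A minimal sketch, assuming overrides are merged over the base config (the file name `override.yaml` and the `dataset_path` parameter are purely illustrative):

```yaml
# override.yaml -- illustrative override file.
# dataset_path is a hypothetical pipeline parameter, not part of kfops.
pipeline:
  pipeline_args:
    dataset_path: 'datasets/dev_sample.csv'
```

It would then be applied with `kfc build_run --config-override override.yaml`, as shown above.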
Why a YAML file:

- The idea is similar to configuration as code,
- It gives more flexibility by allowing the config to be modified per push to the (Pull Request) branch,
- It keeps the most important settings in a single place,
- It standardizes the setup across all of your (ML model) repositories.
In its basic form, your `config.yaml` will look similar to:

```yaml
repository:
  owner: my-repo-username
  name: my-model-repo-name
pipeline:
  name: Sklearn Iris
  description: Example pipeline to show how to configure your project
  namespace: my-user-profile-name
  experiment_name: My amazing experiment
  pipeline_path: my_pipeline/pipeline.py
  pipeline_args:
    parameter_name: 'parameter_value'
deployment:
  inference_service_name: sklearn-iris
  inference_service_function_path: config_files/deployment.py
  pre_deployment_test_sample_input_path: test_deployment/input.json
  production:
    namespace: prod
  staging:
    namespace: staging
image_builder:
  container_registry_uri: my-registry
  images:
    - name: my_image_name
      dockerfile_folder_path: containers/my_image_name
      other_folders_path:
        - containers/lib
```
Each section is explained below, together with additional optional parameters.
## Section `repository`

Repository-related settings.

```yaml
repository:
  # The owner of your pipeline/model repository.
  # E.g. if using GitHub, it is the OWNER part of https://github.com/OWNER/
  owner: my-repo-username
  # Repository name (the REPOSITORY part of https://github.com/OWNER/REPOSITORY)
  name: my-model-repo-name
```
## Section `pipeline`

Kubeflow Pipelines-related settings.

```yaml
pipeline:
  # Kubeflow pipeline name
  name: Sklearn Iris
  # Optional. Description of your pipeline
  description: Example pipeline to show how to configure your project
  # Kubernetes namespace (Kubeflow profile) in which the pipeline will be executed.
  # Note: It assumes you have access to this profile. Otherwise you won't be able to see the results in the Kubeflow UI.
  # For details refer to https://www.kubeflow.org/docs/components/multi-tenancy/getting-started/
  namespace: my-user-profile-name
  # Kubeflow Experiment name this pipeline will be stored under.
  experiment_name: My amazing experiment
  # Path to the file where your pipeline code is located.
  # Note: The path is relative to the root folder of your repository.
  pipeline_path: my_pipeline/pipeline.py
  # Optional. Name of the pipeline function.
  # Required only if multiple pipeline functions are defined in the file.
  pipeline_function_name: my_pipeline_func
  # Optional. Allows choosing the Kubeflow Pipelines execution mode.
  # Valid selections are: V1_LEGACY, V2_COMPATIBLE, V2_ENGINE
  # If not defined, V2_COMPATIBLE is used.
  pipeline_execution_mode: V2_COMPATIBLE
  # Pipeline run parameters.
  # Optional if the pipeline doesn't have any input parameters.
  pipeline_args:
    parameter_name: 'parameter_value'
```
Regarding `pipeline_execution_mode`: currently, the component in the function `kfops.model.materialize_model` only supports the `V2_COMPATIBLE` execution mode. More details on model materialization can be found on the Pipeline function page.
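To make `pipeline_path` and `pipeline_function_name` concrete, here is a minimal sketch of what `my_pipeline/pipeline.py` could contain. It assumes the standard `kfp` v1 SDK; the `echo` component and its step are illustrative placeholders, not part of kfops:

```python
# my_pipeline/pipeline.py -- minimal sketch, assuming the kfp v1 SDK.
from kfp import dsl
from kfp.components import create_component_from_func


def echo(parameter_name: str) -> str:
    """Illustrative component: prints and returns its input."""
    print(parameter_name)
    return parameter_name


# Wrap the plain function into a reusable pipeline component.
echo_op = create_component_from_func(echo)


@dsl.pipeline(
    name="Sklearn Iris",
    description="Example pipeline to show how to configure your project",
)
def my_pipeline_func(parameter_name: str = "parameter_value"):
    # pipeline_args from config.yaml are passed as these function arguments.
    echo_op(parameter_name)
```

Because this file defines only one pipeline function, `pipeline_function_name` would be optional here; it becomes required once a second pipeline function is added to the same file.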
## Section `deployment`

Deployment-related settings.

```yaml
deployment:
  # When the model is deployed, it will use this inference service name.
  inference_service_name: sklearn-iris
  # Path to the file where your inference service function has been defined.
  # Note: The path is relative to the root folder of your repository.
  inference_service_function_path: config_files/deployment.py
  # Optional. Path to the location of the inference test sample.
  # Currently only the JSON file format is supported.
  # The sample will be used to check if the newly deployed model responds with HTTP status 200.
  # The tested model is deployed as a canary with 0% traffic. In case of a non-200 status, the deployment process is stopped.
  pre_deployment_test_sample_input_path: test_deployment/input.json
  # Kubernetes namespace into which production models will be deployed.
  # Notice: the namespace defined here has to already exist in the cluster.
  production:
    namespace: prod
  # Optional. Similar to the production namespace above.
  # Kubernetes namespace into which staging models will be deployed.
  staging:
    namespace: staging
```
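As an illustration, for the `sklearn-iris` service above, `test_deployment/input.json` could hold a single Iris sample in the KServe v1 "instances" format (the exact payload shape depends on your model server):

```json
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4]
  ]
}
```

The file referenced by `inference_service_function_path` is where your inference service function lives. The exact signature kfops expects is defined by kfops itself; purely as a sketch, assuming the function builds a KServe `V1beta1InferenceService` object (all names below are illustrative):

```python
# config_files/deployment.py -- a sketch only, assuming the KServe Python SDK.
# The function name, signature and model_uri argument are assumptions for
# illustration; consult the kfops documentation for the expected contract.
from kserve import (
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
)
from kubernetes.client import V1ObjectMeta


def inference_service(name: str, namespace: str, model_uri: str) -> V1beta1InferenceService:
    """Build an InferenceService spec for a scikit-learn model."""
    return V1beta1InferenceService(
        api_version="serving.kserve.io/v1beta1",
        kind="InferenceService",
        metadata=V1ObjectMeta(name=name, namespace=namespace),
        spec=V1beta1InferenceServiceSpec(
            predictor=V1beta1PredictorSpec(
                sklearn=V1beta1SKLearnSpec(storage_uri=model_uri),
            ),
        ),
    )
```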
## Section `image_builder`

Image builder-related settings.

```yaml
# Optional. Required only if your Kubeflow Pipelines use custom images and they are
# being built in the cluster as part of the /build command.
image_builder:
  # Registry name where your built image will be pushed. Examples:
  # * For Docker Hub, use your Docker Hub profile name.
  # * For ECR, use the name in the format aws_account_id.dkr.ecr.region.amazonaws.com
  container_registry_uri: my-registry
  # Optional flag, required during development. Allows pushing the image into an in-cluster insecure registry.
  insecure: true
  # Specify images to be built during the /build (or /build_run) step.
  # Refer to the section "Container images builder" in the Readme for details.
  images:
    - name: my_image_name
      dockerfile_folder_path: containers/my_image_name
      other_folders_path:
        - containers/lib
  # Optional. If not defined, the default MinIO preinstalled with
  # Kubeflow is used as the build context for Kaniko.
  # More details about Kaniko contexts: https://github.com/GoogleContainerTools/kaniko#kaniko-build-contexts
  minio:
    # Optional. By default, context files are copied into the MinIO bucket "image-build-artifacts",
    # but this can be overridden using:
    context_files_bucket_name: different-bucket-name
    # Optional. By default, MinIO credentials are read from the cluster (minio-service.kubeflow.svc.cluster.local).
    # You can override these settings with the options below:
    credentials:
      endpoint: YOUR_ENDPOINT (e.g. my-minio-service.my-namespace.svc.cluster.local)
      access_key: YOUR_ACCESS_KEY
      secret_key: YOUR_SECRET_KEY
```
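For the `images` entry above, a repository layout like the following would match the configured paths (an assumed layout, for illustration only; how `other_folders_path` entries are added to the Kaniko build context is covered in the "Container images builder" section of the Readme):

```
containers/
├── my_image_name/
│   └── Dockerfile        # dockerfile_folder_path points here
└── lib/                  # listed under other_folders_path
    └── ...
```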