File: config.yaml

Contains settings for the project's pipeline, container image builder and deployment. 

Pipeline-related settings can be overridden during development with the kfc command, using kfc's --set flag and/or a separate override file (e.g. kfc build_run --config-override override.yaml).

Overrides exist to make experimentation easy: for example, you can execute the pipeline with different input parameters that point at a smaller development dataset.
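
As a rough illustration of the experimentation use case above, an override file might contain only the keys that differ from config.yaml. This sketch assumes the override file mirrors the structure of config.yaml; the parameter value shown is a hypothetical placeholder, not something kfc prescribes.

```yaml
# override.yaml (sketch; assumes the same layout as config.yaml)
pipeline:
  pipeline_args:
    # Hypothetical value: point the pipeline at a smaller development dataset
    parameter_name: 'path/to/small-development-dataset'
```

Such a file would then be passed with kfc build_run --config-override override.yaml.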

Why a YAML file:

  • The idea is similar to configuration as code,

  • Gives more flexibility by allowing the config to be modified on each push to the (Pull Request) branch,

  • Keeps all the most important settings in a single place,

  • Standardizes the setup across all of your (ML model) repositories.

In its basic form, your config.yaml will look similar to:

repository:
  owner: my-repo-username
  name: my-model-repo-name
pipeline:
  name: Sklearn Iris
  description: Example pipeline to show how to configure your project
  namespace: my-user-profile-name
  experiment_name: My amazing experiment
  pipeline_path: my_pipeline/pipeline.py
  pipeline_args:
    parameter_name: 'parameter_value' 
deployment:
  inference_service_name: sklearn-iris
  inference_service_function_path: config_files/deployment.py
  pre_deployment_test_sample_input_path: test_deployment/input.json
  production: 
    namespace: prod
  staging:
    namespace: staging
image_builder:
  container_registry_uri: my-registry
  images:
    - name: my_image_name
      dockerfile_folder_path: containers/my_image_name
      other_folders_path:
        - containers/lib

Each section is explained below, including additional optional parameters.

Section repository

Repository related settings.

repository:

  # The owner of your pipeline/model repository. 
  # E.g. if using GitHub, it is the OWNER part of https://github.com/OWNER/
  owner: my-repo-username

  # Repository name (REPOSITORY part of https://github.com/OWNER/REPOSITORY)
  name: my-model-repo-name

Section pipeline

Kubeflow Pipeline related settings.

pipeline:

  # Kubeflow pipeline name
  name: Sklearn Iris

  # Optional. Description for your pipeline
  description: Example pipeline to show how to configure your project

  # Kubernetes namespace (Kubeflow profile) in which the pipeline will be executed.
  # Note: This assumes you have access to this profile. Otherwise you won't be able to see results in the Kubeflow UI.
  # For details refer to https://www.kubeflow.org/docs/components/multi-tenancy/getting-started/
  namespace: my-user-profile-name

  # Kubeflow Experiment name this pipeline will be stored under.
  experiment_name: My amazing experiment

  # Path to the file where your pipeline code is located.
  # Note: The path is relative to the root folder of your repository.
  pipeline_path: my_pipeline/pipeline.py

  # Optional. Name of the pipeline function.
  # Required only if multiple pipeline functions were defined in the file.
  pipeline_function_name: my_pipeline_func

  # Optional. Selects the Kubeflow Pipelines execution mode.
  # Valid values are: V1_LEGACY, V2_COMPATIBLE, V2_ENGINE
  # If not defined, V2_COMPATIBLE is used.
  pipeline_execution_mode: V2_COMPATIBLE

  # Pipeline run parameters.
  # Optional if the pipeline doesn't have any input parameters.
  pipeline_args:
    parameter_name: 'parameter_value' 

Regarding pipeline_execution_mode: currently, the component in function kfops.model.materialize_model supports only the V2_COMPATIBLE execution mode. More details on model materialization can be found on the Pipeline function page.
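
Given the restriction above, a pipeline that uses kfops.model.materialize_model would presumably keep the execution mode at V2_COMPATIBLE, either by omitting the key (the documented default) or by setting it explicitly. A minimal fragment, reusing the placeholder values from this page:

```yaml
pipeline:
  name: Sklearn Iris
  namespace: my-user-profile-name
  experiment_name: My amazing experiment
  pipeline_path: my_pipeline/pipeline.py
  # Explicit for readability; V2_COMPATIBLE is also the default when the key is omitted
  pipeline_execution_mode: V2_COMPATIBLE
```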

Section deployment

Deployment related settings

deployment:

  # Inference service name used when the model is deployed.
  inference_service_name: sklearn-iris

  # Path to the file where your inference service function is defined.
  # Note: The path is relative to the root folder of your repository.
  inference_service_function_path: config_files/deployment.py

  # Optional. Path to the inference test sample.
  # Currently only the JSON file format is supported.
  # The sample is used to check whether the newly deployed model responds with HTTP status 200.
  # The tested model is deployed as a canary with 0% traffic. A non-200 status stops the deployment process.
  pre_deployment_test_sample_input_path: test_deployment/input.json

  # Kubernetes namespace into which production models will be deployed.
  # Note: The namespace defined here must already exist in the cluster.
  production: 
    namespace: prod

  # Optional. Similar to the production namespace above.
  # Kubernetes namespace into which staging models will be deployed.
  staging:
    namespace: staging
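
For comparison, here is a sketch of the deployment section with the keys marked Optional above left out. It assumes the remaining keys are the required ones, which is inferred from the comments rather than stated explicitly:

```yaml
deployment:
  inference_service_name: sklearn-iris
  inference_service_function_path: config_files/deployment.py
  # The prod namespace must already exist in the cluster
  production:
    namespace: prod
```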

Section image_builder

Image builder related settings

# Optional. Required only if your Kubeflow Pipelines use custom images that are
# built in the cluster as part of the /build command.
image_builder:

  # Name of the registry to which your built image will be pushed. Examples:
  # * For Docker Hub, use your Docker Hub profile name.
  # * For ECR, use the name in the format aws_account_id.dkr.ecr.region.amazonaws.com
  container_registry_uri: my-registry

  # Optional flag required during development. Allows pushing the image into an insecure in-cluster registry.
  insecure: true

  # Specify the images to be built during the /build (or /build_run) step.
  # Refer to section "Container images builder" in Readme for details.
  images:
    - name: my_image_name
      dockerfile_folder_path: containers/my_image_name
      other_folders_path:
        - containers/lib

  # Optional. If not defined, the default MinIO preinstalled with
  # Kubeflow is used as the build context for Kaniko.
  # More details about Kaniko contexts: https://github.com/GoogleContainerTools/kaniko#kaniko-build-contexts
  minio:

    # Optional. By default, context files are copied into the MinIO bucket "image-build-artifacts".
    # This can be overridden using:
    context_files_bucket_name: different-bucket-name

    # Optional. By default, MinIO credentials are read from the cluster (minio-service.kubeflow.svc.cluster.local).
    # You can override these settings with the options below:
    credentials:
      endpoint: YOUR_ENDPOINT  # e.g. my-minio-service.my-namespace.svc.cluster.local
      access_key: YOUR_ACCESS_KEY
      secret_key: YOUR_SECRET_KEY
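
As an illustration of the ECR registry format mentioned above, a pared-down image_builder section might look as follows. The account ID and region are made-up placeholders, and the optional insecure and minio keys are omitted so that the documented defaults apply:

```yaml
image_builder:
  # Placeholder account ID (123456789012) and region (us-east-1); substitute your own
  container_registry_uri: 123456789012.dkr.ecr.us-east-1.amazonaws.com
  images:
    - name: my_image_name
      dockerfile_folder_path: containers/my_image_name
      other_folders_path:
        - containers/lib
```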