
Quick Start

This guide walks you through the core MLOps workflow from scratch, using a basic algorithm scenario such as "Anomaly Detection" as the example: "Prepare Dataset → Start Model Training → Publish Service → Online Inference."

Prerequisites

  • You have a BK-Lite account with MLOps resource management permissions, and the account is assigned to at least one organization (team).
  • You have a test dataset suited to the chosen algorithm scenario (e.g., a structured CSV file of metric data whose fields match the features the selected algorithm expects).
  • The platform deployer has initialized the MLOps algorithm configuration by running the init_algorithm_config management command; otherwise, no algorithms will be available when you create a new training task.

Step-by-Step Guide

1. Mount the Data Foundation (Dataset Management)

A model's quality depends heavily on the quality of the data it is fed.

  1. Log in, open MLOps, and select your target algorithm scenario (e.g., Anomaly Detection or Image Classification) from the "Scenario Dropdown" at the top left of the platform.
  2. Navigate to the "Dataset" menu in the left sidebar, click "New Dataset," enter the basic information, select your organization, and confirm to create the dataset.
  3. Open the details page of the new dataset, batch-upload your raw sample files, and tag each sample with its purpose: Training Data, Validation Data, or Test Data. The tags are not mutually exclusive; one sample can carry several.
  4. Once all samples are tagged, click "Publish Version" to create a baseline snapshot of the dataset at this point in time (e.g., V1). When the version status changes to "Published," training tasks can mount it. To take an older version offline temporarily, archive it; to bring it back, use "Restore Archive."
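The purpose tags and the version snapshot can be modeled roughly as follows. This is a minimal sketch, not BK-Lite's actual data model. The `Purpose`, `Sample`, `split_by_purpose`, and `publish_version` names are hypothetical. The sketch shows two things: a single sample can carry several purposes at once, and a published version freezes the sample list so that later uploads do not change it.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Dict, List, Tuple


class Purpose(Flag):
    """Hypothetical sample purposes; Flag lets one sample combine several."""
    TRAIN = auto()
    VALIDATION = auto()
    TEST = auto()


@dataclass(frozen=True)
class Sample:
    name: str
    purpose: Purpose


def split_by_purpose(samples: List[Sample]) -> Dict[str, List[str]]:
    """Group sample names by purpose; a multi-purpose sample lands in several groups."""
    groups: Dict[str, List[str]] = {}
    for p in (Purpose.TRAIN, Purpose.VALIDATION, Purpose.TEST):
        groups[p.name] = [s.name for s in samples if p in s.purpose]
    return groups


def publish_version(label: str, samples: List[Sample]) -> Tuple[str, Tuple[Sample, ...]]:
    """Freeze the current sample list into an immutable snapshot such as "V1"."""
    return label, tuple(samples)


samples = [
    Sample("day1.csv", Purpose.TRAIN | Purpose.VALIDATION),
    Sample("day2.csv", Purpose.TRAIN),
    Sample("day3.csv", Purpose.TEST),
]
v1 = publish_version("V1", samples)
samples.append(Sample("day4.csv", Purpose.TRAIN))  # a later upload does not alter V1
groups = split_by_purpose(list(v1[1]))
print(groups)
```

Using a flag type instead of a single enum value mirrors the rule that the three purposes can be combined.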

2. Schedule and Generate the Model (Training Task Management)

Combine the dataset version you just published with one of the platform's built-in algorithms so that the model can learn patterns from the data automatically.

  1. In the left menu, open the "Training Tasks" area and click "New Task."
  2. In the task configuration dialog, you only need to set two items: (1) the "Dataset Published Version" you created in Step 1; (2) an algorithm from the current scenario's algorithm configuration list, together with its hyperparameter form.
  3. After you save, the task enters "Pending Training" status, and its hyperparameter configuration is automatically synced to MinIO to generate the training config file. Click "Start Training." The platform launches the training container via webhookd, and Celery polls MLflow every 30 seconds to track progress.
  4. While training is in progress (status "Training"), open the task details to view the Run History List, then click a run to see its Metric History Curves and Runtime Parameters panel. If the results so far are unsatisfactory, click "Stop Training" to interrupt the current run (the task returns to "Pending Training"), adjust the configuration, and start again.
  5. When training finishes, the task status changes to "Completed." In the run history, click "Download Model" to package the run's artifacts as a ZIP file and download it locally for offline analysis or external deployment.
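The task lifecycle and the 30-second progress polling can be sketched as follows. This is a minimal, hypothetical model, not the platform's actual Celery task:

* The status strings are the UI labels from this guide.
* `transition` and `wait_for_run` are illustrative names.
* `fetch_status` stands in for whatever MLflow lookup the platform really performs.

FINISHED, FAILED, and KILLED are MLflow's terminal run statuses. This guide does not describe how BK-Lite maps FAILED or KILLED onto task statuses, so the sketch only handles the successful case.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical task statuses mirroring the UI labels in this guide.
PENDING = "Pending Training"
TRAINING = "Training"
COMPLETED = "Completed"

# (current status, action) -> next status: start, stop (back to pending), finish.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    (PENDING, "start"): TRAINING,
    (TRAINING, "stop"): PENDING,
    (TRAINING, "finish"): COMPLETED,
}

# Terminal values of MLflow's RunStatus.
MLFLOW_TERMINAL = {"FINISHED", "FAILED", "KILLED"}


def transition(status: str, action: str) -> str:
    """Return the next task status, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} a task in status {status!r}") from None


def wait_for_run(fetch_status: Callable[[], str],
                 interval: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a run's status every `interval` seconds until it is terminal."""
    while True:
        status = fetch_status()
        if status in MLFLOW_TERMINAL:
            return status
        sleep(interval)


# Usage with a fake status source and a recording "sleep", so nothing actually waits.
fake_statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
waits: List[float] = []
task = transition(PENDING, "start")
final = wait_for_run(lambda: next(fake_statuses), sleep=waits.append)
if final == "FINISHED":
    task = transition(task, "finish")
print(task, final, waits)  # Completed FINISHED [30.0, 30.0]
```

Injecting `sleep` and `fetch_status` keeps the polling logic testable without a real MLflow server or a 30-second wait.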

3. Build a Consumable Capability (Capability Publishing)

A model delivers business value only once it is exposed as an online API or test endpoint.

  1. After the training task status changes to "Completed," click "Capability Publishing" in the left sidebar and select "New Service."
  2. In the dialog, select the training task you just completed and its "Model Version Number." Enter latest to use the most recently registered version, or a specific version such as 1 or 2. "Service Port" is optional; leave it blank to let Docker assign a port automatically.
  3. After you save, the system calls webhookd to launch the inference container. Normally the service is created and started immediately, with no further action needed. To switch the model version or port, update the service record; if the container is running, the platform restarts it automatically to apply the change.
  4. To release resources temporarily, click "Stop," which deletes the container but keeps the service record. Click "Remove" to force-delete a running container. To delete the service record entirely, click "Delete"; the platform cleans up the container first.
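The "Model Version Number" field is resolved to a concrete version roughly as sketched below. This is a minimal sketch under stated assumptions:

* `resolve_model_version` is a hypothetical helper, not a BK-Lite or MLflow function.
* It assumes registered versions are increasing integers, as in the MLflow Model Registry, so "latest" means the highest registered number.

```python
from typing import Sequence


def resolve_model_version(requested: str, registered: Sequence[int]) -> int:
    """Map the "Model Version Number" field to a concrete registered version."""
    if not registered:
        raise LookupError("the training task has no registered model versions")
    if requested.strip().lower() == "latest":
        return max(registered)  # assumes the newest version has the highest number
    version = int(requested)  # raises ValueError for non-numeric input
    if version not in registered:
        raise LookupError(f"model version {version} is not registered")
    return version


versions = [1, 2, 3]
print(resolve_model_version("latest", versions))  # 3
print(resolve_model_version("2", versions))       # 2
```

Rejecting unregistered versions up front gives a clear error instead of a container that fails to load its model.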

Result Verification and Closure

Once the container has launched (the service status shows Active/Started), stay on the page and verify the result:

  1. Find your deployed service in the Capability Publishing section and click the "Online Inference" button.
  2. In the interactive window that appears, enter a new data sample or choose a local test image that the model never saw during training, then click Submit.
  3. Check the model output in the Inference Response Area below and judge whether the identification and labeling are accurate.
  4. Closing the loop: once inference quality meets your acceptance criteria, client systems or internal operations modules can integrate with the unified API published by this service for fully automated, high-frequency inference calls. If the results are unsatisfactory, return to the Dataset Management step, add samples covering the problem cases, publish a new version (V2), and run another round of training.
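A client integration might look like the following minimal sketch, which uses only the Python standard library. The following are all assumptions, not BK-Lite's documented API:

* the `/predict` path
* the host and port
* the JSON payload shape
* the `build_inference_request` and `call_inference` helper names

Check the actual request format of your published service (for example, through the Online Inference window) before adapting the sketch. It builds the request separately from sending it, so you can inspect or test the request without a live service.

```python
import json
import urllib.request
from typing import Any, Dict


def build_inference_request(base_url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST request to a hypothetical inference endpoint."""
    url = base_url.rstrip("/") + "/predict"  # assumed path; check your service's API
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_inference(req: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical host, port, and payload for an anomaly detection service.
req = build_inference_request(
    "http://127.0.0.1:8080",
    {"data": [{"timestamp": "2024-01-01T00:00:00", "value": 42.0}]},
)
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:8080/predict
# result = call_inference(req)  # uncomment when a service is actually running
```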