Features
The Job Management module consists of several tightly integrated functional pages. The sections below introduce the core purpose and configuration capabilities of each page, in the order they appear in the system's navigation.
1. Dashboard
The entry-level digital cockpit for Job Management, designed to provide the operations team with a global snapshot of task execution for the day.
- Overview Statistics: Shows the total number of targets, the number of scripts and Playbooks in the library, total execution records (broken down into success/failure/in-progress/pending/cancelled), the total number of cron jobs and how many are enabled, and the global average execution time (in seconds).
- Execution Trends: Plots the daily execution count, success count, failure count, cancellation count, and average execution time. Supports quick filters for the last 7/14/30 days as well as a custom date range (up to 90 days).
- Success Rate Comparison: Compares the success rate and average execution time of the current period with those of the preceding period of equal length. A custom date range can be selected, so you can see at a glance whether job quality is improving or declining.
- Job Type Distribution: Groups execution counts by job type, showing the share of script execution, file distribution, and other job types.
- Execution Status Distribution: Groups execution counts by status and supports the same custom date range filtering as Execution Trends.
- Recent Executions: Presents the latest job execution records in list form for quick identification of running or recently completed tasks, facilitating timely follow-up.
- High-Frequency Trigger Entries: Provides shortcut buttons to "Quick Execute" and "Create Cron Job," so the most commonly used actions are one click away.
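The "previous period of equal length" used by Success Rate Comparison can be illustrated with a short Python sketch. It assumes date ranges include both their first and last day, and that the success rate is successful executions divided by all executions in the period. The function names are hypothetical and are not the platform's API.

```python
from datetime import date, timedelta

MAX_RANGE_DAYS = 90  # custom date ranges are capped at 90 days


def previous_period(start: date, end: date) -> tuple[date, date]:
    """Return the equal-length period that ends the day before `start`.

    Both ranges are treated as inclusive of their first and last day.
    """
    length = (end - start).days + 1
    if not 1 <= length <= MAX_RANGE_DAYS:
        raise ValueError(f"range must span 1-{MAX_RANGE_DAYS} days, got {length}")
    prev_end = start - timedelta(days=1)
    prev_start = prev_end - timedelta(days=length - 1)
    return prev_start, prev_end


def success_rate(success: int, total: int) -> float:
    """Success rate as a percentage (0.0 when the period has no executions)."""
    return round(100.0 * success / total, 2) if total else 0.0
```

For example, a 7-day window from 8 to 14 March 2024 would be compared with 1 to 7 March 2024.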
2. Target Management
The repository for defining and grouping the target objects that automated execution actions run against.
- Multi-Management Mode Configuration: Execution groups defined and managed here support both Agent mode (hosts with a resident agent installed for scheduling) and Agentless mode (hosts managed directly via Ansible, with no resident agent).
- Flexible Target Mapping and Selection: Scattered hosts can be grouped either by Tags, at a cluster-wide level or by individual machine attributes, or by explicit IP lists. The resulting target group can then be referenced in one click when triggering later tasks such as script distribution.
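The following is a minimal sketch of how a target group could be resolved from Tags and an explicit IP list. Several parts of it are illustrative assumptions: the Host type, the rule that a host must carry every requested tag, and the choice to drop IPs that are not in the managed inventory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Host:
    """A managed host (hypothetical model): its IP plus the tags attached to it."""
    ip: str
    tags: frozenset = field(default_factory=frozenset)


def resolve_targets(inventory, tags=None, ips=None):
    """Return the IPs (sorted as strings) of hosts carrying ALL `tags`, plus listed `ips`.

    IPs that are not in the managed inventory are silently dropped.
    """
    selected = set()
    if tags:
        wanted = set(tags)
        selected |= {h.ip for h in inventory if wanted <= h.tags}
    if ips:
        known = {h.ip for h in inventory}
        selected |= {ip for ip in ips if ip in known}
    return sorted(selected)


inventory = [
    Host("10.0.0.1", frozenset({"prod", "web"})),
    Host("10.0.0.2", frozenset({"prod", "db"})),
    Host("10.0.0.3", frozenset({"staging", "web"})),
]
```

In this sketch, a Tag selection and an IP list are combined as a union. A real implementation might instead offer intersection or exclusion rules.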
3. Quick Execute
A fast, orchestration-free workspace for ad-hoc short commands and troubleshooting.
- Plug-and-Play Execution Panel: Provides an IDE-like code editor for one-off commands. Users can write Shell, Python, or other common scripting-language snippets directly and run them on the selected targets.
- Remote Streaming Feedback: Delivers millisecond-level interactive responses from remote machines. The standard output (stdout) of each target machine is printed on the page in real time, which greatly improves the experience in urgent, interactive scenarios.
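The Execution History & Details section notes that the job details page streams output over SSE (Server-Sent Events). If Quick Execute output is consumed the same way (an assumption), a client only needs to parse the SSE wire format: "field: value" lines, with a blank line ending each event. The sketch below is a minimal parser for that format. The event name stdout and the sample lines are hypothetical.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of already-decoded SSE lines.

    An event is dispatched only when a blank line ends it, as the SSE format requires.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line ends the current event; dispatch it if it carried data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one space after the colon is not part of the value
        if name == "event":
            event = value or "message"
        elif name == "data":
            data.append(value)
        # "id" and "retry" fields are ignored in this sketch


stream = [
    "event: stdout",
    "data: [10.0.0.1] disk check ok",
    "",
    ": keep-alive",
    "data: first line",
    "data: second line",
    "",
]
for event, data in parse_sse(stream):
    print(event, repr(data))
```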
4. File Distribution
A centralized control hub for large-scale file transfer across many endpoint nodes.
- End-to-End Delivery Channel: Supports pushing binary packages, configuration files, and other artifacts from the control center or a remote resource machine to designated paths on large numbers of endpoints. It also supports pulling remote business logs and files back to the source for aggregation.
- Target Blacklist/Whitelist Control: File destination paths are subject to blacklist and whitelist restrictions enforced by the underlying "High-Risk Path Interception System." Overwriting foundational paths such as /etc and /boot is strictly prohibited.
- Automatic Expiry of Temporary Files: Every file staged for distribution carries an expiration time (7 days by default, configurable from 1 to 365 days). A scheduled task runs daily at 00:00 and purges expired files automatically. There is no "permanent retention" option, so temporary distribution files cannot occupy storage indefinitely.
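The expiry rules above can be sketched in a few lines of Python. The function names and the in-memory dictionary standing in for file storage are hypothetical. Only the 1-365 day range, the 7-day default, and the daily 00:00 purge come from this document.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

DEFAULT_EXPIRE_DAYS = 7
MIN_EXPIRE_DAYS, MAX_EXPIRE_DAYS = 1, 365


def compute_expires_at(uploaded_at: datetime, expire_days: Optional[int] = None) -> datetime:
    """Expiry time of a staged file: upload time plus 1-365 days (7 by default)."""
    days = DEFAULT_EXPIRE_DAYS if expire_days is None else expire_days
    if not MIN_EXPIRE_DAYS <= days <= MAX_EXPIRE_DAYS:
        raise ValueError(f"expire_days must be {MIN_EXPIRE_DAYS}-{MAX_EXPIRE_DAYS}, got {days}")
    return uploaded_at + timedelta(days=days)


def purge_expired(files: Dict[str, datetime], now: datetime) -> List[str]:
    """Delete entries whose expiry time has passed and return their names.

    Intended to be called by a daily job at 00:00, so a file is removed at the
    first midnight on or after its expiry time.
    """
    expired = sorted(name for name, expires_at in files.items() if expires_at <= now)
    for name in expired:
        del files[name]  # stand-in for deleting the stored object
    return expired
```

Because the purge runs only once a day, a file that expires at 15:30 remains on disk until the following midnight.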
5. Resource Library
A standardized repository for centrally managing high-quality, re-runnable assets accumulated over long periods of operation.
- Script Library: Stores commonly used environment-repair and service-check scripts in reusable form. It provides code-level version management and an online/offline workflow for scripts, which prevents problems caused by unauthorized edits from other users. Scripts support Jinja2 parameter template syntax ({{ param_name }}). Each parameter can define a name, tags, an encryption option, a default value, and a hint message, so one script can be reused across many scenarios. Shell-type scripts run with the bash interpreter by default. If the first line of the script contains a valid shebang (such as #!/bin/sh or #!/usr/bin/env python3), the system uses the interpreter named in the shebang instead. Only whitelisted interpreters are supported: sh, bash, python, python3, powershell, and pwsh.
- Playbook Library: For complex deployment tasks (such as installing a database after first mounting its disks), it supports importing standard external Ansible Playbooks written in YAML. Playbooks are uploaded as ZIP archives. The system automatically parses the playbook.yml and README.md inside the archive and shows the parameter definitions and README content on the detail page. It also records a Playbook version snapshot with each execution for version traceability. Because Playbooks use declarative state management, they can maintain long-lived environment configurations that a single one-off command cannot reliably achieve.
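The interpreter-selection rule for Shell-type scripts can be sketched as follows. The whitelist and the bash default come from the Script Library description. Falling back to bash when the shebang names a non-whitelisted interpreter is an assumption; the platform might instead reject such a script.

```python
import posixpath

ALLOWED_INTERPRETERS = {"sh", "bash", "python", "python3", "powershell", "pwsh"}
DEFAULT_SHELL_INTERPRETER = "bash"


def select_interpreter(script: str) -> str:
    """Pick the interpreter for a Shell-type script from its shebang line.

    Falls back to bash when there is no shebang or when the shebang names an
    interpreter outside the whitelist (the fallback is an assumption).
    """
    lines = script.splitlines()
    first = lines[0].strip() if lines else ""
    if not first.startswith("#!"):
        return DEFAULT_SHELL_INTERPRETER
    parts = first[2:].split()
    if not parts:
        return DEFAULT_SHELL_INTERPRETER
    program = posixpath.basename(parts[0])  # "/bin/sh" -> "sh"
    if program == "env" and len(parts) > 1:
        program = parts[1]  # "#!/usr/bin/env python3" -> "python3"
    return program if program in ALLOWED_INTERPRETERS else DEFAULT_SHELL_INTERPRETER
```

Because the sketch compares only the interpreter's base name, a shebang such as #!/usr/local/bin/python3 also selects python3.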
6. Cron Jobs
Unattended automation for highly repetitive, periodic operations tasks.
- Cron-Level Scheduling: Binds tasks saved in the "Script Library" or "Quick Execute" to custom Cron expressions, enabling periodic operations such as "clean up stale logs at 2 AM every night."
- Full Lifecycle Engine Management: Gives you direct control over periodic tasks. You can suspend (halt) or restart a task and preview its next three predicted trigger times.
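Previewing "the next three predicted trigger times" amounts to evaluating a Cron expression forward from the current time. The minimal Python sketch below handles the classic five-field syntax (*, numbers, ranges, lists, and /step). It applies the traditional cron rule that when both day-of-month and day-of-week are restricted, matching either one is enough. It works in naive local time, does not accept 7 for Sunday, and is not the platform's actual scheduler.

```python
from datetime import datetime, timedelta

# (low, high) for: minute, hour, day-of-month, month, day-of-week (0 = Sunday)
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(expr, lo, hi):
    """Expand one cron field ("*", "5", "1-5", "*/15", "0,30") into a set of ints."""
    values = set()
    for part in expr.split(","):
        rng, _, step_s = part.partition("/")
        step = int(step_s) if step_s else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-", 1))
        else:
            start = int(rng)
            end = hi if step_s else start  # "5/15" means 5, 20, 35, ...
        if step < 1 or not lo <= start <= end <= hi:
            raise ValueError(f"invalid cron field {expr!r}")
        values.update(range(start, end + 1, step))
    return values


def next_runs(expr, after, count=3):
    """Return the next `count` trigger times strictly after `after` (naive local time)."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    minutes, hours, doms, months, dows = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    both_days_restricted = fields[2] != "*" and fields[4] != "*"

    def day_matches(t):
        dom_ok = t.day in doms
        dow_ok = (t.weekday() + 1) % 7 in dows  # Python: Monday=0; cron: Sunday=0
        # Classic cron rule: if both day fields are restricted, either may match.
        return (dom_ok or dow_ok) if both_days_restricted else (dom_ok and dow_ok)

    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = t + timedelta(days=5 * 366)  # stop searching for expressions that never fire
    runs = []
    while len(runs) < count and t < limit:
        if t.month not in months or not day_matches(t):
            t = (t + timedelta(days=1)).replace(hour=0, minute=0)  # skip the rest of the day
        elif t.hour not in hours:
            t = (t + timedelta(hours=1)).replace(minute=0)  # skip the rest of the hour
        else:
            if t.minute in minutes:
                runs.append(t)
            t += timedelta(minutes=1)
    return runs


print(next_runs("0 2 * * *", datetime(2024, 1, 1, 10, 0)))
```

The "clean up stale logs at 2 AM every night" example corresponds to the expression 0 2 * * *. Starting from 10:00 on 1 January 2024, the next three predicted triggers fall at 02:00 on 2, 3, and 4 January.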
7. High-Risk Configuration & Interception (Security Config)
The "defense fuse" that guards the boundaries automation must never cross. It is designed to stop a single bad operation from compromising the whole environment and paralyzing the system.
- High-Risk Command Configuration: Provides regex-based interception policies that catch dangerous commands (such as an improper rm -rf / or forced kernel start/stop commands) at the moment a user submits them. Distribution is blocked immediately and an alert is sent.
- High-Risk Path Configuration: Works with "File Distribution" to designate forbidden zones for file operations, strictly prohibiting blind writes to paths critical to system survival.
- Team-Based Isolation (Horizontal Privilege Protection): Assets such as scripts, Playbooks, targets, and distribution files are authorized by team ownership. When a user references an asset to launch an execution, the platform verifies that the asset falls within the user's authorized teams, preventing a Team A user from referencing Team B's scripts, targets, or files for unauthorized execution.
Warning / Security Best Practices:
Never whitelist common high-risk commands under the pretext of testing convenience. Any compliant, frequently needed troubleshooting actions that require elevated privileges should first be encapsulated in the controlled "Script Library," rather than executed through scattered quick commands.
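Both interception layers can be sketched in Python: regex matching for submitted commands and prefix checks for destination paths. The three command patterns and the whitelist prefixes are hypothetical examples; /etc and /boot are the blocked paths named in this document. The sketch also assumes that the blacklist always wins over the whitelist. It normalizes ".." and duplicate slashes but does not resolve symlinks on the target host.

```python
import posixpath
import re

# Hypothetical example rules; the real policies are configured by administrators.
HIGH_RISK_COMMAND_PATTERNS = [
    re.compile(r"\brm\s+(?:-[\w-]+\s+)*/\*?(?=[\s;&|]|$)"),  # rm -rf / and rm -rf /*
    re.compile(r"\bmkfs(?:\.\w+)?\b"),  # formatting a filesystem
    re.compile(r"\bdd\b.*\bof=/dev/"),  # writing raw data to a block device
]

BLOCKED_PATH_PREFIXES = ("/etc", "/boot")  # examples named in this document
ALLOWED_PATH_PREFIXES = ("/opt", "/data", "/tmp")  # hypothetical whitelist


def find_high_risk(command):
    """Return the patterns a submitted command matches (empty list means allowed)."""
    return [p.pattern for p in HIGH_RISK_COMMAND_PATTERNS if p.search(command)]


def _is_under(path, prefix):
    """True if `path` equals `prefix` or lies inside it ("/etcetera" is not under "/etc")."""
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def is_path_allowed(path, blocked=BLOCKED_PATH_PREFIXES, allowed=ALLOWED_PATH_PREFIXES):
    """Check a destination path; the blacklist always wins over the whitelist."""
    if not path.startswith("/"):
        return False  # reject relative destinations outright
    # Collapse "..", "." and duplicate slashes so "/opt/../etc" cannot slip through.
    norm = posixpath.normpath("/" + path.lstrip("/"))
    if any(_is_under(norm, p) for p in blocked):
        return False
    return not allowed or any(_is_under(norm, p) for p in allowed)
```

A regex blocklist catches only the spellings it anticipates. The advice above still applies: commands that need elevated privileges belong in reviewed scripts rather than in ad-hoc exceptions to these rules.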
8. Execution History & Details
The global archive of operations activity and the first place to look when troubleshooting: the platform's black box.
- Global Execution Timeline: Records the serial number and summary of every execution, whether it was triggered ad hoc or dispatched on a schedule. All execution details are stored in the database and can be reviewed on demand at any time.
- Deep-Dive Drill-Down: The views under "Job Details" let you trace, for each machine, any anomalous distribution messages as well as the final standard output and exit result. Who executed what, when, and which error was reported is thus recorded as indisputable evidence.
- Re-execute: For any job that has reached a terminal state, creates a new execution instance with the same parameters in one click, with no reconfiguration needed.
- Real-Time Execution Output: The job details page uses SSE (Server-Sent Events) to stream standard output from each target machine in real-time, allowing you to continuously monitor progress and intermediate logs without waiting for task completion. Once the task enters a terminal state, the same interface automatically switches to display the historical result snapshot, providing a consistent browsing experience.
- Cancel Execution: Pending and in-progress tasks can be cancelled. A pending task moves immediately to the terminal "cancelled" state. An in-progress task moves to the intermediate "cancelling" state and converges to "cancelled" automatically once the target machine returns its actual results; a system-level timeout forces convergence as a fallback. Requests to cancel a task that is already in a terminal state or already cancelling are rejected. There are seven execution statuses in total: four terminal states (success, failure, timeout, cancelled) and three intermediate states (pending, in-progress, cancelling).
- Trigger Source & Callback Traceability: Every execution record is tagged with its trigger source (manual / cron job / API call), an executor snapshot, and a snapshot of the Playbook version used at execution time, and it supports configuring a callback URL. Whether a job is triggered by a person or a third-party system, you can later reconstruct exactly who launched the execution, by what means, and with which version.
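The cancellation rules and the seven statuses above form a small state machine. The Python sketch below encodes the transitions described in this section. The function names are hypothetical and stand in for whatever the platform uses internally.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in-progress"
    CANCELLING = "cancelling"
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"
    CANCELLED = "cancelled"


TERMINAL = {Status.SUCCESS, Status.FAILURE, Status.TIMEOUT, Status.CANCELLED}


def request_cancel(status):
    """Return the state a job moves to when cancellation is requested."""
    if status in TERMINAL or status is Status.CANCELLING:
        raise ValueError(f"cannot cancel a job that is {status.value}")
    if status is Status.PENDING:
        return Status.CANCELLED  # nothing has started, so cancel immediately
    return Status.CANCELLING  # in progress: wait for the targets to report back


def on_results_returned(status, outcome):
    """Apply the final outcome reported by the target machines."""
    if outcome not in TERMINAL:
        raise ValueError("targets must report a terminal outcome")
    if status is Status.CANCELLING:
        return Status.CANCELLED  # a cancelling job always converges to cancelled
    if status is Status.IN_PROGRESS:
        return outcome
    raise ValueError(f"unexpected results for a job that is {status.value}")


def on_cancel_timeout(status):
    """System-level fallback: force a job stuck in cancelling to converge."""
    return Status.CANCELLED if status is Status.CANCELLING else status


def can_reexecute(status):
    """Re-execute is offered only for jobs in a terminal state."""
    return status in TERMINAL
```

Keeping "cancelling" as a separate intermediate state means that a second cancel request, or a late result from a target machine, cannot move a job that has already been cancelled back into another state.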
9. Open Integration
Job Management can be used on its own in the console, and it also serves as a unified job-execution foundation that other applications on the platform (such as Patch Management) can call.
- NATS Open Interfaces: Exposes the following interfaces over the NATS message bus for direct invocation by third-party applications inside the network. No additional Token is required, because the NATS channel itself is trusted.
  - job_script_execute: Trigger script execution; returns task_id
  - job_file_distribute: Trigger file distribution; returns task_id
  - job_task_terminate: Cancel a running or queued job, with the same cancellation semantics and state transitions as the console
  - job_status_batch_query: Batch-query the execution status of multiple jobs
  - job_detail_query: Query the execution details of a single job
  - job_target_list: Query the list of available target hosts, used to build the target_list parameter
- REST File Interfaces: Provides HTTP interfaces for third-party applications to upload and delete distribution files, authenticated with an API Token (sent in the Api-Authorization header):
  - POST /api/v1/job_mgmt/api/open/upload_file: Upload a file; supports a custom expire_days (1-365 days, default 7)
  - DELETE /api/v1/job_mgmt/api/open/delete_file: Delete an uploaded file
- Asynchronous Result Callbacks: Third-party callers can include a callback_url when launching a job. Once the job enters a terminal state, the platform pushes the signed result to that URL. If delivery fails, the platform retries automatically with exponential backoff (initial interval of about 5 seconds by default, capped at 120 seconds, with random jitter), up to 5 times, to ensure reliable delivery.
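The retry policy can be sketched in Python. The 5-second initial interval, the 120-second cap, and the 5-retry limit come from this document. The doubling factor and the plus-or-minus 20% jitter are assumptions, and send is a hypothetical stand-in for the signed HTTP POST to callback_url that returns True on success.

```python
import random
import time


def retry_delays(retries=5, initial=5.0, factor=2.0, cap=120.0, jitter=0.2, rng=None):
    """Delays (seconds) before each retry: initial * factor**n with +/- jitter, capped."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        base = initial * factor ** attempt
        delays.append(min(cap, base * (1 + rng.uniform(-jitter, jitter))))
    return delays


def deliver(send, delays, sleep=time.sleep):
    """Call send() once, then retry after each delay until it returns True."""
    if send():
        return True
    for delay in delays:
        sleep(delay)
        if send():
            return True
    return False
```

With a doubling factor and no jitter, the five delays are 5, 10, 20, 40, and 80 seconds. The 120-second cap would therefore only take effect with a larger factor or more retries, so the platform's actual growth factor may differ from this sketch.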