Capabilities Overview

OpsPilot's capability system consists of two core modules:

  • Multi-Type LLM Integration: Supports mainstream Chinese and international LLM, Embed, Rerank, and OCR models, covering AI scenarios such as Q&A, retrieval, and document understanding.
  • Built-in Platform Tools: Out-of-the-box operations and general-purpose tools covering cluster management, CI/CD, inspection, office automation, and other scenarios, ready to use without additional configuration.

Models

OpsPilot supports integration with various mainstream large language models and industry models. This chapter introduces the different model types, capability characteristics, and applicable scenarios to help users flexibly select the right model based on business requirements.

LLM Models

HuggingFace Series

QwQ (Alibaba Cloud Tongyi Qianwen Reasoning Model)
  • Core Function: Enhances mathematical, programming, and logical reasoning capabilities through reinforcement learning, able to analyze problems step by step using "chain of thought."
  • Advantages:
    • Small yet powerful: With only 32 billion parameters, its performance approaches that of the 671-billion-parameter DeepSeek-R1 model.
    • Multi-domain proficiency: Excels at math problem solving, code generation, and complex logical reasoning (such as contract clause analysis).
  • Applications:
    • Mathematical reasoning: Solves complex mathematical problems, such as advanced algebraic equations, geometric proofs, etc.
    • Programming assistance: Helps write code, debug code, and optimize code logic.
    • General reasoning: Handles various problems requiring logical thinking, such as contract clause interpretation, logic puzzles, etc.

DeepSeek Series (High-Efficiency Domestic Models)

DeepSeek-R1:1.5b
  • Core Function: A lightweight large language model focused on the code domain, supporting Python, Java, and other programming languages.
  • Advantages:
    • Low-configuration operation: Can run on mobile phones or ordinary computers (8GB RAM) with minimal resource consumption.
    • Fast code generation: Code completion, function generation, and error debugging 3-5x faster than manual work.
  • Applications:
    • Code generation: Quickly generates code in various programming languages such as Python, Java, C++, etc.
    • Code optimization: Optimizes existing code to improve runtime efficiency and readability.
    • Code debugging: Checks for syntax errors and logic errors in code and provides correction suggestions.

OpenAI Series (World-Leading General Models)

1. GPT-3.5-Turbo-16K
  • Core Function: A general-purpose large language model with an extended 16K-token context window, enough for long texts of roughly 12,000 English words, with high cost-effectiveness.
  • Advantages:
    • Multi-task adaptability: Capable of writing articles, translation, data analysis, customer service Q&A, and more.
    • Cost-effective: Achieves a good balance between performance and cost, with lower usage costs compared to GPT-4.
  • Applications:
    • Content creation platforms: Creators can use it to quickly generate article outlines and first drafts.
    • Enterprise document processing: Used for content summarization and key information extraction from contracts, reports, etc.
2. GPT-4-32K
  • Core Function: A top-tier large language model with ultra-long context processing capability and powerful multimodal abilities. Handles ultra-long texts (a 32K-token context window, roughly 24,000 English words) and supports multimodal input (text + images).
  • Advantages:
    • Powerful comprehensive capabilities: Excels in language understanding, reasoning, and creative generation.
    • Multimodal fusion: Enables text-image fusion interaction and expands application scenarios.
  • Applications:
    • Ultra-long text analysis: Processes text up to 32K tokens for deep analysis and understanding.
    • Multimodal interaction: Supports text and image input/output.
    • Complex problem solving: Solves advanced mathematical and specialized technical problems.
3. GPT-4o (Multimodal All-in-One Model)
  • Core Function: A flagship multimodal model supporting mixed input/output of text, audio, images, and video, focused on "all-in-one interaction" capabilities.
  • Advantages:
    • Speed & cost innovation: Compared with GPT-4 Turbo, 2x faster, 50% cheaper, and 5x higher rate limits.
    • Multimodal capability breakthrough: Precisely identifies image and video details, supports direct voice interaction with emotion recognition.
    • Enhanced reasoning: Sets new high scores in MMLU evaluations.
  • Applications:
    • Real-time multimodal reasoning: Simultaneously performs real-time reasoning on audio, visual, and text data.
    • Cross-language translation: Handles 50 different languages.
    • Complex task handling: Strong text, reasoning, and coding capabilities.
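
The 12,000 and 24,000 figures quoted for the 16K and 32K models can be sanity-checked with a rule of thumb. The sketch below assumes about 0.75 English words per token, which is an approximation and not a tokenizer; the dictionary keys are illustrative labels, not confirmed OpsPilot model IDs.

```python
# Rough context-window check. WORDS_PER_TOKEN is a rule-of-thumb assumption,
# not a tokenizer; real token counts depend on the model and the language.
WORDS_PER_TOKEN = 0.75

# Nominal context sizes in tokens, taken from the model names above.
# The keys are illustrative labels (assumption), not confirmed model IDs.
CONTEXT_TOKENS = {
    "gpt-3.5-turbo-16k": 16_000,
    "gpt-4-32k": 32_000,
}


def approx_word_budget(model: str) -> int:
    """Approximate number of English words that fit in the model's context."""
    return int(CONTEXT_TOKENS[model] * WORDS_PER_TOKEN)


def fits_in_context(text: str, model: str, reserve_tokens: int = 0) -> bool:
    """Estimate whether `text` fits, leaving `reserve_tokens` for the reply."""
    estimated_tokens = len(text.split()) / WORDS_PER_TOKEN
    return estimated_tokens + reserve_tokens <= CONTEXT_TOKENS[model]
```

For Chinese or mixed-language text the ratio differs noticeably, so a real tokenizer is the safer choice when the input is close to the limit.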

LLM Model Summary

In addition to the models described above, OpsPilot currently supports the following LLMs:

| Category | Model Icon | Model ID | Model Name | Model Type |
|---|---|---|---|---|
| Baichuan | Baichuan | baichuan-2 | Baichuan 2 | Text |
| Baichuan | Baichuan | baichuan-2-chat | Baichuan 2 Chat | Text |
| CodeLlama | MetaAI | code-llama | CodeLlama | Code |
| CodeLlama | MetaAI | code-llama-instruct | CodeLlama Instruct | Code |
| CodeLlama | MetaAI | code-llama-python | CodeLlama Python | Code |
| CodeGeeX | CodeGeeX | codegeex4 | CodeGeeX4 | Code |
| CodeQwen | Alibaba | codeqwen1.5 | CodeQwen1.5 | Code |
| CodeQwen | Alibaba | codeqwen1.5-chat | CodeQwen1.5 Chat | Code |
| CodeShell | | codeshell | CodeShell | Code |
| CodeShell | | codeshell-chat | CodeShell Chat | Code |
| Codestral | Mistral | codestral-v0.1 | Codestral v0.1 | Code |
| CogAgent | Zhipu | cogagent | CogAgent | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek | DeepSeek | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-chat | DeepSeek Chat | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-coder | DeepSeek Coder | Code |
| DeepSeek | DeepSeek | deepseek-coder-instruct | DeepSeek Coder Instruct | Code |
| DeepSeek | DeepSeek | deepseek-prover-v2 | DeepSeek Prover v2 | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-r1 | DeepSeek R1 | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-r1-0528 | DeepSeek R1 0528 | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-r1-0528-qwen3 | DeepSeek R1 0528 Qwen3 | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-r1-distill-llama | DeepSeek R1 Distill Llama | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-r1-distill-qwen | DeepSeek R1 Distill Qwen | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-v2-chat | DeepSeek V2 Chat | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-v2-chat-0628 | DeepSeek V2 Chat 0628 | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-v2.5 | DeepSeek V2.5 | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-v3 | DeepSeek V3 | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-v3-0324 | DeepSeek V3 0324 | Reasoning Enhanced |
| DeepSeek | DeepSeek | deepseek-vl2 | DeepSeek VL2 | Multimodal |
| DianJin | | DianJin-R1 | DianJin R1 | Reasoning Enhanced |
| ERNIE | Baidu | Ernie4.5 | ERNIE 4.5 | Text |
| FinLLM | | fin-r1 | Fin R1 | Reasoning Enhanced |
| Gemma | Gemma | gemma-3-1b-it | Gemma 3 1B IT | Text |
| Gemma | Gemma | gemma-3-it | Gemma 3 IT | Text |
| GLM (Zhipu) | ChatGLM | glm-4.1v-thinking | GLM 4.1V Thinking | Reasoning Enhanced |
| GLM (Zhipu) | ChatGLM | glm-4.5 | GLM 4.5 | Text |
| GLM (Zhipu) | ChatGLM | glm-4v | GLM 4V | Text |
| GLM (Zhipu) | ChatGLM | glm-edge-chat | GLM Edge Chat | Text |
| GLM (Zhipu) | ChatGLM | glm4-0414 | GLM4 0414 | Text |
| GLM (Zhipu) | ChatGLM | glm4-chat | GLM4 Chat | Text |
| GLM (Zhipu) | ChatGLM | glm4-chat-1m | GLM4 Chat 1M | Text |
| Gorilla | | gorilla-openfunctions-v2 | Gorilla OpenFunctions v2 | Reasoning Enhanced |
| OpenAI | GPT | gpt-2 | GPT-2 | Text |
| LLaMA | Meta | llama-2 | LLaMA 2 | Text |
| LLaMA | Meta | llama-3 | LLaMA 3 | Text |
| LLaMA | Meta | llama-3.1 | LLaMA 3.1 | Text |
| LLaMA | Meta | llama-3.2-vision | LLaMA 3.2 Vision | Multimodal |
| LLaMA | Meta | llama-3.3-instruct | LLaMA 3.3 Instruct | Text |
| Mistral | Mistral | mistral-instruct-v0.3 | Mistral Instruct v0.3 | Text |
| Mistral | Mistral | mistral-large-instruct | Mistral Large Instruct | Text |
| Mixtral | Mistral | mixtral-8x22B-instruct-v0.1 | Mixtral 8x22B Instruct v0.1 | Text |
| Qwen | Qwen | qwen2.5-instruct | Qwen2.5 Instruct | Text |
| Qwen | Qwen | qwen2.5-coder | Qwen2.5 Coder | Code |
| Qwen | Qwen | qwen3 | Qwen3 | Text |
| Qwen | Qwen | Qwen3-Coder | Qwen3 Coder | Code |
| Qwen | Qwen | Qwen3-Thinking | Qwen3 Thinking | Reasoning Enhanced |
| Yi | Yi | Yi-1.5-chat | Yi 1.5 Chat | Text |

For the complete model list, please refer to the Chinese documentation.
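
When choosing a model for a task, the Model Type column is the natural filter. The following is a minimal sketch of that lookup using a handful of rows copied from the table; the `ModelEntry` class and `model_ids_by_type` helper are hypothetical names for illustration, not part of OpsPilot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One row of the model table (hypothetical structure for illustration)."""
    category: str
    model_id: str
    name: str
    model_type: str


# A few rows copied from the table above; this is not the complete list.
MODELS = [
    ModelEntry("DeepSeek", "deepseek-r1", "DeepSeek R1", "Reasoning Enhanced"),
    ModelEntry("DeepSeek", "deepseek-coder", "DeepSeek Coder", "Code"),
    ModelEntry("DeepSeek", "deepseek-vl2", "DeepSeek VL2", "Multimodal"),
    ModelEntry("Qwen", "qwen2.5-coder", "Qwen2.5 Coder", "Code"),
    ModelEntry("Qwen", "qwen3", "Qwen3", "Text"),
    ModelEntry("LLaMA", "llama-3.2-vision", "LLaMA 3.2 Vision", "Multimodal"),
]


def model_ids_by_type(model_type: str) -> list[str]:
    """Return the IDs of all listed models with the given Model Type."""
    return [m.model_id for m in MODELS if m.model_type == model_type]
```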

Embed Models

FastEmbed (bge-small-zh-v1.5)

A lightweight Chinese semantic embedding model that converts Chinese text into numerical vectors (embeddings) a computer can compare, efficiently capturing semantic similarity.

  • Lightweight & Fast: Only 95MB in size, runs on ordinary computers with millisecond-level response.
  • Chinese Optimized: Deeply optimized for Chinese semantics, accurately recognizing idioms, internet slang, and professional terminology.
  • Low-Cost & Easy to Use: Open-source and free, requires no complex configuration.
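
Semantic similarity between two embedded texts is commonly measured with cosine similarity. The sketch below shows that calculation on toy 3-dimensional vectors; they are made-up stand-ins, since real embedding vectors have many more dimensions and come from the model itself.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumption for illustration).
query = [0.9, 0.1, 0.0]
doc_similar = [0.8, 0.2, 0.1]
doc_unrelated = [0.0, 0.1, 0.9]

# The semantically closer document gets the higher score.
closer = cosine_similarity(query, doc_similar)
farther = cosine_similarity(query, doc_unrelated)
```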

BCEmbedding (bce-embedding-base_v1)

A Chinese-English bilingual semantic bridge model that supports bidirectional semantic conversion between Chinese and English text.

  • Bilingual Precise Alignment: Based on Youdao translation technology, cross-language retrieval accuracy improved by 40%.
  • Long Text Processing: Supports overall semantic encoding of documents with tens of thousands of characters.
  • Two-Stage Retrieval: First quickly filters semantically related texts, then precisely ranks results.
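
The two-stage pattern can be sketched independently of any particular model: a cheap scorer narrows the corpus, then a more expensive scorer orders the short list. Both scorers below are toy stand-ins (word overlap and phrase match) chosen so the example runs without a model; in practice an embedding model would drive the first stage and a reranker the second. All function names are hypothetical.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def two_stage_retrieve(
    query: str,
    documents: list[str],
    coarse_score: Scorer,
    fine_score: Scorer,
    recall_k: int = 3,
    final_k: int = 2,
) -> list[str]:
    """Filter with a cheap scorer, then rank the survivors with a precise one."""
    # Stage 1: cheap scoring over the whole corpus, keep the top recall_k.
    candidates = sorted(
        documents, key=lambda d: coarse_score(query, d), reverse=True
    )[:recall_k]
    # Stage 2: expensive scoring over the short candidate list only.
    reranked = sorted(
        candidates, key=lambda d: fine_score(query, d), reverse=True
    )
    return reranked[:final_k]


def word_overlap(query: str, doc: str) -> float:
    """Toy coarse scorer: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def phrase_match(query: str, doc: str) -> float:
    """Toy fine scorer: 1.0 if the query appears verbatim in the document."""
    return 1.0 if query.lower() in doc.lower() else 0.0
```

The point of the split is cost: the precise scorer only ever sees `recall_k` candidates, no matter how large the corpus is.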

Rerank Models

BCEReranker (bce-reranker-base_v1)

A cross-encoder-based Chinese-English-Japanese-Korean multilingual reranking model focused on fine-grained reranking of initial retrieval results.

  • Multilingual Cross-Domain Adaptation: Supports Chinese, English, Japanese, and Korean.
  • Long Text Precise Ranking: Breaks through the traditional 512-token input limit.
  • Absolute Score Filtering: Outputs quantifiable "absolute relevance scores" (recommended threshold 0.35-0.4).
  • RAG Deep Optimization: Achieves SOTA performance in LlamaIndex evaluations when combined with BCEmbedding.
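
Because the scores are absolute, low-relevance passages can be dropped with a fixed cutoff instead of always keeping the top N. A minimal sketch, assuming the reranker has already produced `(passage, score)` pairs; the helper name and input shape are hypothetical, and the default cutoff uses the lower end of the recommended range.

```python
def filter_by_score(
    scored_passages: list[tuple[str, float]],
    threshold: float = 0.35,
) -> list[tuple[str, float]]:
    """Keep passages scoring at or above the threshold, best first."""
    kept = [(p, s) for p, s in scored_passages if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

An empty result is a useful signal in itself: nothing retrieved was relevant enough to pass to the LLM.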

OCR Models

OlmOCR

A document parsing tool based on large language models, focused on converting PDFs and images into editable text with support for tables, formulas, and handwritten content.

AzureOCR

Microsoft Azure cloud OCR tool supporting multilingual text recognition with Form Recognizer integration for extracting structured data from documents.

PaddleOCR

An open-source OCR toolkit by Baidu providing end-to-end text detection, recognition, and direction classification, with support for 80+ languages.

| Model | Advantages | Applications |
|---|---|---|
| OlmOCR | Multimodal fusion, complex layout parsing, structured output | Academic documents, handwritten archives, PDF digitization |
| AzureOCR | Cloud service integration, 50+ languages, security compliance | Enterprise forms, contract parsing, real-time video text recognition |
| PaddleOCR | Lightweight open-source, 80+ languages (Chinese optimized), edge-cloud deployment | Mobile recognition, batch document processing, cross-language scenarios |

Tools

Operations Tools

Kubernetes Tools

A suite of operations and management tools for Kubernetes clusters, covering cluster resource queries, status monitoring, fault diagnosis, and configuration management.

  • Fine-Grained Cluster Resource Management: Multi-dimensional resource queries, plus resource optimization and cleanup.
  • Full-Chain Fault Diagnosis & Monitoring: Quick identification of abnormal resources, with linked analysis of logs and configuration.
  • Automated Operations & Standardization: Batch operation support and standardized configuration management.
  • Performance & Stability Assurance: Node health monitoring and container runtime analysis.

Jenkins

A collection of functions for interacting with the Jenkins CI/CD platform, providing the ability to query and operate build tasks on Jenkins servers.

  • CI/CD Pipeline Full-Chain Management: Visual task management and automated triggering with parameterized builds.
  • Team Collaboration & Process Auditing: Build task tracing and multi-tool integration support.
  • Fault Recovery & Emergency Handling: Batch task operations and canary deployment support.
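
For context on what operating a Jenkins build task involves, Jenkins exposes a remote API in which a parameterized build is triggered through a job's `buildWithParameters` endpoint. The sketch below only constructs that URL; it is not OpsPilot's implementation, the helper name is hypothetical, and the server address is a placeholder.

```python
from urllib.parse import quote, urlencode


def build_with_parameters_url(
    base_url: str, job_name: str, params: dict[str, str]
) -> str:
    """Build the Jenkins remote-API URL for a parameterized build."""
    base = base_url.rstrip("/")
    job = quote(job_name, safe="")  # escape spaces and slashes in the job name
    return f"{base}/job/{job}/buildWithParameters?{urlencode(params)}"


# Placeholder server and job (assumptions for illustration only).
url = build_with_parameters_url(
    "https://jenkins.example.com/",
    "deploy-api",
    {"BRANCH": "main", "ENV": "staging"},
)
```

Actually triggering the build requires an authenticated POST to this URL, which is omitted here.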

General Tools

General tools

get_current_time
  • Function: Retrieves the current system time, accurate to the second or millisecond.
  • Use Cases: Logging, task scheduling, and report generation.
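
A tool like this can be approximated in a few lines. The following is a minimal sketch, not OpsPilot's implementation: the `precision` parameter and the ISO 8601 output format are assumptions, since the source only states the function name and its second/millisecond accuracy.

```python
from datetime import datetime


def get_current_time(precision: str = "seconds") -> str:
    """Return the local system time as an ISO 8601 string with UTC offset."""
    if precision not in ("seconds", "milliseconds"):
        raise ValueError("precision must be 'seconds' or 'milliseconds'")
    # astimezone() with no argument attaches the system's local timezone.
    return datetime.now().astimezone().isoformat(timespec=precision)
```

Including the UTC offset keeps log entries and scheduled tasks unambiguous when the calling systems sit in different time zones.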

Search Tools

  • Function: Initiates web queries through the DuckDuckGo search engine, returning structured search results.
  • Use Cases: Information retrieval and data collection.