When Nobody Knows a Config File Changed, Incident Review Loses a Critical Piece of Evidence
Release owner Xiao Zhou is the one who gets stuck in the review meeting.
The interface timeouts begin more than ten minutes after the release. There is a monitoring curve, there are error logs, and the dependency path from the application to the database can be traced. All the materials seem to point in the same direction: unstable connections.
Then the business interface owner asks one question: "Was the connection pool configuration just adjusted before the incident?"
The room goes quiet for a few seconds.
Some people look through release records. Some scroll through group chat messages. Some log into the machine to inspect the current file. But the current file can only prove what it looks like now, not what it looked like then. What really blocks the review is not that nobody checked the logs. It is that nobody can produce the config file versions and diffs from before and after the incident.
The most dangerous thing in a review is not too few clues. It is when the clues suddenly break at the configuration layer.
The Root Cause: Runtime State Leaves No Trace
For many teams, the first impression of a CMDB is an asset ledger.
It records which organization owns a host, which application a database is related to, and which chain a middleware instance belongs to. This information matters because it helps teams answer who the object is and how objects are connected.
But that is not what Xiao Zhou is missing right now. He already knows which database the abnormal interface depends on, and he already knows the time window in which the failed requests occurred.
What he lacks is a different kind of evidence: how this object was actually configured when the incident happened.
Config files sit in a place that is easy to overlook. They do not naturally appear on large dashboards like monitoring metrics do, and they are not searched immediately during incidents like error logs are. Yet details such as connection addresses, timeout values, cache switches, and startup parameters often directly shape runtime behavior.
Configuration changes are not necessarily the root cause. But if they have no version, no diff, and no operational context, the review can only fall back to verbal confirmation.
Layer One: The Current File Cannot Represent the State at the Time
After the review meeting, Xiao Zhou first logs into the machine to inspect the config file.
The file is still there, and its contents can be opened. But this step quickly runs into the first problem: the current file can only show what may be loaded now. It cannot show whether a change happened before or after the incident.
If someone temporarily changed the configuration after the incident to restore service, the current file may no longer reflect the incident scene at all.
This is the most common breakpoint in config-file reviews: being able to see the file does not mean being able to see the version.
BK Lite CMDB puts config files into asset details for viewing. For supported models such as hosts, middleware, and databases, it shows config file paths and version history. That makes the investigation order more stable:
- Locate the specific asset first
- Check which config files belong to that asset
- Confirm whether a new config version appeared before or after the incident
This step does not make the root-cause judgment for the team. It first turns "was it really like this at the time" from a memory question into an evidence question.
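The version check above can be sketched in a few lines. This is only an illustration under stated assumptions: the `ConfigVersion` record, its fields, the one-hour lookback, and the sample timestamps are all hypothetical and are not BK Lite CMDB's actual data model or API. It splits a version history into the baseline before the incident, versions that appeared around the incident, and versions created afterwards (for example, a recovery edit).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ConfigVersion:
    """Hypothetical record of one collected config file version."""
    version_id: str
    collected_at: datetime


def split_versions(
    versions: List[ConfigVersion],
    incident_start: datetime,
    incident_end: datetime,
    lookback: timedelta = timedelta(hours=1),  # assumed lookback, tune per team
) -> Tuple[Optional[ConfigVersion], List[ConfigVersion], List[ConfigVersion]]:
    """Return (baseline, in_window, after), each ordered oldest first.

    baseline:  latest version collected before the lookback window opens
    in_window: versions collected from (incident_start - lookback) to incident_end
    after:     versions collected once the incident window has closed
    """
    window_start = incident_start - lookback
    baseline: Optional[ConfigVersion] = None
    in_window: List[ConfigVersion] = []
    after: List[ConfigVersion] = []
    for v in sorted(versions, key=lambda item: item.collected_at):
        if v.collected_at < window_start:
            baseline = v  # keeps being overwritten, so the latest one wins
        elif v.collected_at <= incident_end:
            in_window.append(v)
        else:
            after.append(v)
    return baseline, in_window, after


# Made-up sample history for illustration only.
history = [
    ConfigVersion("v1", datetime(2024, 5, 1, 9, 0)),
    ConfigVersion("v2", datetime(2024, 5, 1, 14, 50)),
    ConfigVersion("v3", datetime(2024, 5, 1, 15, 40)),
]
baseline, in_window, after = split_versions(
    history,
    incident_start=datetime(2024, 5, 1, 15, 0),
    incident_end=datetime(2024, 5, 1, 15, 30),
)
print(baseline.version_id, [v.version_id for v in in_window], [v.version_id for v in after])
```

Keeping the "after" bucket separate matters for the scenario described earlier: a version created during recovery should not be mistaken for the state at the time of the incident.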
Layer Two: Having Versions Is Not Enough Without Diffs
Xiao Zhou finds two versions, but the review is still not done.
Config-file changes are often not large blocks of edits. They can be as small as a single easy-to-miss line. A connection address may point to a different domain. A timeout value may change from one number to another. A cache switch may be turned off. A startup parameter may lose one segment. Any of these can change runtime behavior.
If the team can only open the two versions separately and compare them by hand, the review still moves slowly.
BK Lite CMDB supports viewing config file version content and also comparing diffs between two versions. The value of diff comparison is not that it tells you "this line is the root cause." It compresses the problem from a full file into several concrete change points.
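To make "several concrete change points" tangible, here is a minimal sketch that uses Python's standard `difflib` module. It does not show how BK Lite CMDB computes its diffs; the config keys and values are invented sample data. It only shows how two versions of a file reduce to a short list of removed and added lines.

```python
import difflib
from typing import List, Tuple


def changed_lines(old_text: str, new_text: str) -> List[Tuple[str, str]]:
    """Return ("-", line) for removed lines and ("+", line) for added lines."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes: List[Tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines are not evidence, skip them
        changes.extend(("-", line) for line in old[i1:i2])
        changes.extend(("+", line) for line in new[j1:j2])
    return changes


# Invented sample content standing in for two config file versions.
before = "pool.max=50\ntimeout=30\nhost=db-a.internal"
after_change = "pool.max=20\ntimeout=30\nhost=db-b.internal"

changes = changed_lines(before, after_change)
for sign, line in changes:
    print(sign, line)
```

In this sample, three lines of config collapse to two change points (the pool size and the target host), which is the kind of narrowing the review needs before anyone argues about cause.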
At that point, the review discussion changes.
It no longer sounds like "does anyone remember whether it was changed yesterday?" It becomes "there is indeed one additional version in this time window, and the diff is concentrated in connection-pool parameters and the target address. Does that line up with the interface timeout?"
The more concrete the evidence is, the less judgment depends on instinct.
Layer Three: File Diffs Still Need Operational Context
After seeing the diff, Xiao Zhou runs into the next question.
Who made this change, and when? Was it a routine adjustment, an incident recovery action, or a record update produced by automated collection? Were there other changes to the asset instance in the same time window?
The file content alone cannot answer these questions.
BK Lite CMDB asset details can display instance change records and support filtering by change type, change scenario, operator, and time range. CMDB operation logs record model- and asset-related operations and retain operation type, change scenario, operator, and before-and-after data snapshots.
These two sources fill in the context.
| Review Question | Evidence Needed |
|---|---|
| Was there a config change before or after the incident? | Config file version history |
| What exactly changed? | Diff comparison between two versions |
| In what time window did the change happen? | Instance change records |
| Who performed related operations? | Operators and snapshots in operation logs |
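The filtering described above (change type, change scenario, operator, time range) can be pictured with a small sketch. The `ChangeRecord` fields and the sample values such as `"config_update"` and `"manual"` are assumptions made for illustration, not the product's real schema or enumerations.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical shape of one instance change record."""
    instance_id: str
    change_type: str
    scenario: str
    operator: str
    changed_at: datetime


def filter_records(
    records: Iterable[ChangeRecord],
    start: datetime,
    end: datetime,
    change_type: Optional[str] = None,
    scenario: Optional[str] = None,
    operator: Optional[str] = None,
) -> List[ChangeRecord]:
    """Keep records inside [start, end] that match every filter that is set."""
    matched = []
    for r in records:
        if not (start <= r.changed_at <= end):
            continue
        if change_type is not None and r.change_type != change_type:
            continue
        if scenario is not None and r.scenario != scenario:
            continue
        if operator is not None and r.operator != operator:
            continue
        matched.append(r)
    return sorted(matched, key=lambda r: r.changed_at)


# Invented sample records for illustration only.
records = [
    ChangeRecord("db-01", "config_update", "manual", "alice", datetime(2024, 5, 1, 14, 50)),
    ChangeRecord("db-01", "attribute_update", "auto_collect", "system", datetime(2024, 5, 1, 15, 5)),
    ChangeRecord("db-01", "config_update", "manual", "bob", datetime(2024, 5, 2, 10, 0)),
]
manual_in_window = filter_records(
    records,
    start=datetime(2024, 5, 1, 14, 0),
    end=datetime(2024, 5, 1, 15, 30),
    scenario="manual",
)
print([(r.operator, r.changed_at.isoformat()) for r in manual_in_window])
```

Separating manual changes from automated collection updates in the same window is what lets the review tell a deliberate adjustment apart from a routine record refresh.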
At this point, Xiao Zhou can finally move the config file from a "suspected clue" back into a traceable chain.
This chain diagram is not trying to attribute the incident directly to a configuration change. It shows that once a review enters the configuration layer, evidence splits into two paths: with versions and diffs, the discussion can continue downward; with only the current file, judgment easily stops at memory and verbal confirmation.
Layer Four: Critical Changes Should Not Be Discovered Only After the Fact
Reviews can fill in evidence after the fact, but critical configuration changes should ideally not be seen for the first time during the review.
Some host configurations naturally sit in high-risk positions, such as database connections, gateway forwarding, collector parameters, and startup items for core middleware. They may not change every day, but once they do, the relevant people should see them quickly.
BK Lite CMDB data subscriptions support triggering on config-file changes, but there is a clear boundary here. The trigger applies only to the host model, and it is evaluated against config-file versions that were newly and successfully collected within the detection window. Notifications are deduplicated by instance, so later versions from the same instance do not repeatedly flood the channel.
This capability is suitable for watching critical host configurations, not for pushing notifications for every model and every config.
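One way to read the boundary above is sketched below. This is an interpretation, not the product's implementation: the `CollectedVersion` record, the `"host"` model string, and the `already_notified` set are hypothetical names introduced for the example. It applies the three stated conditions (host model only, successful collection inside the detection window, one notification per instance).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class CollectedVersion:
    """Hypothetical record of one config-file collection attempt."""
    instance_id: str
    model: str
    collected_at: datetime
    success: bool


def pending_notifications(
    versions: Iterable[CollectedVersion],
    window_start: datetime,
    window_end: datetime,
    already_notified: Set[str],
) -> List[CollectedVersion]:
    """Return at most one triggering version per host instance."""
    first_hit: Dict[str, CollectedVersion] = {}
    for v in sorted(versions, key=lambda item: item.collected_at):
        if v.model != "host" or not v.success:
            continue  # only successful collections on the host model count
        if not (window_start <= v.collected_at <= window_end):
            continue  # outside the detection window
        if v.instance_id in already_notified or v.instance_id in first_hit:
            continue  # deduplicate by instance
        first_hit[v.instance_id] = v
    return list(first_hit.values())


# Invented sample data for illustration only.
start, end = datetime(2024, 5, 1, 15, 0), datetime(2024, 5, 1, 15, 10)
collected = [
    CollectedVersion("host-1", "host", datetime(2024, 5, 1, 15, 1), True),
    CollectedVersion("host-1", "host", datetime(2024, 5, 1, 15, 6), True),   # later version, same instance
    CollectedVersion("host-2", "host", datetime(2024, 5, 1, 15, 2), False),  # failed collection
    CollectedVersion("db-1", "database", datetime(2024, 5, 1, 15, 3), True), # not the host model
    CollectedVersion("host-3", "host", datetime(2024, 5, 1, 14, 0), True),   # before the window
]
to_send = pending_notifications(collected, start, end, already_notified=set())
print([v.instance_id for v in to_send])
```

Only `host-1` produces a notification in this sample, and only once, which is the behavior that keeps a subscription useful instead of noisy.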
What really needs design is the boundary of the rule: which instances are worth subscribing to, who should receive notifications, whether it should be enabled, and whether it should be managed by organization and responsibility scope. If the boundary is not designed well, awareness of config changes turns into noise.
Technical Insight: Configuration Is Not an Attachment but Runtime Evidence
The value of config files in incident review can be split into three layers:
- Reviewable: return to the specific asset and see config file paths and version history
- Comparable: compare two versions and confirm where the changes are concentrated
- Traceable: combine instance change records and operation logs to reconstruct the change context
If any one of these three layers is missing, the review becomes distorted.
With only the current file and no version history, the team cannot confirm the state at the time.
With versions but no diffs, the team still has to manually scan the full file.
With diffs but no operational context, the team struggles to judge which scenario the change belongs to.
Put the Evidence Chain Back Under the Asset View
An ideal review path is not for everyone to hunt for evidence across multiple systems and chat histories.
It should first return to the asset object:
- Confirm the abnormal object and its upstream and downstream dependencies through asset relationships.
- Check config file paths and version history in asset details.
- Compare config file version diffs before and after the incident.
- Combine instance change records and operation logs to confirm the operational context.
- Create subscriptions for critical host configurations so high-value changes enter view earlier.
BK Lite CMDB config files, change records, operation logs, and data subscriptions each catch a different breakpoint on this chain.
Together, these capabilities do not make the root-cause judgment for the team, nor do they promise to automatically interpret config changes as the cause of an incident. Their more important role is to keep config files from being scattered across machines, scripts, and group chats, and instead bring them back under asset details as runtime evidence that can be viewed, compared, and traced.
In the end, Xiao Zhou does not write "the configuration change is the root cause" in the review conclusion.
He writes a steadier sentence instead: within the incident time window, the target asset produced a new config version, and the diff was concentrated in connection parameters; the change has been cross-checked against the corresponding change records and operation logs, and critical host configurations will be included in subscription scope going forward.
That is the role config files should really play in incident review.
They do not make the judgment for people, but they give that judgment evidence to stand on.