The Harsh Reality: Why CMDB CI Duplication Refuses to Die—Even for the Best

CMDB administrators across ServiceNow environments continue to battle an obstinate problem: duplicate Configuration Items that persist despite multiple cleanup attempts. Even organizations with robust processes face recurring duplication issues that stem from systemic weaknesses in identification, discovery, and reconciliation workflows.

Weak identification rules lay the foundation for most duplication problems. When the identification engine finds multiple matching CIs for a single payload, configuration properties determine what it does next. If glide.identification_engine.skip_duplicates is set to true and the associated duplicate threshold is left at its default of 5, the engine updates the oldest matching CI and marks the others with discovery_source='Duplicate'. However, this creates a vicious cycle.

CIs marked as ‘Duplicate’ become invisible to the identification engine, meaning subsequent discoveries generate new duplicates rather than updating existing records.
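The cycle can be illustrated with a toy model. The Python below is not ServiceNow code: the `CI` class, `identify`, `discover`, the "at most `threshold` matches" comparison, and raising an error when that rule does not apply are all assumptions made for illustration. The sketch picks the oldest match, flags the rest as 'Duplicate', and then, in a hypothetical scenario where the surviving CI is deleted without clearing the flags on the others, shows a rediscovery inserting a brand-new record because every remaining match is invisible.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class CI:
    """Toy CI record; field names echo the CMDB, but this is not the platform schema."""
    sys_id: str
    serial_number: str
    created: datetime
    discovery_source: str = "ServiceNow"
    duplicate_of: Optional[str] = None


def identify(cis: List[CI], serial: str, skip_duplicates: bool = True,
             threshold: int = 5) -> Optional[CI]:
    """Return the CI to update, or None if nothing visible matches."""
    # CIs already flagged 'Duplicate' are invisible to matching
    matches = [c for c in cis
               if c.serial_number == serial and c.discovery_source != "Duplicate"]
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0]
    if not skip_duplicates or len(matches) > threshold:
        # Assumption: modeled as a hard identification error
        raise RuntimeError(f"{len(matches)} matching CIs for serial {serial}")
    # Keep the oldest match and flag every other match as a duplicate of it
    oldest = min(matches, key=lambda c: c.created)
    for c in matches:
        if c is not oldest:
            c.discovery_source = "Duplicate"
            c.duplicate_of = oldest.sys_id
    return oldest


def discover(cis: List[CI], serial: str, new_id: str) -> CI:
    """Update the identified CI, or insert a new one when nothing visible matches."""
    target = identify(cis, serial)
    if target is None:
        target = CI(new_id, serial, datetime.now())
        cis.append(target)
    return target


cis = [
    CI("a", "SN-1", datetime(2021, 1, 1)),
    CI("b", "SN-1", datetime(2022, 1, 1)),
    CI("c", "SN-1", datetime(2023, 1, 1)),
]
kept = discover(cis, "SN-1", "d")   # updates "a", flags "b" and "c"
cis.remove(kept)                    # "a" deleted without clearing the flags on "b" and "c"
again = discover(cis, "SN-1", "d")  # nothing visible matches, so "d" is inserted
```

After the second run the table holds two flagged records plus a fresh one for the same serial number, which is the cycle the text describes.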

Multiple data sources compound duplication. Network teams run discovery tools while application teams load inventory data separately, creating multiple records for the same servers. Service Graph Connectors and golden-dataset merges that bring in records with matching identification values from different sources produce duplicates when a new source is onboarded without a proper assessment of how its data overlaps with what is already in the CMDB. Each unstructured data integration adds to the problem: integrations that lack centralized orchestration often fail to enforce consistent identifiers, so duplication risk grows with integration complexity.
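To see why consistent identifiers matter, the sketch below groups rows from three feeds by serial number, once using each feed's raw value and once after a shared normalization step. The feed names, the `naive_key`/`normalized_key` functions, and the normalization rule (trim, upper-case, drop dashes) are hypothetical; the point is only that one centrally enforced rule collapses what would otherwise become three records.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def naive_key(record: Row) -> str:
    # Uses whatever each source sent, verbatim
    return record["serial_number"]


def normalized_key(record: Row) -> str:
    # Hypothetical shared rule: trim whitespace, upper-case, drop dashes
    return record["serial_number"].strip().upper().replace("-", "")


def load(records: List[Row], key: Callable[[Row], str]) -> Dict[str, List[str]]:
    """Group incoming rows by identifier; each distinct key becomes one CI."""
    cis: Dict[str, List[str]] = {}
    for r in records:
        cis.setdefault(key(r), []).append(r["source"])
    return cis


feeds = [
    {"source": "network_discovery", "serial_number": "abc-123"},
    {"source": "app_inventory", "serial_number": "ABC123 "},
    {"source": "service_graph_connector", "serial_number": "ABC-123"},
]
naive = load(feeds, naive_key)        # three distinct keys, three CIs
shared = load(feeds, normalized_key)  # one key, one CI fed by all three sources
```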

Discovery process flaws perpetuate duplication even after an initial cleanup. Running Discovery against CIs already marked with discovery_source='Duplicate' spawns new duplicates. Patterns that fail to collect critical identifiers such as serial numbers, whether because of credential issues or logic errors, prevent proper matching. Simultaneous discoveries of the same CI, or multiple payloads processing the same dependent CIs, create race conditions that generate duplicates.
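The race condition is the classic check-then-insert problem. In this toy Python model (the `ToyCITable` class and both worker functions are hypothetical, not platform APIs), a `threading.Barrier` forces two payloads to finish their lookups before either inserts, so both see "no match" and both insert. Holding one lock across lookup and insert removes the window.

```python
import threading
from typing import Dict, List


class ToyCITable:
    """In-memory stand-in for a CI table (hypothetical, not a platform API)."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, str]] = []
        self.lock = threading.Lock()

    def find(self, serial: str) -> bool:
        return any(r["serial_number"] == serial for r in self.rows)

    def insert(self, serial: str, payload: str) -> None:
        self.rows.append({"serial_number": serial, "payload": payload})


def unsafe_process(table: ToyCITable, serial: str, payload: str,
                   barrier: threading.Barrier) -> None:
    # Check-then-insert with no coordination between payloads
    exists = table.find(serial)
    barrier.wait()  # both payloads finish their lookups before either inserts
    if not exists:
        table.insert(serial, payload)


def safe_process(table: ToyCITable, serial: str, payload: str) -> None:
    # Lookup and insert form one critical section, so the second payload sees the first insert
    with table.lock:
        if not table.find(serial):
            table.insert(serial, payload)


def run(worker, *extra) -> int:
    """Run two payloads for the same serial concurrently and return the row count."""
    table = ToyCITable()
    threads = [
        threading.Thread(target=worker, args=(table, "SN-42", name, *extra))
        for name in ("payload-1", "payload-2")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(table.rows)


racy_count = run(unsafe_process, threading.Barrier(2))  # two rows for one device
locked_count = run(safe_process)                        # one row
```

The barrier only makes the bad interleaving deterministic for demonstration; in a real instance the same interleaving happens by chance when payloads overlap.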

Hidden CI records make remediation markedly harder. Inclusion rules, query business rules, and ACLs can hide CIs from the identification engine, forcing it to create new records. Domain separation can prevent the system from recognizing CIs as duplicates even when they contain identical data. Business rules that run slowly during payload processing can cause dependent CIs to be duplicated.
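A lookup that only sees part of the table behaves as if the hidden records did not exist. The sketch below models that with a `visible` predicate; `identify_or_insert` and `domain_filter` are hypothetical helpers, and the domain check merely stands in for whatever ACL, query business rule, or domain-separation rule restricts visibility in a real instance.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def identify_or_insert(table: List[Row], payload: Row,
                       visible: Callable[[Row], bool]) -> Row:
    """Match on serial_number, but only among rows the caller is allowed to see."""
    for row in table:
        if visible(row) and row["serial_number"] == payload["serial_number"]:
            return row
    # No visible match: a new row is created even if a hidden twin exists
    table.append(dict(payload))
    return table[-1]


def domain_filter(domain: str) -> Callable[[Row], bool]:
    # Hypothetical visibility rule standing in for domain separation, ACLs or query business rules
    return lambda row: row["domain"] == domain


payload = {"serial_number": "SN-7", "domain": "ACME-EU"}

scoped = [{"serial_number": "SN-7", "domain": "ACME"}]
identify_or_insert(scoped, payload, domain_filter("ACME-EU"))  # twin is hidden, so a second row appears

unscoped = [{"serial_number": "SN-7", "domain": "ACME"}]
identify_or_insert(unscoped, payload, lambda row: True)        # twin is visible, so it is matched
```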

Manual remediation remains necessary despite automation features. Before re-running identification, you must clear the 'Duplicate' value from discovery_source and empty the duplicate_of field. The CMDB Workspace De-duplication dashboard provides templates and bulk remedies, but complex cases require individual analysis to avoid updating the wrong CI or losing critical attributes.

Background scripts can find duplicates by checking fields that should be unique, such as MAC addresses, IP addresses, and asset tags. Non-unique placeholder values like N/A or None in identification attributes create matching failures that generate additional duplicates across datasets.
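On the instance itself, a duplicate-hunting background script would be server-side JavaScript (typically built on GlideRecord or GlideAggregate). The Python below only models the logic over exported rows, treating `sys_id`, `mac_address`, `serial_number`, `discovery_source`, and `duplicate_of` as plain dictionary keys. The placeholder list and both helper functions are assumptions: the sketch groups records by a field that should be unique, skips placeholder values so that fifty CIs sharing "N/A" are not reported as one giant duplicate set, and clears the duplicate markers on a record so identification will consider it again.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Assumed placeholder values; extend with whatever your data sources actually send
PLACEHOLDERS = {"", "N/A", "NA", "NONE", "NULL", "UNKNOWN"}


def is_placeholder(value: Optional[str]) -> bool:
    return value is None or value.strip().upper() in PLACEHOLDERS


def find_duplicate_groups(rows: List[Dict], field: str) -> Dict[str, List[str]]:
    """Group sys_ids by a field that should be unique, skipping placeholder values."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        value = row.get(field)
        if is_placeholder(value):
            continue  # a shared placeholder is a data-quality issue, not a real match
        groups[value.strip().upper()].append(row["sys_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def reset_for_reidentification(row: Dict) -> None:
    """Blank the 'Duplicate' source and duplicate_of so identification sees the CI again."""
    if row.get("discovery_source") == "Duplicate":
        row["discovery_source"] = ""
    row["duplicate_of"] = None


rows = [
    {"sys_id": "1", "mac_address": "00:1a:2b:3c:4d:5e", "serial_number": "N/A"},
    {"sys_id": "2", "mac_address": "00:1A:2B:3C:4D:5E", "serial_number": "N/A"},
    {"sys_id": "3", "mac_address": "None", "serial_number": "XYZ9"},
    {"sys_id": "4", "mac_address": None, "serial_number": "xyz9",
     "discovery_source": "Duplicate", "duplicate_of": "3"},
]
mac_dupes = find_duplicate_groups(rows, "mac_address")
serial_dupes = find_duplicate_groups(rows, "serial_number")
reset_for_reidentification(rows[3])
```

The groups are candidates for review, not an automatic merge list; as noted above, complex cases still need individual analysis before any record is updated or retired.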