Lesson 21 of 25

Data Integrity, Completeness Testing, and Root-Cause Analysis

5 min read · CAMS-Audit

Bad data silently breaks good controls. Learn completeness testing that reconciles source to platform, accuracy and lineage checks, and root-cause analysis that drives durable, not cosmetic, remediation.

Garbage in, garbage out

Every AML system runs on data feeds
Bad data silently breaks good controls
A complete-looking system can be blind to missing data
Data integrity underpins all other testing

Underneath every AML control is data: the customer records, the transaction feeds, and the reference lists that monitoring and screening depend on. And here's the quiet killer: bad data breaks good controls invisibly. A perfectly tuned monitoring scenario can't catch what it never receives, so if a feed silently drops a category of transactions, the system shows green while sitting blind.

SR 11-7 flags data quality as part of model risk for exactly this reason. Data-integrity testing underpins everything else we've covered, because a clean-looking output built on incomplete or inaccurate input is false comfort. Auditors who test only the logic and never the data miss this whole class of failure.

Completeness testing

Reconcile source systems to the AML platform
Did every transaction that should arrive, arrive?
Check for dropped feeds, filtered records, gaps
Completeness is about what's missing, not what's wrong

The first data test is completeness, and it asks a different question than accuracy. Completeness asks: did everything that should have reached the AML system actually reach it? You reconcile the source systems, such as the core banking platform and the payment rails, against what landed in the monitoring and screening tools.

Count records, sum amounts, and look for the gap. Did a feed silently drop overnight? Is a product's activity filtered out before it ever reaches monitoring?

Completeness is about what's missing, and missing data is the most dangerous kind, because nobody sees an alert that was never possible. This reconciliation, source to platform, is one of the highest-value tests an AML auditor performs.
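The count-and-sum reconciliation described above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the record layout (`date`, `product`, `amount`), the function names, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from decimal import Decimal


def summarize(records):
    """Return {(date, product): (record_count, total_amount)}."""
    totals = defaultdict(lambda: [0, Decimal("0")])
    for rec in records:
        key = (rec["date"], rec["product"])
        totals[key][0] += 1
        totals[key][1] += Decimal(rec["amount"])
    return {key: (count, amount) for key, (count, amount) in totals.items()}


def reconcile(source_records, platform_records):
    """List every (date, product) group whose count or amount differs."""
    src = summarize(source_records)
    plat = summarize(platform_records)
    zero = (0, Decimal("0"))
    gaps = []
    for key in sorted(set(src) | set(plat)):
        s_count, s_amount = src.get(key, zero)
        p_count, p_amount = plat.get(key, zero)
        if (s_count, s_amount) != (p_count, p_amount):
            gaps.append({
                "date": key[0],
                "product": key[1],
                "missing_count": s_count - p_count,
                "missing_amount": s_amount - p_amount,
            })
    return gaps


# Hypothetical data: the prepaid product never reaches the AML platform.
source = [
    {"date": "2024-03-01", "product": "wire", "amount": "9500.00"},
    {"date": "2024-03-01", "product": "wire", "amount": "1200.00"},
    {"date": "2024-03-01", "product": "ach", "amount": "300.00"},
    {"date": "2024-03-01", "product": "prepaid", "amount": "50.00"},
]
platform = [
    {"date": "2024-03-01", "product": "wire", "amount": "9500.00"},
    {"date": "2024-03-01", "product": "wire", "amount": "1200.00"},
    {"date": "2024-03-01", "product": "ach", "amount": "300.00"},
]

gaps = reconcile(source, platform)
for gap in gaps:
    print(gap)
```

Grouping by date and product is a design choice for the sketch: a single grand total can hide a dropped product line when volumes elsewhere happen to offset it, while group-level totals point at where the gap sits.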

Accuracy and lineage

Is the data that arrived correct and well-formed?
Trace fields through transformations (lineage)
Check for truncation, mismapping, default values
Accurate inputs are the basis for every conclusion

The second data test is accuracy, with its companion, lineage. Accuracy asks whether the data that did arrive is correct and well-formed: are amounts right, are country codes valid, are names intact rather than truncated? Lineage means tracing a field from its origin through every transformation to where the AML system consumes it, so you can see where it might get corrupted.
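One way to picture a lineage trace is to capture the same field for the same record at each hop and note where the value first changes. The sketch below assumes three hypothetical stages (`core_banking`, `staging_extract`, `aml_platform`) and an invented customer record; real pipelines will have their own stages and extraction methods.

```python
def trace_field(stages, record_id, field):
    """Return (stage_name, value, changed) for each stage, in pipeline order."""
    trail = []
    previous = None
    for index, (stage_name, records) in enumerate(stages):
        value = records.get(record_id, {}).get(field)
        # The first stage is the origin, so it can never count as a change.
        changed = index > 0 and value != previous
        trail.append((stage_name, value, changed))
        previous = value
    return trail


# Hypothetical snapshots of one customer's name at each stage.
FULL_NAME = "ALEXANDRA CONSTANTINOPOULOS-WHITFIELD"
stages = [
    ("core_banking", {"C2": {"name": FULL_NAME}}),
    ("staging_extract", {"C2": {"name": FULL_NAME}}),
    ("aml_platform", {"C2": {"name": FULL_NAME[:30]}}),  # cut to 30 characters
]

trail = trace_field(stages, "C2", "name")
first_change = next((stage for stage, _, changed in trail if changed), None)

for stage, value, changed in trail:
    print(f"{stage:16} {value!r} {'<-- changed here' if changed else ''}")
```

A record missing from a stage shows up as `None`, so the same trace also hints at where a record was dropped, not only where a value was altered.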

Common defects include truncation, where long names or account numbers get cut off; mismapping, where a field lands in the wrong column; and silent defaults, where a missing value is filled with a placeholder that fools the logic. Since every conclusion rests on these inputs, inaccurate data quietly undermines monitoring, screening, and reporting alike.
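The three defect types above lend themselves to simple field-level checks. The following is a rough sketch under stated assumptions: the country list is an illustrative subset, the placeholder values and the 30-character column width are invented for the example, and a name that exactly fills the column is treated only as a hint of truncation, not proof.

```python
from decimal import Decimal, InvalidOperation

VALID_COUNTRIES = {"US", "GB", "DE", "MX"}  # illustrative subset only
PLACEHOLDERS = {"", "UNKNOWN", "N/A", "XX", "1900-01-01"}  # assumed defaults
NAME_FIELD_WIDTH = 30  # assumed width of the target name column


def check_record(rec):
    """Return a list of accuracy issues found in one record."""
    issues = []
    # Silent defaults: any field holding a known placeholder value.
    for field, value in rec.items():
        if str(value).strip().upper() in PLACEHOLDERS:
            issues.append(f"{field}: placeholder or empty value")
    # Mismapping often surfaces as a value of the wrong kind for its column.
    country = rec.get("country", "")
    if country not in VALID_COUNTRIES and country.upper() not in PLACEHOLDERS:
        issues.append("country: not a recognized code")
    try:
        if Decimal(rec.get("amount", "")) <= 0:
            issues.append("amount: not positive")
    except InvalidOperation:
        issues.append("amount: not numeric")
    # Truncation hint: the value exactly fills the column.
    if len(rec.get("name", "")) == NAME_FIELD_WIDTH:
        issues.append("name: fills the column exactly, possible truncation")
    return issues


# Hypothetical records, one clean and one per defect type.
records = [
    {"id": "C1", "name": "Maria Lopez", "country": "MX",
     "amount": "1250.00", "dob": "1984-06-02"},
    {"id": "C2", "name": "ALEXANDRA CONSTANTINOPOULOS-WHITFIELD"[:30],
     "country": "GB", "amount": "400.00", "dob": "1975-11-20"},
    {"id": "C3", "name": "John Smith", "country": "1250.00",
     "amount": "US", "dob": "1990-01-15"},
    {"id": "C4", "name": "Wei Chen", "country": "XX",
     "amount": "75.00", "dob": "1900-01-01"},
]

results = {rec["id"]: check_record(rec) for rec in records}
for rec_id, issues in results.items():
    print(rec_id, issues)
```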

Root-cause analysis

Separate the symptom from the underlying cause
Ask 'why' until you reach the systemic driver
Fixing the symptom invites the finding to recur
Root cause drives durable, not cosmetic, remediation

Finding a problem isn't enough; the auditor must understand why it happened, and that's root-cause analysis. The symptom is what you observed: late SARs, missed alerts, a data gap. The root cause is the underlying driver.

A classic technique is to keep asking why. SARs were late. Why? Because the queue was backed up. Why? Because volume doubled and staffing didn't. Why? Because the risk assessment never flagged the growth that drove the volume. Now you've reached something systemic. This matters because fixing only the symptom, say by hiring temps to clear today's backlog, invites the finding to recur next quarter.

Root-cause analysis is what separates durable remediation from a cosmetic patch, and a recurring finding is often a sign the root cause was never addressed.

Using CAATs and analytics in fieldwork

Test the whole population, not just a sample
Re-perform monitoring logic against raw data
Surface outliers, duplicates, and impossible values
Powerful — but validate the analytics themselves

Data testing is where Computer-Assisted Audit Techniques really earn their keep, so let's connect them here. Instead of pulling a small sample, CAATs let you analyze the entire population: every customer missing a risk rating, every wire just below a reporting threshold, every alert closed in under sixty seconds. You can re-perform a monitoring rule's logic against the raw data and compare it to what the system actually produced, which directly tests whether the system is doing what it claims.
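Re-performance can be sketched as follows: apply the rule independently to the raw data, then compare the result with what the system alerted on. The rule here (same-day cash deposits per customer totaling at or above 10,000) is a hypothetical stand-in, as are the field names and sample data; an actual test would use the institution's documented scenario logic.

```python
from collections import defaultdict
from decimal import Decimal

THRESHOLD = Decimal("10000")  # hypothetical rule parameter


def reperform_rule(transactions):
    """Return the (customer, date) pairs the hypothetical rule should flag."""
    totals = defaultdict(Decimal)
    for tx in transactions:
        if tx["type"] == "cash_deposit":
            totals[(tx["customer"], tx["date"])] += Decimal(tx["amount"])
    return {key for key, total in totals.items() if total >= THRESHOLD}


# Hypothetical raw transactions pulled from the source system.
raw = [
    {"customer": "C1", "date": "2024-03-01", "type": "cash_deposit", "amount": "6000"},
    {"customer": "C1", "date": "2024-03-01", "type": "cash_deposit", "amount": "5000"},
    {"customer": "C2", "date": "2024-03-02", "type": "cash_deposit", "amount": "4000"},
    {"customer": "C2", "date": "2024-03-02", "type": "cash_deposit", "amount": "7000"},
    {"customer": "C3", "date": "2024-03-02", "type": "wire", "amount": "20000"},
    {"customer": "C4", "date": "2024-03-03", "type": "cash_deposit", "amount": "9500"},
]

# Hypothetical alerts the monitoring system actually produced.
system_alerts = {("C1", "2024-03-01"), ("C5", "2024-03-03")}

expected = reperform_rule(raw)
missed = expected - system_alerts        # rule should have fired, no alert
unexplained = system_alerts - expected   # alert exists, raw data doesn't support it

print("missed:", sorted(missed))
print("unexplained:", sorted(unexplained))
```

Both directions of the comparison are informative. A missed alert suggests the system's logic or its input differs from what is documented, while an unexplained alert may mean the auditor's raw extract is itself incomplete.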

And you can surface anomalies a manual sample would never catch: statistical outliers, duplicate records, and impossible values like future-dated transactions or negative ages, each of which hints at a data-integrity problem. The caveat we raised earlier still holds, though: your analytics are only as trustworthy as the data feeding them and the logic you wrote, so validate the analytics themselves before trusting their output. A confident, automated conclusion drawn from flawed data or a buggy query is more dangerous than no analysis, because it carries an unearned air of authority.
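A population-wide anomaly scan along these lines might look like the sketch below. The field names, the sample data, and especially the outlier rule (an amount more than ten times the median) are assumptions chosen for illustration; a real engagement would pick a statistical test suited to the data.

```python
from collections import Counter
from datetime import date
from statistics import median


def find_anomalies(transactions, as_of, outlier_multiple=10):
    """Scan the full population for duplicates, impossible values, and outliers."""
    id_counts = Counter(tx["id"] for tx in transactions)
    cutoff = outlier_multiple * median(tx["amount"] for tx in transactions)
    return {
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
        "future_dated": sorted({
            tx["id"] for tx in transactions
            if date.fromisoformat(tx["date"]) > as_of
        }),
        "negative_age": sorted({
            tx["id"] for tx in transactions if tx["customer_age"] < 0
        }),
        # Crude illustrative rule, not a formal statistical test.
        "amount_outliers": sorted({
            tx["id"] for tx in transactions if tx["amount"] > cutoff
        }),
    }


# Hypothetical population with one planted defect of each kind.
transactions = [
    {"id": "T1", "date": "2024-03-01", "amount": 100, "customer_age": 40},
    {"id": "T2", "date": "2024-03-02", "amount": 120, "customer_age": 35},
    {"id": "T2", "date": "2024-03-02", "amount": 120, "customer_age": 35},
    {"id": "T3", "date": "2031-01-01", "amount": 90, "customer_age": 52},
    {"id": "T4", "date": "2024-03-03", "amount": 110, "customer_age": -3},
    {"id": "T5", "date": "2024-03-04", "amount": 250000, "customer_age": 61},
]

findings = find_anomalies(transactions, as_of=date(2024, 3, 31))
for check, ids in findings.items():
    print(check, ids)
```

Running the scan against a small population with planted defects, as here, is also one simple way to validate the analytics before pointing them at production data: if a known defect is not reported, the query is wrong.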

Used carefully, CAATs turn data testing from a sample into a near-complete examination.

Recap and next

Bad data silently breaks good controls
Completeness — reconcile source to platform for what's missing
Accuracy and lineage — trace fields for corruption
Next module — writing findings and rating issues

Recapping: data integrity underpins every other control, because a system starved of complete, accurate data shows green while sitting blind. Completeness testing reconciles source systems to the AML platform to catch what's missing, accuracy and lineage testing trace fields for truncation and mismapping, and root-cause analysis drives past the symptom to the systemic driver so remediation actually sticks. That closes the heavy fieldwork module.

Next, we open the final module on reporting, recommendations, and follow-up, starting with how to write a finding and rate the issue by risk. Take the data and root-cause practice questions first.

Sources

FFIEC BSA/AML Examination Manual — transaction monitoring data and BSA reporting accuracy
SR 11-7 — data quality in model inputs
IIA International Professional Practices Framework — analytics and root-cause analysis

Test your knowledge

A few CAMS-Audit questions on this material.

Q1. A bank uses a machine-learning model to prioritize alert disposition. The model frequently deprioritizes alerts involving a specific ethnic community's remittance patterns. Which SR 11-7 dimension does this MOST specifically implicate?
Q2. An audit finds that the institution filed an initial SAR on a customer 18 months ago but has no record of any subsequent review or continuing-activity SAR filing despite ongoing high-volume cash transactions. What is the finding?
Q3. Using CAATs, an auditor analyzes the entire alert-disposition population and finds that 12% of alerts were closed in under 30 seconds with identical, one-sentence narratives. What does this pattern most likely indicate?
Q4. The institution's stated risk appetite allows for a maximum of 5% of alerts to exceed the 30-day disposition window. Monitoring data shows 18% of alerts are exceeding this window for the past six months, yet no escalation has occurred. What is the audit finding?
