Atlantis Data Inspector: Complete Guide to Features & Setup
Overview
Atlantis Data Inspector (assumed name for a data-quality/inspection capability in Atlan-style platforms) provides continuous data profiling, rule-based validations, anomaly detection, and alerting across data platforms (e.g., Snowflake, Databricks, BigQuery). It centralizes quality metrics, links results to metadata and lineage, and supports automated remediation workflows.
Key features
- Connectors: Native integrations for Snowflake, Databricks, BigQuery, and common ingestion pipelines.
- Checks & Rules: Prebuilt and custom rule types (nulls, uniqueness, ranges, regex, referential integrity, SQL-based checks).
- Profiling & Metrics: Column/table profiling, historical trend metrics, and derived KPIs (completeness, accuracy, freshness).
- Anomaly detection: Automated baseline and drift detection to surface unexpected changes.
- Alerts & Notifications: Rule-triggered alerts with routing, escalation, and integrations to Slack/email/incident tools.
- Lineage & Context: Links failed checks to data lineage and business metadata to aid root-cause analysis.
- Auto-repair / Remediation: Playbooks for quarantine, reprocessing, or running corrective SQL (where supported).
- Governance & Auditing: Audit logs, role-based access controls, and compliance-focused reporting.
- Platform-specific optimizations: Use of Snowflake/Databricks native functions where available to minimize compute costs.
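The checks-and-rules model above can be sketched as a small evaluation loop. This is a minimal illustration, not an Inspector API: the `Rule` class, `evaluate_rules` function, and the stats keys are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical rule model: each rule wraps a pass/fail predicate
# over one table's profiling stats.
@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the check passes

def evaluate_rules(stats: dict, rules: list) -> dict:
    """Run every rule against one table's profiling stats."""
    return {r.name: r.check(stats) for r in rules}

# Example stats a profiler might emit for one table (made-up values).
stats = {"null_rate_email": 0.02, "distinct_ids": 1000, "row_count": 1000}

rules = [
    Rule("email_completeness", lambda s: s["null_rate_email"] <= 0.05),
    Rule("id_uniqueness", lambda s: s["distinct_ids"] == s["row_count"]),
]

print(evaluate_rules(stats, rules))
# {'email_completeness': True, 'id_uniqueness': True}
```

The same shape extends naturally to the other prebuilt rule types (ranges, regex, referential integrity) by swapping in a different predicate.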
Setup — quick prescriptive steps (assumes Atlan-style environment)
1. Choose platform and authenticate
- Select your platform (Snowflake, Databricks, BigQuery).
- Create a service account/role with read (and optional write) permissions and configure credentials in the Inspector UI.
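A sanity check on the connection configuration can catch mistakes before the first scan. The key names and allowed platforms below are illustrative, not a real Inspector schema; real deployments should pull secrets from a vault or environment variables rather than hard-coding them.

```python
import os

# Hypothetical minimum set of keys a connection might require.
REQUIRED_KEYS = {"platform", "account", "role", "user"}

def validate_connection_config(cfg: dict) -> list:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - cfg.keys())]
    if cfg.get("platform") not in {"snowflake", "databricks", "bigquery"}:
        problems.append("unsupported platform")
    return problems

cfg = {
    "platform": "snowflake",
    "account": "acme-prod",
    "role": "INSPECTOR_READ",  # read-only service role, per the step above
    "user": os.environ.get("INSPECTOR_USER", "svc_inspector"),
}
print(validate_connection_config(cfg))  # [] means no problems found
```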
2. Register connections & datasets
- Add connections for catalogs/schemas.
- Register or scan target datasets/tables to populate metadata and enable profiling.
3. Define baseline profiling
- Run an initial profiling job to capture column statistics, null rates, distinct counts, and sample values.
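The statistics this step captures can be sketched in a few lines. This is a pure-Python stand-in for what a real profiling job computes at scale; the function name and output keys are made up.

```python
def profile_column(values: list) -> dict:
    """Baseline stats for one column: null rate, distinct count, samples."""
    n = len(values)
    non_null = [v for v in values if v is not None]
    return {
        "row_count": n,
        "null_rate": (n - len(non_null)) / n if n else 0.0,
        "distinct_count": len(set(non_null)),
        "samples": non_null[:3],  # a few sample values for eyeballing
    }

emails = ["a@x.com", None, "b@x.com", "a@x.com", None]
print(profile_column(emails))
# {'row_count': 5, 'null_rate': 0.4, 'distinct_count': 2,
#  'samples': ['a@x.com', 'b@x.com', 'a@x.com']}
```

These baseline numbers become the reference point for the anomaly detection and threshold tuning described later.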
4. Create quality rules
- Start with high-impact rules (completeness, uniqueness of keys, value ranges).
- Use templates for common checks; add SQL-based or regex checks for custom needs.
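SQL-based checks are usually generated from templates like the ones below. The function names are hypothetical, but the generated SQL is standard and portable across the platforms mentioned above.

```python
def null_rate_sql(table: str, column: str) -> str:
    """Completeness check: fraction of NULLs in a column."""
    return (
        f"SELECT SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) * 1.0 "
        f"/ COUNT(*) AS null_rate FROM {table}"
    )

def range_check_sql(table: str, column: str, lo, hi) -> str:
    """Value-range check: count of rows outside [lo, hi]."""
    return (
        f"SELECT COUNT(*) AS violations FROM {table} "
        f"WHERE {column} < {lo} OR {column} > {hi}"
    )

print(null_rate_sql("orders", "customer_id"))
print(range_check_sql("orders", "amount", 0, 10000))
```

A check fails when the query result crosses the rule's threshold (for example, `null_rate > 0.05` or `violations > 0`).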
5. Set monitoring cadence
- Configure rule execution frequency (real-time, hourly, daily) based on data freshness requirements and cost tradeoffs.
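One way to make the cost tradeoff concrete is to derive cadence from each dataset's freshness SLA. This heuristic is illustrative only, not a documented Inspector feature:

```python
def pick_cadence(freshness_sla_minutes: int) -> str:
    """Map a freshness SLA to a check frequency: tighter SLAs justify
    more compute; relaxed SLAs can run daily and save cost."""
    if freshness_sla_minutes <= 15:
        return "real-time"
    if freshness_sla_minutes <= 120:
        return "hourly"
    return "daily"

print(pick_cadence(10), pick_cadence(60), pick_cadence(1440))
# real-time hourly daily
```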
6. Configure alerts & routing
- Define severity levels, notification channels (Slack, email, webhook), and escalation policies.
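Severity-based routing with escalation can be sketched as a lookup plus one override. The routing table and channel names below mirror the integrations mentioned earlier but are otherwise invented:

```python
# Hypothetical routing table: severity -> notification channels.
ROUTES = {
    "low": ["email"],
    "medium": ["email", "slack"],
    "high": ["email", "slack", "pagerduty"],
}

def route_alert(severity: str, consecutive_failures: int) -> list:
    """Pick channels for an alert, escalating after repeated failures."""
    channels = list(ROUTES.get(severity, ["email"]))  # copy, don't mutate ROUTES
    # Escalation policy: 3+ straight failures page on-call regardless of severity.
    if consecutive_failures >= 3 and "pagerduty" not in channels:
        channels.append("pagerduty")
    return channels

print(route_alert("medium", 1))  # ['email', 'slack']
print(route_alert("medium", 3))  # ['email', 'slack', 'pagerduty']
```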
7. Link to lineage & owners
- Attach business owners and teams to datasets and rules; surface lineage so alerts include upstream impact.
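The "upstream impact" idea amounts to a walk over the lineage graph from the failing dataset. The lineage map and dataset names below are made up for illustration:

```python
from collections import deque

# Toy lineage graph: edges point downstream (producer -> consumers).
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["mart.revenue", "mart.churn"],
    "mart.revenue": [],
    "mart.churn": [],
}

def downstream_impact(dataset: str) -> set:
    """BFS over lineage so an alert can list every affected downstream asset."""
    seen, queue = set(), deque(LINEAGE.get(dataset, []))
    while queue:
        d = queue.popleft()
        if d not in seen:
            seen.add(d)
            queue.extend(LINEAGE.get(d, []))
    return seen

print(sorted(downstream_impact("raw.orders")))
# ['mart.churn', 'mart.revenue', 'staging.orders']
```

Attaching owners per dataset then lets the alert name both the affected assets and the teams to notify.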
8. Enable auto-re-attachment & schema drift handling
- Turn on auto-re-attachment where supported to preserve checks across column renames/DDL changes.
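A schema-drift report is the raw input to re-attachment. The naive diff below only classifies columns as dropped, added, or unchanged; a real re-attachment engine would additionally match likely renames by type and position so checks can be rebound automatically:

```python
def detect_drift(old_cols: set, new_cols: set) -> dict:
    """Naive drift report between two schema scans of the same table."""
    return {
        "dropped": sorted(old_cols - new_cols),
        "added": sorted(new_cols - old_cols),
        "unchanged": sorted(old_cols & new_cols),
    }

report = detect_drift({"id", "email", "created"}, {"id", "email_address", "created"})
print(report)
# {'dropped': ['email'], 'added': ['email_address'], 'unchanged': ['created', 'id']}
```

In this example, a check bound to `email` would need rebinding to `email_address`, which is exactly the case auto-re-attachment handles.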
9. Test remediation workflows
- Implement and test playbooks (quarantine, rollback, rerun ETL, or corrective SQL) in a safe environment.
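The quarantine playbook reduces to splitting a batch on a validity predicate. This sketch keeps everything in memory; a real playbook would write the quarantined rows to a holding table for review before any reprocessing:

```python
def quarantine(rows: list, is_valid) -> tuple:
    """Split a batch into (clean, quarantined) using a row-level predicate."""
    clean = [r for r in rows if is_valid(r)]
    bad = [r for r in rows if not is_valid(r)]
    return clean, bad

rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": -5}]
clean, bad = quarantine(rows, lambda r: r["amount"] >= 0)
print(len(clean), len(bad))  # 1 1
```

Testing this in a safe environment first, as the step advises, matters most for the destructive playbooks (rollback, corrective SQL).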
10. Monitor, iterate, and report
- Review dashboards, tune rule thresholds, schedule reports for stakeholders, and onboard additional datasets.
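Threshold tuning can be grounded in the history captured during monitoring. One simple heuristic, shown here purely as an illustration, is to pad the worst observed value with a slack factor so normal variation stops triggering alerts:

```python
def tune_threshold(history: list, slack: float = 1.5) -> float:
    """Derive an alert threshold from observed metric history (e.g. null
    rates): worst value seen, padded by a slack factor. Illustrative only."""
    return round(max(history) * slack, 4)

print(tune_threshold([0.01, 0.02, 0.015]))  # 0.03
```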
Best practices
- Start small: Begin with core business-critical datasets and a small set of high-value checks.
- Use metadata: Tie checks to business context and SLAs for meaningful prioritization.
- Optimize compute: Prefer native platform execution where supported to reduce cost.
- Automate remediation carefully: Start with alerts + manual approval before full automation.
- Governance: Maintain an audit trail for checks, failures, and remediation actions.
Troubleshooting common issues
- Failed checks after schema changes — enable auto-re-attachment or update rule bindings.
- Excessive compute/costs — reduce frequency, use sampling, or switch to platform-native execution.
- High false positives — refine thresholds, add contextual rules, or apply row-level filters.
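The "use sampling" cost fix from the list above trades a little precision for a much cheaper check. A sketch of estimating a null rate from a sample instead of a full scan (the data here is synthetic, with a true null rate of 0.10):

```python
import random

def sampled_null_rate(values: list, sample_size: int = 1000, seed: int = 0) -> float:
    """Estimate a column's null rate from a random sample without replacement."""
    rng = random.Random(seed)  # fixed seed for reproducibility in this sketch
    sample = rng.sample(values, min(sample_size, len(values)))
    return sum(v is None for v in sample) / len(sample)

values = ([None] * 100) + (["x"] * 900)  # true null rate: 0.10
est = sampled_null_rate(values, sample_size=200)
print(round(est, 2))  # close to the true rate of 0.10
```

On warehouse platforms the same idea is better expressed natively (for example, SQL `TABLESAMPLE` where the dialect supports it), which also addresses the platform-native execution advice above.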