Messy Dataset Cleaning Plan With Reproducible Steps
Audits a dirty dataset and returns a prioritized, repeatable cleaning workflow.
ROLE: You are a data-cleaning expert who values reproducibility over manual fixes.
CONTEXT: Dataset description: [WHAT_THE_DATA_IS]. Columns and types: [COLUMN_LIST]. Known problems: [KNOWN_PROBLEMS, E.G. DUPLICATES, INCONSISTENT_DATES, MIXED_UNITS]. Tool I will use: [TOOL: EXCEL_POWER_QUERY / GOOGLE_SHEETS / PYTHON_PANDAS / SQL].
TASK:
1. List likely data-quality issues by column, ranked by impact on analysis.
2. For each issue, give a concrete fix written for [TOOL], in the order it should run.
3. Flag any fix that loses information and propose a safer alternative.
4. Define 3 validation checks that confirm the cleaning worked (such as row counts, value ranges, uniqueness).
5. Suggest how to document the steps so the process can be re-run on next month's file.
CONSTRAINTS: Never silently drop rows; always quantify what is removed. Preserve a raw copy. Keep steps idempotent. Do not fabricate column values.
OUTPUT FORMAT: A numbered cleaning runbook, then a 'Validation Checks' table, then a one-line documentation tip.
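For reference, here is a minimal sketch of what one runbook step and its validation checks might look like when the chosen tool is pandas. It is an illustration of the constraints above, not a prescribed implementation: the column names (`customer_id`, `order_date`, `amount`), the sample rows, and the helper names `clean` and `validate` are all hypothetical. The sketch works on a copy so the raw data is preserved, counts every removed row, and can be re-run on its own output without changing it.

```python
import pandas as pd


def clean(raw: pd.DataFrame) -> tuple[pd.DataFrame, dict[str, int]]:
    """Return a cleaned copy of `raw` plus a count of rows removed per step."""
    df = raw.copy()  # the raw frame is never modified in place
    removed: dict[str, int] = {}

    # Step 1: normalize IDs so " a1" and "A1" compare equal (hypothetical column).
    df["customer_id"] = df["customer_id"].str.strip().str.upper()

    # Step 2: parse dates; unparseable values become NaT rather than being dropped.
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

    # Step 3: drop exact duplicate rows and record how many were removed.
    before = len(df)
    df = df.drop_duplicates().reset_index(drop=True)
    removed["exact_duplicates"] = before - len(df)

    return df, removed


def validate(raw: pd.DataFrame, cleaned: pd.DataFrame,
             removed: dict[str, int]) -> dict[str, bool]:
    """Three checks: row counts reconcile, values in range, key is unique."""
    key = ["customer_id", "order_date"]  # assumed business key, for illustration
    return {
        "row_count_reconciles": len(raw) - sum(removed.values()) == len(cleaned),
        "amount_non_negative": bool((cleaned["amount"] >= 0).all()),
        "order_key_unique": not cleaned.duplicated(subset=key).any(),
    }


# Hypothetical sample data: one near-duplicate row and one unparseable date.
raw = pd.DataFrame({
    "customer_id": [" a1", "A1", "b2", "c3"],
    "order_date": ["2024-01-05", "2024-01-05", "2024-02-10", "not a date"],
    "amount": [10.0, 10.0, 25.5, 7.0],
})

cleaned, removed = clean(raw)
checks = validate(raw, cleaned, removed)

# Idempotency: cleaning the cleaned frame again should change nothing.
cleaned_again, removed_again = clean(cleaned)
```

In this sketch the row with the unparseable date is kept with a missing `order_date` instead of being deleted, which is one way to honor the "never silently drop rows" constraint. The `removed` dictionary is the kind of per-step tally a runbook could report alongside each fix.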