Discover functions built by the community. Run them instantly via the API or a clean UI.
Smartly detect and clean duplicates from your dataset (CSV or Excel). This function scans your data to find:

- 🔁 **Exact duplicates**: identical rows or repeated entries.
- 🤖 **Fuzzy duplicates**: similar rows with small differences (typos, spacing, casing, or minor text variations).

It automatically keeps the **first valid occurrence** of each duplicate and exports everything neatly organized in a single downloadable ZIP.

📦 Inside the ZIP you'll get:

1. `deduplicated_<name>.csv`: your cleaned dataset (duplicates removed)
2. `duplicates_removed_<name>.csv`: all duplicate rows that were dropped
3. `fuzzy_pairs_<name>.csv`: pairs of rows that look alike (based on similarity)

Args:

- file (FilePath): The uploaded CSV or Excel file to analyze.
- subset (str): Optional. Comma-separated list of column names to check. If left empty, all columns are analyzed.
- similarity_threshold (int): Optional. How strict fuzzy matching should be (0–100). Higher means only very similar values are flagged. Default is 90 (a good balance).

Returns:

- str: Generated ZIP archive containing the cleaned dataset and detailed duplicate reports.
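The function's source is not shown here, so the following is only a rough sketch of how exact and fuzzy duplicate detection of this kind could work. It uses just the Python standard library, with `difflib.SequenceMatcher` standing in for whatever similarity measure the real function uses. The names `split_duplicates` and `normalize` are hypothetical, and treating fuzzy matches as report-only rather than removing them is an assumption.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase and collapse whitespace so casing/spacing differences vanish."""
    return " ".join(str(value).lower().split())


def split_duplicates(rows, subset="", similarity_threshold=90):
    """Split dict rows into (kept, removed, fuzzy_pairs).

    Hypothetical helper: exact duplicates are dropped and the first
    occurrence is kept; fuzzy pairs are only reported (an assumption).
    """
    if not rows:
        return [], [], []
    # An empty subset means "compare on all columns", as in the description.
    columns = [c.strip() for c in subset.split(",") if c.strip()]
    columns = columns or list(rows[0].keys())

    kept, removed, seen = [], [], set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            removed.append(row)  # exact repeat of an earlier row
        else:
            seen.add(key)
            kept.append(row)

    # Pairwise comparison of the surviving rows on a 0-100 similarity scale.
    texts = [" ".join(normalize(r[c]) for c in columns) for r in kept]
    fuzzy_pairs = []
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            score = SequenceMatcher(None, texts[i], texts[j]).ratio() * 100
            if score >= similarity_threshold:
                fuzzy_pairs.append((i, j, round(score)))
    return kept, removed, fuzzy_pairs


if __name__ == "__main__":
    sample = [
        {"name": "Alice", "city": "Paris"},
        {"name": "Alice", "city": "Paris"},
        {"name": "alice ", "city": "Paris"},
        {"name": "Bob", "city": "Berlin"},
    ]
    kept, removed, pairs = split_duplicates(sample)
    print(len(kept), len(removed), pairs)
```

The pairwise loop is quadratic in the number of kept rows, which is fine for a sketch but would likely need blocking or indexing on larger datasets.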
Generate a synthetic CSV file with random data and some NaN values.

Args:

- rows (int): Number of rows to generate.
- nan_ratio (float): Approximate fraction of cells to replace with NaN (0–1).

Returns:

- str: Path to the generated CSV file.
You can simply generate and download a test file for other functions.
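As a minimal sketch of what such a test-file generator might look like, the snippet below writes random rows with the standard `csv` module and replaces each cell with the literal string `NaN` with probability `nan_ratio`. The column names, the optional `path` and `seed` parameters, and the function name `generate_csv` are assumptions made for illustration; the listing does not document them.

```python
import csv
import random
import tempfile


def generate_csv(rows, nan_ratio, path=None, seed=None):
    """Write a synthetic CSV and return its path (hypothetical helper)."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    if path is None:
        # Reserve a temporary file name that outlives this function call.
        handle = tempfile.NamedTemporaryFile(suffix=".csv", delete=False)
        path = handle.name
        handle.close()

    header = ["id", "value", "category"]  # assumed columns
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for i in range(rows):
            record = [i, round(rng.uniform(0, 100), 2), rng.choice("ABC")]
            # Each cell independently becomes NaN with probability nan_ratio.
            record = ["NaN" if rng.random() < nan_ratio else cell
                      for cell in record]
            writer.writerow(record)
    return path
```

Because each cell is replaced independently, the actual share of NaN cells only approximates `nan_ratio`, which matches the "approximate fraction" wording in the description.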
Remove rows containing NaN values from a CSV or Excel file. The cleaned file is saved in the same format (CSV or XLSX).

Args:

- file (FilePath): Input CSV or Excel file.

Returns:

- str: Path to the cleaned file (same format as input).
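The row-dropping step can be sketched for the CSV case as follows, using only the standard library. Excel handling is left out, and the set of strings treated as missing (`MISSING`) as well as the name `drop_nan_rows` are assumptions rather than the function's documented behavior.

```python
import csv

# Cell values treated as missing in this sketch (an assumption).
MISSING = {"", "nan", "na", "null"}


def drop_nan_rows(in_path, out_path):
    """Copy a CSV to out_path, skipping rows with any missing cell."""
    with open(in_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return out_path  # empty input: nothing to copy
        writer.writerow(header)
        for row in reader:
            if any(cell.strip().lower() in MISSING for cell in row):
                continue  # at least one missing value, so drop the row
            writer.writerow(row)
    return out_path
```

The header is always copied through, so a file whose data rows all contain missing values still yields a valid CSV with column names only.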