🛠️ csv_deduper: Deduplicate CSV files while preserving column order


CSV Deduper

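Remove duplicate rows from a CSV file, keeping the first occurrence of each row. Rows are written in their original order and the column order is unchanged. The tool uses only Python's built-in csv and argparse modules. Because every distinct row is kept in memory, memory use grows with the number of unique rows, which matters for very large files.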
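
The core idea can be sketched as a small generator that yields each row only the first time it appears. This is a minimal illustration on an in-memory sample; the function name `dedupe_rows` and the sample data are hypothetical and not part of the tool.

```python
import csv
import io
from typing import Iterable, Iterator, List


def dedupe_rows(rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield each row the first time it is seen, preserving input order."""
    seen = set()
    for row in rows:
        key = tuple(row)  # lists are unhashable, so a tuple is the set key
        if key not in seen:
            seen.add(key)
            yield row


sample = "id,name\n1,Ada\n2,Grace\n1,Ada\n3,Linus\n2,Grace\n"
unique = list(dedupe_rows(csv.reader(io.StringIO(sample))))
print(unique)
# [['id', 'name'], ['1', 'Ada'], ['2', 'Grace'], ['3', 'Linus']]
```

The header row is treated like any other row: it is kept because it is the first occurrence.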

Usage

python csv_deduper.py -i input.csv -o output.csv
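
The two flags can be checked without touching the filesystem by passing an explicit argument list to `parse_args`, which argparse supports. This sketch repeats the parser definition from the script; the file names are placeholders.

```python
import argparse

parser = argparse.ArgumentParser(
    description="Deduplicate a CSV file while preserving column order"
)
parser.add_argument("-i", "--input", required=True, help="Input CSV file path")
parser.add_argument("-o", "--output", required=True, help="Output CSV file path")

# Short (-i) and long (--output) forms set the same attributes.
args = parser.parse_args(["-i", "input.csv", "--output", "output.csv"])
print(args.input, args.output)  # input.csv output.csv
```

If either flag is missing, argparse prints a usage message and exits with status 2.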

Requirements

  • Python 3.8+
  • argparse and csv modules (part of Python's standard library)
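
Code

import argparse
import csv


def main():
    parser = argparse.ArgumentParser(description='Deduplicate a CSV file while preserving column order')
    parser.add_argument('-i', '--input', required=True, help='Input CSV file path')
    parser.add_argument('-o', '--output', required=True, help='Output CSV file path')
    args = parser.parse_args()

    # Tuples of every distinct row seen so far (lists are unhashable).
    seen = set()
    # newline='' on both files lets the csv module handle line endings and
    # quoted fields that contain embedded newlines.
    with open(args.input, 'r', newline='') as input_file, \
            open(args.output, 'w', newline='') as output_file:
        reader = csv.reader(input_file)
        writer = csv.writer(output_file)
        for row in reader:
            row_tuple = tuple(row)
            # Write only the first occurrence; later duplicates are skipped.
            if row_tuple not in seen:
                seen.add(row_tuple)
                writer.writerow(row)


if __name__ == '__main__':
    main()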
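
For very large inputs, the `seen` set stores a full copy of every distinct row. One common alternative, sketched here as an assumption rather than part of the original tool, is to store a fixed-size SHA-256 digest (32 bytes, plus Python object overhead) of each row instead. Each row is re-encoded with csv.writer before hashing so that a one-field row `"a,b"` and a two-field row `a,b` get different keys. The names `row_digest` and `dedupe_file` are hypothetical.

```python
import csv
import hashlib
import io
import os
import tempfile


def row_digest(row):
    """Return the SHA-256 digest (32 bytes) of a row's CSV-encoded form."""
    buf = io.StringIO()
    # csv.writer quotes fields that contain commas, so ["a,b"] and ["a", "b"]
    # serialize differently and therefore get different digests.
    csv.writer(buf).writerow(row)
    return hashlib.sha256(buf.getvalue().encode("utf-8")).digest()


def dedupe_file(input_path, output_path):
    """Copy first occurrences of each row; return how many rows were written."""
    seen = set()  # digests only, not full rows
    written = 0
    with open(input_path, newline="") as src, open(output_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            key = row_digest(row)
            if key not in seen:
                seen.add(key)
                writer.writerow(row)
                written += 1
    return written


# Demo on a throwaway file: two-field row a,b vs. one-field row "a,b".
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "input.csv")
    dst_path = os.path.join(tmp, "output.csv")
    with open(src_path, "w", newline="") as f:
        f.write('a,b\n"a,b"\na,b\n')
    count = dedupe_file(src_path, dst_path)
    with open(dst_path, newline="") as f:
        result = list(csv.reader(f))

print(count, result)  # 2 [['a', 'b'], ['a,b']]
```

Memory still grows with the number of unique rows, so the saving is largest when rows are wide. The trade-off is a vanishingly small chance that two different rows share a digest and one is dropped.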


Source: dev.to
