CSV Deduper
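Deduplicate a massive CSV file while preserving column order. The tool removes repeated rows, keeps the first occurrence of each, and leaves the order of the remaining rows unchanged. It uses Python's built-in csv module for parsing and argparse for the command-line interface. Every unique row is held in memory while the file is processed, so memory use grows with the number of distinct rows.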
Usage
python csv_deduper.py -i input.csv -o output.csv
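To illustrate what the command does to a file, here is a minimal sketch that applies the same first-occurrence rule to a small in-memory sample. The helper name `dedupe_csv_text` and the sample data are made up for this illustration and are not part of the tool.

```python
import csv
import io


def dedupe_csv_text(text):
    """Return CSV text with repeated rows removed, keeping first occurrences.

    Hypothetical helper for illustration only; the tool itself works on files.
    """
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text, newline="")):
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            writer.writerow(row)
    return out.getvalue()


# Made-up sample: the row "1,Ada" appears twice.
sample = "id,name\n1,Ada\n2,Grace\n1,Ada\n"
print(dedupe_csv_text(sample), end="")
```

The header line is treated like any other row, so it is kept once and would also be dropped if it reappeared later in the file.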
Requirements
- Python 3.8+
- argparse and csv modules (part of Python's standard library)
import csv
import argparse

parser = argparse.ArgumentParser(description='Deduplicate a CSV file while preserving column order')
parser.add_argument('-i', '--input', required=True, help='Input CSV file path')
parser.add_argument('-o', '--output', required=True, help='Output CSV file path')
args = parser.parse_args()

# Rows already written, stored as tuples so they are hashable.
seen = set()

# newline='' on both files lets the csv module handle line endings itself.
with open(args.input, 'r', newline='') as input_file, \
        open(args.output, 'w', newline='') as output_file:
    reader = csv.reader(input_file)
    writer = csv.writer(output_file)
    for row in reader:
        row_tuple = tuple(row)
        # Write only the first occurrence of each row.
        if row_tuple not in seen:
            seen.add(row_tuple)
            writer.writerow(row)
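The script above keeps every unique row in the `seen` set, which can become large for very big inputs. One possible way to reduce memory, sketched here as an assumption rather than as part of the tool, is to store a fixed-size digest of each row instead of the row itself. The names `row_digest` and `dedupe_rows` are hypothetical. With a 128-bit digest, two different rows colliding is extremely unlikely but not impossible.

```python
import hashlib


def row_digest(row):
    """Return a 16-byte BLAKE2b digest of a row (a list of strings).

    Each field is length-prefixed so that ["a,b"] and ["a", "b"]
    produce different digests.
    """
    h = hashlib.blake2b(digest_size=16)
    for field in row:
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.digest()


def dedupe_rows(rows):
    """Yield each row the first time its digest is seen, in input order."""
    seen = set()
    for row in rows:
        digest = row_digest(row)
        if digest not in seen:
            seen.add(digest)
            yield row
```

Under these assumptions, the loop in the script could be replaced by a single call such as `writer.writerows(dedupe_rows(reader))`. The set still grows with the number of distinct rows, but each entry is a 16-byte digest rather than a full tuple of strings.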