🐸

The Signup Cleaner

The Data Tinkererpython-data-tinkerer-38-the-signup-cleaner
Reward: 160 XP
|

The Signup Cleaner

The event is close now, and the people handling check-in need a list they can use right away. But the signup sheet in front of you is not ready yet: some people added extra spaces, some submitted twice, and some rows are missing key details. If you hand it over like this, the front-desk team will end up guessing which entries are actually usable.

So your job is clear: clean this small CSV list and hand back something the check-in team and the people doing the follow-up counts can keep using. You will clean fields, skip rows with blank required data, keep the first signup for each email, and rebuild a list that is ready for handoff.

First, warm up by cleaning one signup row

Do not jump into the whole sheet yet. Start with one toy example so you can feel the smallest move first: split one messy signup row, then turn it into stable fields.

row = "  Ava Stone  , AVA@example.com  , vip "
full_name, email, ticket_type = row.split(",")

clean_full_name = full_name.strip()
clean_email = email.strip().lower()
clean_ticket_type = ticket_type.strip().lower()

print(clean_full_name)
print(clean_email)
print(clean_ticket_type)

This still does not handle the whole list: it does not skip blank rows and it does not deduplicate by email. It only demonstrates today’s first key move—split one row, then clean the fields you actually need to hand off.

Today’s deliverable: turn signup_sheet.csv into a check-in-ready list

The script already reads signup_sheet.csv, stores the header in header_line, and keeps the real data rows in dirty_rows. You need to finish the script so it produces two outputs: one list that other people can keep using, and one summary that explains the cleanup.

1
Finish the single-row cleaning function first

Complete clean_signup_row(row). Use row.split(",") to unpack full_name, email, and ticket_type. Then use strip() to clean the field edges, and lower the email and ticket_type.

2
Skip rows with blank required fields

If the cleaned full_name, email, or ticket_type is still blank, make the function return None. The main loop will count that row in skipped_blank_count.

3
Deduplicate by cleaned email and keep the first signup

Use the set seen_emails to remember which cleaned emails were already kept. If the cleaned email already appeared, count it in duplicate_count and do not keep it again.

4
Output the usable list and the summary

Put each kept dict into usable_signups, then rebuild a handoff-ready CSV row and store it in usable_rows. Finally, build signup_summary to report the total rows, kept rows, blank skips, and duplicates.

The handoff standard is simple

This list does not need to be fancy. It just needs to be easy for the next person to use: clean fields, no blank records mixed in, and no duplicate emails counted twice.

Suggested Solution
Expand
Solution:
with open("signup_sheet.csv", "r", encoding="utf-8") as file:
  lines = file.read().splitlines()

header_line = lines[0]
dirty_rows = lines[1:]

print("Header:", header_line)
print("Dirty rows:", dirty_rows)


def clean_signup_row(row):
  full_name, email, ticket_type = row.split(",")
  clean_full_name = full_name.strip()
  clean_email = email.strip().lower()
  clean_ticket_type = ticket_type.strip().lower()

  if clean_full_name == "" or clean_email == "" or clean_ticket_type == "":
      return None

  return {
      "full_name": clean_full_name,
      "email": clean_email,
      "ticket_type": clean_ticket_type,
  }


usable_signups = []
seen_emails = set()
usable_rows = [header_line]
skipped_blank_count = 0
duplicate_count = 0

for row in dirty_rows:
  cleaned_signup = clean_signup_row(row)

  if cleaned_signup is None:
      skipped_blank_count += 1
      continue

  cleaned_email = cleaned_signup["email"]

  if cleaned_email in seen_emails:
      duplicate_count += 1
  else:
      seen_emails.add(cleaned_email)
      usable_signups.append(cleaned_signup)
      usable_row = (
          cleaned_signup["full_name"]
          + ","
          + cleaned_signup["email"]
          + ","
          + cleaned_signup["ticket_type"]
      )
      usable_rows.append(usable_row)

signup_summary = {
  "raw_row_count": len(dirty_rows),
  "kept_signup_count": len(usable_signups),
  "skipped_blank_count": skipped_blank_count,
  "duplicate_count": duplicate_count,
}

print("Usable signups:", usable_signups)
print("Usable rows:", usable_rows)
print("Signup summary:", signup_summary)
Advanced Tips
Want more? Click to expand

The most important change in this lesson is that your result is no longer just “string practice.” It is a list that people at a real event could keep using.

Once field cleanup, blank checks, deduplication, and final output connect into one flow, you are already doing the kind of small tool work that real collaboration depends on.

Loading...
Terminal
Terminal
Ready to run...