The CSV Processor

Welcome to your real-world graduation test.

In previous chapters, your data was pristine. In the Office, data is notoriously messy. A major part of an Architect's job is ensuring the system doesn't collapse when handed bad data.

What is a CSV?

CSV (Comma-Separated Values) is a common format for exporting spreadsheets (for example, from Excel). It is just a text file where each line is a row and the columns are separated by commas (,).

Example: 101,Alice,Engineering,8500

  • Index 0: 101 (ID)
  • Index 1: Alice (Name)
  • Index 2: Engineering (Department)
  • Index 3: 8500 (Salary)
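
As a quick sketch, here is how those indexes fall out of the example row when it is split on commas. The variable names are illustrative only and are not part of the exercise.

```python
# Split the example row on commas; split() returns a list of strings.
row = "101,Alice,Engineering,8500"
parts = row.split(",")

employee_id = parts[0]   # "101"
name = parts[1]          # "Alice"
department = parts[2]    # "Engineering"
salary = int(parts[3])   # converted from the string "8500" to the integer 8500

print(parts)
print(salary)
```

Note that every field is a string until you convert it, which is why the salary needs int() before it can be added to a total.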

[Figure: CSV explanation diagram]

The Challenge: Fault Tolerance

Finance has sent you salaries.csv. You need to read it, extract the salary from the 4th column (Index 3), and sum it all up into total_payroll.

The Catch: The file is corrupted. It contains empty lines, rows with missing salaries, and salaries spelled out as words instead of numbers. Passing a word or an empty string to int() raises a ValueError, and a row too short to have an Index 3 raises an IndexError.
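
To see these failures in isolation, here is a minimal sketch. The sample values are made up for illustration and are not taken from salaries.csv.

```python
# Each of these "salaries" stands in for one kind of corruption.
samples = ["8500", "", "eight thousand"]

results = []
for text in samples:
    try:
        results.append(int(text))
    except ValueError:
        # int("") and int("eight thousand") both raise ValueError
        results.append(None)

print(results)  # [8500, None, None]

# A row with too few columns fails differently: indexing raises IndexError.
short_row = "102,Bob".split(",")
try:
    short_row[3]
except IndexError:
    print("Row has no Index 3")
```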

Your Task

1. Read the File

Open salaries.csv. Remember to call next(file) to skip the header row!

2. Parse Rows

Loop over the file. Strip the surrounding whitespace from each line, then split it on commas with .split(',').

3. Defend the System

Use a try...except block. Try to convert the salary to an int and add it to total_payroll. If the conversion fails with a ValueError, or the row is too short and raises an IndexError, gracefully continue to the next row without crashing! Skip empty lines before you split them.
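
The header skip from the first step can be tried without a file on disk. This is a sketch that uses io.StringIO as a stand-in for an open file; the header names and the data row are assumptions made for illustration.

```python
import io

# io.StringIO behaves like an open text file, so this runs without salaries.csv.
fake_file = io.StringIO("id,name,department,salary\n101,Alice,Engineering,8500\n")

header = next(fake_file)      # consumes the first line
remaining = list(fake_file)   # iteration resumes from the second line

print(repr(header))
print(remaining)

# On an empty file, next(file) raises StopIteration.
# Passing a default value avoids that.
empty_file = io.StringIO("")
print(next(empty_file, None))  # None
```

Because a file object is its own iterator, the for loop that follows next(file) starts at the first data row rather than at the header.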

Suggested Solution
Solution:

By catching the exception, we prevent one bad row from destroying the entire data pipeline.

total_payroll = 0

with open('salaries.csv', 'r') as file:
    next(file)  # Skip header

    for line in file:
        clean_line = line.strip()

        # Skip completely empty lines
        if not clean_line:
            continue

        parts = clean_line.split(',')

        # Defense! Try to parse, fail gracefully.
        try:
            salary_str = parts[3]
            salary = int(salary_str)
            total_payroll += salary
        except (ValueError, IndexError):
            # Catches both bad numbers and malformed rows missing columns
            print(f"Skipping bad data row: {clean_line}")

print("Total Valid Payroll:", total_payroll)
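
One way to try the same logic without a real file is to wrap it in a function that accepts any iterable of lines. This is only a sketch: sum_valid_salaries, its salary_index parameter, and the sample rows are hypothetical and not part of the exercise.

```python
import io

def sum_valid_salaries(lines, salary_index=3):
    """Sum integer salaries from CSV lines, collecting rows that cannot be parsed."""
    rows = iter(lines)
    next(rows, None)  # skip the header; the default avoids StopIteration on empty input

    total = 0
    skipped = []
    for line in rows:
        clean_line = line.strip()
        if not clean_line:
            continue  # ignore completely empty lines
        parts = clean_line.split(",")
        try:
            total += int(parts[salary_index])
        except (ValueError, IndexError):
            # Bad number, empty salary field, or too few columns
            skipped.append(clean_line)
    return total, skipped

# Made-up sample data covering each kind of corruption described above.
sample = io.StringIO(
    "id,name,department,salary\n"
    "101,Alice,Engineering,8500\n"
    "\n"
    "102,Bob,Sales\n"
    "103,Cara,Finance,seven thousand\n"
    "104,Dev,Engineering,\n"
    "105,Eve,Sales,7200\n"
)

total, skipped = sum_valid_salaries(sample)
print("Total Valid Payroll:", total)
print("Skipped rows:", skipped)
```

Returning the skipped rows instead of printing them makes it easier to check which lines were rejected. Passing open('salaries.csv') in place of the StringIO object should work the same way.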