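The Archive's Final Trial
Hoppy has finally reached the deepest part of the archive. The scrolls in the high tower, the records behind the cabinet doors, and the scattered noisy text all come together here for one last real trial. The archivist pushes a stack of relic records across the desk: these rows carry dirty marks and duplicate entries, and they are missing the hall information that has to be filled in from the vault roster. Only when you bring the whole processing flow to a steady close will the archive doors open for you.
So this lesson does not add any new tricks. Instead, it links together the core moves you have learned across this whole Series: clean text, split fields, read JSON, choose the right structure, keep the first occurrence of each record, count, and then give one clear final conclusion. By the end of this lesson you should be able to say with real confidence: I really can use this now.
First, a smaller move: split one row, then pick out the clue that really matters
In the final trial you will keep making the same kind of judgment: which field is this piece of text? After cleaning, which piece should I keep checking? Let's first practice that with a very small toy example.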
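row = "relic=moon key | seal=amber spark"
parts = row.split(" | ")  # ["relic=moon key", "seal=amber spark"]
seal_mark = parts[1].split("=")[1]  # "amber spark"
if seal_mark.find("amber") != -1:  # find() returns -1 when the text is absent
    print("amber found")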
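This does not solve today's full task; it only demonstrates one key move: split a row first, then keep reading information out of one of its fields. The real starter also has to clean the noisy text, fill in hall information from the JSON data, remove duplicates, count, and make the final access decision.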
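As a small extension of the toy example, the same split-then-read move can be repeated for every field in a row. The sketch below is only an illustration: parse_fields is a hypothetical helper name that the starter does not define, and it assumes every field has the form key=value.

```python
# Hypothetical helper for illustration; the lesson's starter does not define it.
def parse_fields(row):
    """Turn "key=value | key=value" text into a dict of fields."""
    fields = {}
    for part in row.split(" | "):
        # maxsplit=1 keeps any later "=" inside the value
        key, value = part.split("=", 1)
        fields[key] = value
    return fields


row = "relic=moon key | seal=amber spark"
fields = parse_fields(row)
print(fields)
# "in" is a shorter way to ask the same question as find(...) != -1
print("amber" in fields["seal"])
```

Reading fields by name instead of by position makes it easier to see which piece of text you are checking, at the cost of one extra loop.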
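Today's task: complete the whole archive trial and deliver the final access decision
The starter has already read archivist_trial.txt and vault_index.json for you. Your job is to fill in the rest of the script so that it carries out one complete data flow:
Complete clean_line(raw_line). This is still the cleaning move you know: remove the prefix "## ", turn "~" back into spaces, and clear the "??" off the tail. Once this step is solid, the fields that follow become truly readable.
In build_record(cleaned_line), split a row into relic_name, vault_code, status, and seal_mark, then use vault_index[vault_code] to fill in hall_name and keeper_name.
Use the set seen_relics to remember whether a relic_name has already appeared, and append each first-occurrence record to unique_records in order. Then use hall_counts to count the unique records in each hall, and build final trial clues such as amber_ready_relics.
The most important finish in this lesson is not printing lots of intermediate results but gathering them into two clear results: trial_summary and access_decision. The first states what actually happened in this trial; the second gives a clear access decision.
This lesson introduces no new knowledge and does not push you into an open-ended big project. What you need to do is connect the moves you have already learned, so that this archive script completes one credible, clear, deliverable final trial.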
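Before opening the reference answer, it can help to watch the steps above run on a tiny input. The following is a minimal sketch under stated assumptions: the real archivist_trial.txt and vault_index.json are not shown in this lesson, so sample_lines, sample_index, the field keys (relic, vault, status, seal), and the hall and keeper names are all made up, and build_record takes the index as a second argument here only to keep the sketch self-contained.

```python
# Hypothetical sample data; the real files are not shown in the lesson.
sample_lines = [
    "## relic=moon~key | vault=V1 | status=ready | seal=amber~spark??",
    "## relic=sun~dial | vault=V2 | status=sealed | seal=jade~ring??",
    "## relic=moon~key | vault=V1 | status=ready | seal=amber~spark??",
]
sample_index = {
    "V1": {"hall_name": "North Hall", "keeper_name": "Keeper A"},
    "V2": {"hall_name": "South Hall", "keeper_name": "Keeper B"},
}


def clean_line(raw_line):
    # Drop the "## " prefix, restore spaces, remove the trailing "??"
    return raw_line.strip().replace("## ", "").replace("~", " ").replace("??", "")


def build_record(cleaned_line, index):
    # Take the value after "=" from each of the four fields
    values = [part.split("=")[1] for part in cleaned_line.split(" | ")]
    relic_name, vault_code, status, seal_mark = values
    vault_record = index[vault_code]
    return {
        "relic_name": relic_name,
        "vault_code": vault_code,
        "status": status,
        "seal_mark": seal_mark,
        "hall_name": vault_record["hall_name"],
        "keeper_name": vault_record["keeper_name"],
    }


seen_relics = set()
unique_records = []
for raw_line in sample_lines:
    record = build_record(clean_line(raw_line), sample_index)
    # Keep only the first row seen for each relic_name
    if record["relic_name"] not in seen_relics:
        seen_relics.add(record["relic_name"])
        unique_records.append(record)

hall_counts = {}
for record in unique_records:
    hall_name = record["hall_name"]
    if hall_name not in hall_counts:
        hall_counts[hall_name] = 0
    hall_counts[hall_name] += 1

print([record["relic_name"] for record in unique_records])
print(hall_counts)
```

The set answers "have I seen this name?" quickly, while the list keeps the first-occurrence order; that is why the task uses both structures side by side.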
Reference answer
import json

# Read the raw trial rows and the vault roster
with open("archivist_trial.txt", "r", encoding="utf-8") as file:
    trial_text = file.read().strip()

with open("vault_index.json", "r", encoding="utf-8") as file:
    vault_index = json.load(file)

print("Trial text:")
print(trial_text)
print("Vault index:", vault_index)

trial_lines = trial_text.splitlines()
print("Trial lines:", trial_lines)


def clean_line(raw_line):
    # Remove the "## " prefix, turn "~" back into spaces, drop the trailing "??"
    return raw_line.strip().replace("## ", "").replace("~", " ").replace("??", "")


def build_record(cleaned_line):
    # Split the row into fields, then keep the value after each "="
    parts = cleaned_line.split(" | ")
    relic_name = parts[0].split("=")[1]
    vault_code = parts[1].split("=")[1]
    status = parts[2].split("=")[1]
    seal_mark = parts[3].split("=")[1]
    # Look up the hall information for this vault in the JSON roster
    vault_record = vault_index[vault_code]
    return {
        "relic_name": relic_name,
        "vault_code": vault_code,
        "status": status,
        "seal_mark": seal_mark,
        "hall_name": vault_record["hall_name"],
        "keeper_name": vault_record["keeper_name"],
    }


cleaned_lines = []
for raw_line in trial_lines:
    cleaned_lines.append(clean_line(raw_line))

all_records = []
for cleaned_line in cleaned_lines:
    all_records.append(build_record(cleaned_line))

# Keep only the first record for each relic_name, in original order
seen_relics = set()
unique_records = []
for record in all_records:
    relic_name = record["relic_name"]
    if relic_name not in seen_relics:
        seen_relics.add(relic_name)
        unique_records.append(record)

# Count unique records per hall
hall_counts = {}
for record in unique_records:
    hall_name = record["hall_name"]
    if hall_name not in hall_counts:
        hall_counts[hall_name] = 0
    hall_counts[hall_name] += 1

# Relics that are ready and whose seal mark mentions "amber"
amber_ready_relics = []
for record in unique_records:
    if record["status"] == "ready" and record["seal_mark"].find("amber") != -1:
        amber_ready_relics.append(record["relic_name"])

# Keeper names in first-seen order, without repeats
keeper_names = []
for record in unique_records:
    keeper_name = record["keeper_name"]
    if keeper_name not in keeper_names:
        keeper_names.append(keeper_name)

trial_summary = {
    "raw_row_count": len(all_records),
    "unique_relic_count": len(unique_records),
    "duplicate_row_count": len(all_records) - len(unique_records),
    "ready_unique_count": len([record for record in unique_records if record["status"] == "ready"]),
    "amber_ready_relics": amber_ready_relics,
    "hall_counts": hall_counts,
}

# The trial passes only when every condition holds
trial_passed = (
    trial_summary["unique_relic_count"] == 5
    and trial_summary["ready_unique_count"] >= 4
    and len(trial_summary["amber_ready_relics"]) >= 3
    and len(trial_summary["hall_counts"]) == len(vault_index)
)

access_decision = {
    "verdict": "pass" if trial_passed else "retry",
    "keeper_roll_call": ", ".join(keeper_names),
    "final_message": "The archive opens." if trial_passed else "The archive asks for another pass.",
}

print("Cleaned lines:", cleaned_lines)
print("All records:", all_records)
print("Seen relics:", seen_relics)
print("Unique records:", unique_records)
print("Hall counts:", hall_counts)
print("Amber ready relics:", amber_ready_relics)
print("Keeper names:", keeper_names)
print("Trial summary:", trial_summary)
print("Access decision:", access_decision)
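To see how the four conditions in trial_passed work together, one option is to wrap them in a small function and feed it hand-written summaries. This is only a sketch: decide is a hypothetical name that is not part of the starter, the two summaries below are made up, and the thresholds are copied from the reference answer.

```python
# Hypothetical wrapper around the answer's pass/retry condition.
def decide(summary, vault_count):
    passed = (
        summary["unique_relic_count"] == 5
        and summary["ready_unique_count"] >= 4
        and len(summary["amber_ready_relics"]) >= 3
        and len(summary["hall_counts"]) == vault_count
    )
    return "pass" if passed else "retry"


# Made-up summaries for illustration only
good_summary = {
    "unique_relic_count": 5,
    "ready_unique_count": 4,
    "amber_ready_relics": ["relic a", "relic b", "relic c"],
    "hall_counts": {"Hall 1": 2, "Hall 2": 3},
}
# Same summary, but with one fewer ready relic
short_summary = dict(good_summary, ready_unique_count=3)

print(decide(good_summary, 2))
print(decide(short_summary, 2))
```

Because the conditions are joined with and, a single failing check is enough to turn the verdict into "retry".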
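If you can finish this lesson steadily, what you take away is not just a handful of methods but a data-processing path that really gets work done: start from dirty text, then clean, split, enrich, deduplicate, count, decide, and hand over one clear result.
Chapter 6 comes to a complete close here. The next chapter takes this ability out of Hoppy's world and into real-world tasks; but before you step out, you have already completed the last real main-story trial in the archive.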