Capture D1 cols 100-101 (school class in HS exports) #21
…rts) In high-school Hy-Tek exports cols 100-101 hold the swimmer's school class (Fr/So/Jr/Sr, case varies by MM version); in club exports they hold other data or are blank. Surfaced under a neutral name like unparsed_d1_col_125 so the consumer interprets it. Additive: one Swimmer field + one extract() in d1_parser. Tests: d1_parser unit test for the captured token (test_d_swimmer_parsers), an integration assertion on a real HS fixture (TestMM5RelayAlternates), the blank-case assertion, and the new slot initialized in the schema test helper.
93ad064 to d905e51
Just caught this missing field after analyzing some ingested high school meets that were classified incorrectly. This field, plus some logic in the consumer, helps identify a meet as a high school meet.
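As a rough illustration of the kind of consumer-side logic mentioned above, the sketch below guesses whether a meet is a high-school meet from the values captured at cols 100-101. The function name `looks_like_hs_meet` and the `min_share` threshold are hypothetical and not part of this PR; the actual consumer logic may look quite different.

```python
from typing import Iterable, Optional

# School-class tokens, compared case-insensitively because case varies by MM version.
CLASS_TOKENS = {"fr", "so", "jr", "sr"}


def looks_like_hs_meet(values: Iterable[Optional[str]], min_share: float = 0.5) -> bool:
    """Guess whether a meet is a high-school meet from its cols 100-101 values.

    min_share is an arbitrary illustrative threshold, not a tuned one.
    """
    # Ignore swimmers whose field is None or blank.
    populated = [v.strip().lower() for v in values if v and v.strip()]
    if not populated:
        return False
    class_hits = sum(1 for v in populated if v in CLASS_TOKENS)
    return class_hits / len(populated) >= min_share
```

Measuring the share against populated values only keeps a meet with many blank entries from being ruled out on that basis alone.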
egelja left a comment
Nice work! Just one small comment.
Also, I stumbled upon this, might make it easier to troubleshoot bugs in future:
    swimmer.unparsed_d1_col_125 = extract(line, 125, 1) or None
    # unparsed_d1_col_100: cols 100-101 (2 chars). "Fr"/"So"/"Jr"/"Sr" school
    # class in HS-meet exports; other data / blank in club exports.
    swimmer.unparsed_d1_col_100 = extract(line, 100, 2) or None
I just verified. This field is called "Class Year" in the Meet Manager GUI.
    - swimmer.unparsed_d1_col_100 = extract(line, 100, 2) or None
    + swimmer.class_year = extract(line, 100, 2) or None
Yes, I thought about naming it that, but it is not always a class year: for non-HS meets it holds other data. We can name it class_year if you prefer.
Here's the breakdown — D1 cols 100–101 across my hy3 files (~9.5M records):
| Bucket | Count | What it is |
|---|---|---|
| (blank) | 8,951,793 | The vast majority; club/age-group meets don't populate it |
| class `Fr`/`So`/`Jr`/`Sr` | 321,769 | The HS school class (what we capture) |
| numeric 9–12 | 144,954 | Ambiguous: grade in numeric-HS meets, but mostly 9–12-year-old ages in age-group meets |
| numeric other | 63,636 | Swimmer age: 5, 6, 7, 8, teens 13–18, etc. |
| numeric 20–30 | 13,685 | Graduation year (26, 25, 21 = class-of-YYYY) |
| other alpha/mixed | 28,165 | Misc codes (MS, WC, Mi, F2, bare A/B) |
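For anyone wanting to reproduce a breakdown like this on their own files, here is a minimal sketch of a bucketing function that follows the table's categories. The names `bucket` and `tally` are hypothetical, and this is not the script used to produce the counts above; the numeric boundaries are simply read off the table.

```python
from collections import Counter
from typing import Iterable, Optional

# School-class tokens, compared case-insensitively because case varies by MM version.
CLASS_TOKENS = {"fr", "so", "jr", "sr"}


def bucket(token: Optional[str]) -> str:
    """Assign a raw cols 100-101 value to one of the buckets in the table."""
    value = (token or "").strip()
    if not value:
        return "blank"
    if value.lower() in CLASS_TOKENS:
        return "class"
    if value.isascii() and value.isdigit():
        number = int(value)
        if 9 <= number <= 12:
            return "numeric 9-12"   # grade or age; ambiguous without meet context
        if 20 <= number <= 30:
            return "numeric 20-30"  # likely a two-digit graduation year
        return "numeric other"      # most likely a swimmer age
    return "other alpha/mixed"


def tally(tokens: Iterable[Optional[str]]) -> Counter:
    """Count how many values fall into each bucket."""
    return Counter(bucket(t) for t in tokens)


# Synthetic sample values, not taken from real files.
sample = ["Fr", "SR", None, "  ", "10", "26", "7", "MS", "A"]
print(tally(sample))
```

The 9–12 bucket is kept separate because the value alone cannot distinguish a grade from an age, which is the ambiguity the table calls out.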
@@ -29,6 +29,9 @@ def d1_parser(
    # unparsed_d1_col_125: col 125 (1 char). Observed: 'N' or blank; semantics unverified.
    swimmer.citizenship = extract(line, 113, 3) or None
    swimmer.unparsed_d1_col_125 = extract(line, 125, 1) or None
I did some digging on this: every file I have has it set to N.
Yes, I couldn't figure out what the N means in some meets.
Capture D1 columns 100–101 (`unparsed_d1_col_100`)

Adds capture of the 2-character field at D1 columns 100–101, currently dropped by `d1_parser`.

In high-school Hy-Tek meet exports this field holds the swimmer's school class: `Fr`/`So`/`Jr`/`Sr` (case varies across Hy-Tek versions). In club/age-group exports the same columns hold other data or are blank, so, like `unparsed_d1_col_125`, it's surfaced under a deliberately neutral name and left for the consumer to interpret rather than asserting a fixed meaning.

This mirrors the existing `citizenship`/`unparsed_d1_col_125` additions: one field on the `Swimmer` schema plus one `extract(line, 100, 2) or None` line in `d1_parser`. Verified against real high-school result files (the tokens land exactly at cols 100–101 across MM2–MM5 exports).

No behavior change for existing fields; purely additive.
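To make the change described above concrete, the sketch below shows how a fixed-width slice at columns 100–101 could be captured and how the blank case falls through to `None`. The `extract` helper here is a hypothetical stand-in, assumed to take a 1-indexed start column and a width and to strip surrounding whitespace; the real helper in the parser may behave differently. The sample lines are synthetic, not taken from a real export.

```python
from typing import Optional


def extract(line: str, start_col: int, width: int) -> str:
    """Hypothetical stand-in: slice `width` chars from 1-indexed `start_col`, stripped."""
    begin = start_col - 1
    return line[begin:begin + width].strip()


def capture_col_100(line: str) -> Optional[str]:
    # Blank columns (or a line too short to reach col 100) become None.
    return extract(line, 100, 2) or None


# Synthetic D1-like lines: 99 chars of padding, then cols 100-101.
hs_line = "D1".ljust(99) + "Jr" + " " * 29
club_line = "D1".ljust(130)

print(capture_col_100(hs_line))    # Jr
print(capture_col_100(club_line))  # None
```

The `or None` idiom collapses an empty string to `None`, so a consumer can tell "not populated" apart from any real token without checking for whitespace.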