I'm documenting some more thoughts on this:
Once each column is analyzed, I can get an overall probability that a particular row is a header row by multiplying the probabilities that each column, in isolation, is a header: probability_col1_is_header * probability_col2_is_header * probability_col3_is_header, and so on. (Multiplying like this treats the columns as independent of one another.) If I reach a row whose overall probability is significantly lower than that of the previous rows, I can be fairly sure that this row starts the data and that the previous row or rows were headers.
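To make the idea concrete, here is a rough Python sketch. It assumes the per-column probabilities have already been computed for each row. The function names, the `drop_ratio` threshold, and the choice to compare against the lowest score seen so far are all my own placeholders for "significantly lower", not something settled above.

```python
from math import prod


def row_header_probability(column_probs):
    """Multiply the per-column header probabilities for a single row."""
    return prod(column_probs)


def find_first_data_row(rows_column_probs, drop_ratio=0.5):
    """Return the index of the first row that looks like data, or None.

    rows_column_probs: one list of per-column header probabilities per row.
    drop_ratio: hypothetical threshold; a row counts as "significantly lower"
    when its score is below drop_ratio times the lowest score of the rows
    before it.
    """
    scores = [row_header_probability(probs) for probs in rows_column_probs]
    for i in range(1, len(scores)):
        lowest_so_far = min(scores[:i])
        if scores[i] < lowest_so_far * drop_ratio:
            return i  # rows 0 .. i-1 are treated as headers
    return None  # no sharp drop found


# Example with made-up probabilities: two header-like rows, then a data row.
example_rows = [
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.1, 0.2, 0.05],
]
first_data_row = find_first_data_row(example_rows)
```

With the made-up numbers above, the first two rows score roughly 0.5 and 0.43 while the third scores about 0.001, so the third row (index 2) is flagged as the start of the data. For files with many columns the product can get very small, so summing logarithms instead of multiplying might be a safer variant.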