This is a very interesting endeavour! Here are my two cents:
- If the first row has a string and every other value in the column is a number, the column almost certainly has a header. Scalar::Util::looks_like_number could be useful for that check.
- If the first row has a number, it is not likely to be a header.
- If the first row is a string but the same string also appears further down the column, it is not likely to be a header.
- If the first-row value is unique within the column while other values appear multiple times, it is likely a header. This should be easy to implement with a hash of counts.
- I would assign a header likelihood to each column. If the average exceeds a threshold, or if one or more columns are certain to have a header, treat the first row as a header row.
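A minimal Perl sketch of the heuristics above, assuming the rows have already been parsed (for example by a CSV parser) into an array of equal-length array refs. The function names column_header_score and has_header_row, the individual scores (0, 0.1, 0.5, 0.8, 1) and the default threshold of 0.6 are hypothetical choices for illustration, not tuned values.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
use List::Util qw(sum);

# Hypothetical helper: score how likely the first value of one column is a
# header. Returns 0..1, where 1 means "certainly a header".
sub column_header_score {
    my ($first, @rest) = @_;
    return 0 unless defined $first && @rest;

    # A number in the first row is not likely to be a header.
    return 0 if looks_like_number($first);

    # String on top and only numbers below: treat as a certain header.
    return 1 unless grep { !looks_like_number($_) } @rest;

    # Count how often each value occurs below the first row.
    my %count;
    $count{$_}++ for @rest;

    # The first-row string repeats further down: unlikely to be a header.
    return 0.1 if $count{$first};

    # First value is unique while other values repeat: likely a header.
    return 0.8 if grep { $_ > 1 } values %count;

    # No strong evidence either way.
    return 0.5;
}

# Hypothetical decision rule: header row if any column is certain, or if the
# average column score exceeds $threshold (0.6 is an arbitrary assumption).
sub has_header_row {
    my ($rows, $threshold) = @_;
    $threshold //= 0.6;
    return 0 unless @$rows && @{ $rows->[0] };

    my @scores;
    for my $c (0 .. $#{ $rows->[0] }) {
        push @scores, column_header_score(map { $_->[$c] } @$rows);
    }
    return 1 if grep { $_ == 1 } @scores;
    return (sum(@scores) / @scores) > $threshold ? 1 : 0;
}

my @with_header = (
    [ 'name',  'age', 'city'   ],
    [ 'alice', 30,    'Paris'  ],
    [ 'bob',   25,    'Paris'  ],
    [ 'carol', 41,    'London' ],
);
my @no_header = @with_header[1 .. 3];

print has_header_row(\@with_header), "\n";  # prints 1
print has_header_row(\@no_header), "\n";    # prints 0
```

Only the "string over numbers" rule returns 1, so it is the one that can decide the whole row on its own, as in the last point above; the other rules merely shift the average. In the second example the "Paris" column scores 0.1 because its first value repeats, and the numeric column scores 0, which keeps the average well below the threshold.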