The sample data is actually missing a field. In the real data file, each entry also includes the date and time it was logged.
Once the file is sorted by date/time and ID, I can be sure that when both the date/time and the ID change, there are no more answers to come for the entries I have collected, so I can dump the data, empty the hash and move on.
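The flush rule can be sketched as a generator. This is a minimal sketch, assuming each line has already been reduced to a `(timestamp, id, payload)` tuple; the names `flush_batches`, `pending` and the tuple shape are hypothetical, and a Python dict stands in for the hash. Pending entries are flushed only when both the timestamp and the ID differ from the previous line. A change in just one of them keeps accumulating.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical record shape: (timestamp, id, payload)
Record = Tuple[str, int, Any]

def flush_batches(records: Iterable[Record]) -> Iterator[Dict[int, List[Any]]]:
    """Yield the collected entries (keyed by ID) whenever both timestamp and ID change."""
    pending: Dict[int, List[Any]] = {}   # plays the role of the hash
    prev_ts: Optional[str] = None
    prev_id: Optional[int] = None
    for ts, rid, payload in records:
        # Both must differ from the previous line; otherwise more answers may still arrive.
        if pending and ts != prev_ts and rid != prev_id:
            yield pending      # dump the data
            pending = {}       # empty the hash
        pending.setdefault(rid, []).append(payload)
        prev_ts, prev_id = ts, rid
    if pending:                # whatever is left at end of input
        yield pending
```

Because a dict of lists is flushed as a whole, two queries logged in the same second with different IDs simply end up in the same batch.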
Once sorted, the real file looks more like this:
2018-01-25 01:01:01;Query;1;host;www.example.com
2018-01-25 01:01:01;Answer;1;ip;1.2.3.4
2018-01-25 01:01:05;Query;2;host;www.cnn.com
2018-01-25 01:01:05;Answer;2;ip;2.3.4.5
2018-01-25 01:01:05;Answer;2;ip;2.3.4.5
2018-01-25 01:01:06;Query;3;host;www.google.com
2018-01-25 01:01:06;Answer;3;ip;3.4.5.6
2018-01-25 01:01:08;Query;4;host;www.google.com
2018-01-25 01:01:08;Answer;4;ip;3.4.5.6
2018-01-25 01:01:08;Answer;4;ip;1.2.4.5
2018-01-25 01:01:11;Query;2;host;www.example2.com
2018-01-25 01:01:11;Answer;2;ip;2.3.4.5
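Each line above has five semicolon-separated fields. A minimal sketch of parsing and sorting them follows, assuming the field names `when`, `kind`, `qid`, `key` and `value` (all hypothetical, since the file has no header) and the timestamp format shown in the sample. Sorting on `(when, qid)` gives the date/time-then-ID order described earlier; ID 2 appearing again at 01:01:11 is why the ID alone cannot be used to decide when a query is finished.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple

class Entry(NamedTuple):
    # Field names are assumptions; the post only shows the values.
    when: datetime
    kind: str    # "Query" or "Answer"
    qid: int
    key: str     # "host" or "ip"
    value: str

def parse_line(line: str) -> Entry:
    """Split one 'date time;kind;id;key;value' line into an Entry."""
    when, kind, qid, key, value = line.rstrip("\n").split(";")
    return Entry(datetime.strptime(when, "%Y-%m-%d %H:%M:%S"), kind, int(qid), key, value)

def parse_sorted(lines: Iterable[str]) -> List[Entry]:
    """Parse non-blank lines and sort them by date/time, then ID."""
    entries = [parse_line(line) for line in lines if line.strip()]
    # list.sort is stable, so lines with the same time and ID keep their original order.
    entries.sort(key=lambda e: (e.when, e.qid))
    return entries
```

Parsing `qid` as an `int` keeps IDs such as 10 and 9 in numeric rather than string order.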