Your benchmark gives as input a list in sorted order. As you mentioned, mergesort is extremely good on sorted or nearly-sorted lists. You should shuffle the list first to avoid giving the builtin a slam-dunk.
I tried it, and in a couple of cases limbic gets comparable times and even beats baseline:
Looking for top 5 in 10000 (running for 9.5 CPU secs)
Rate aristotle browseruk limbic baseline
aristotle 13.7/s -- -46% -89% -92%
browseruk 25.2/s 84% -- -81% -85%
limbic 130/s 847% 414% -- -23%
baseline 169/s 1134% 571% 30% --
Looking for top 5 in 100000 (running for 15 CPU secs)
Rate aristotle browseruk baseline limbic
aristotle 1.38/s -- -44% -85% -90%
browseruk 2.48/s 79% -- -73% -82%
baseline 9.14/s 560% 268% -- -35%
limbic 14.0/s 913% 465% 53% --