It is not just that internals are accessed through macros and global variables; the current data structures themselves are open.
For instance, the internal representation of perl scalars (SV) is open. From XS you can access the structure, change it, get pointers to its slots, whatever. There is no reasonable way to provide an API emulating that on top of substantially different guts.
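To make "open" concrete, here is a minimal sketch in C. The `toy_sv` struct and both functions are hypothetical names invented for illustration; this is not perl's real SV layout, only the same kind of situation in miniature.

```c
#include <stddef.h>

/* Hypothetical "open" scalar. NOT perl's real SV layout;
 * the field names are made up for this illustration. */
typedef struct toy_sv {
    unsigned refcnt;  /* reference count */
    unsigned flags;   /* type and state bits */
    long     iv;      /* integer slot */
} toy_sv;

/* Extension code written against the open layout:
 * it hands out a raw pointer into the structure. */
long *toy_iv_slot(toy_sv *sv)
{
    return &sv->iv;
}

/* Modifies the scalar in place through that raw pointer. */
void toy_incr_in_place(toy_sv *sv)
{
    long *slot = toy_iv_slot(sv);
    *slot += 1;
}
```

Once extension code holds a `long *` pointing inside the struct, a different implementation cannot move or re-encode that slot without breaking the extension. An emulation layer would have to keep a real `long` at a stable address for every scalar, which amounts to keeping the old layout.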
Also, any proposed solution needs to be simple, reliable and portable.
Regarding efficiency: in many cases, efficiency just means being able to access the raw data without passing through intermediate layers. Take, for instance, my module Sort::Key::Radix. Besides the algorithm, one of the reasons it is so fast is that it manipulates the perl data structures directly. There is no conversion from Perl to some agnostic format when any of its subs is called or when it returns.
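As a rough sketch of what that intermediate layer costs, compare the two C functions below. Everything here (`toy_scalar`, `sum_direct`, `sum_via_copy`) is hypothetical and is not taken from Sort::Key::Radix or from perl itself; it only contrasts reading values where they live with copying them into a neutral buffer first.

```c
#include <stdlib.h>

/* Hypothetical scalar with an integer slot (not perl's real SV). */
typedef struct toy_scalar {
    unsigned flags;
    long     iv;
} toy_scalar;

/* Direct access: read each integer slot where it already lives. */
long sum_direct(toy_scalar *const *items, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += items[i]->iv;
    return total;
}

/* Layered access: copy every value into a neutral array first,
 * then work on the copy. Returns 0 on success, -1 if the
 * allocation fails; the sum is stored in *out. */
int sum_via_copy(toy_scalar *const *items, size_t n, long *out)
{
    long total = 0;

    if (n > 0) {
        long *neutral = malloc(n * sizeof *neutral);
        if (neutral == NULL)
            return -1;
        for (size_t i = 0; i < n; i++)   /* conversion pass */
            neutral[i] = items[i]->iv;
        for (size_t i = 0; i < n; i++)   /* the actual work */
            total += neutral[i];
        free(neutral);
    }

    *out = total;
    return 0;
}
```

Both functions compute the same sum, but the second one pays for an allocation and an extra pass over the data on every call, plus a failure mode the first one does not have. A module that has to hand results back would pay the same price again in the other direction.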