Everything else will seem silly to you if you do not accept that, back then, getting a few more KB of memory for your game was very, very hard. The size of your game was decided by non-technical people, based on how much the console vendor would charge for a 2Mbit chip, when the game should come out, and how much kids would be allowed to spend given which license you'd be using. So there was early planning deciding how much to dedicate to sprites, levels, code, maps, etc. Based on that, they'd decide how many levels there would be in the game, and so on.
Of course, game character animation all started with characters whose size fit the hardware constraints (Mario nicely stands within a 16x16 box, and a 16x16 mushroom makes him 16x32), flipping from one sprite to another within an all-in-VRAM bank. Then some special characters (the hero) would get a special status and one or two VRAM slots dynamically updated. To crunch in more animation frames, one could use run-length-encoding compression, which does wonders on rows of pixels of identical color. Others used 2/3-bit-to-4-bit decompression once they realized that Link's sprite (and all the others) only needs 8 colors per palette, not 16. But all this requires CPU, and CPU resources, too, were limited (not even 8MHz; less than my good old 80386).
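To make the run-length idea concrete, here is a minimal sketch of an RLE decoder. The (count, value) byte-pair format is an assumption picked for illustration; the actual 16-bit-era schemes packed 4-bit pixels and usually had extra cases for literal runs.

```cpp
#include <cstdint>
#include <cstddef>

// Decode (count, value) pairs into 8-bit pixels. Rows of identical
// color collapse to two bytes, which is exactly where RLE shines.
std::size_t rle_decode(const std::uint8_t* src, std::size_t srcLen,
                       std::uint8_t* dst, std::size_t dstLen)
{
    std::size_t out = 0;
    for (std::size_t in = 0; in + 1 < srcLen; in += 2) {
        std::uint8_t count = src[in];     // how many times to repeat...
        std::uint8_t value = src[in + 1]; // ...this pixel value
        for (std::uint8_t i = 0; i < count && out < dstLen; ++i)
            dst[out++] = value;
    }
    return out; // number of pixels written
}
```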
If we could instead keep the same binary format between the ROM and the RAM, having the right picture in video memory at the right time is just a matter of "blasting" it through the Direct Memory Access chip. See that big line on my notes? That's the DMA doing its job, while the CPU can focus on crunching numbers to make the game physics stunning and fun...
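Roughly, the engine side could look like the sketch below: game logic only queues "copy this frame there" requests, and the vblank handler hands them to the DMA chip. dma_copy(), the queue size and the structure layout are all assumptions for illustration; real hardware would be driven by poking the console's DMA registers directly.

```cpp
#include <cstdint>
#include <cstddef>

// One pending VRAM upload: the frame is already stored in ROM in its
// final VRAM format, so no CPU-side conversion is needed.
struct VramUpload {
    const std::uint8_t* src;  // ROM address of the frame
    std::uint16_t vramDest;   // destination word address in VRAM
    std::uint16_t byteCount;  // size of the transfer
};

extern void dma_copy(const std::uint8_t* src, std::uint16_t dest, std::uint16_t n);

static VramUpload g_queue[8];
static std::size_t g_queued = 0;

void queue_upload(const std::uint8_t* src, std::uint16_t dest, std::uint16_t n) {
    if (g_queued < 8)
        g_queue[g_queued++] = VramUpload{src, dest, n};
}

// Called from the vblank interrupt: the CPU merely sets up the
// transfers, the DMA chip does the actual copying.
void on_vblank() {
    for (std::size_t i = 0; i < g_queued; ++i)
        dma_copy(g_queue[i].src, g_queue[i].vramDest, g_queue[i].byteCount);
    g_queued = 0;
}
```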
To make that possible with fun stretch-and-squash, cartoon-like animation, they ultimately relied on their chopper tool, which cuts pictures into hardware-sized sprites. Just like the one I imagined for Titus's Prehistorik II sprites.
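In spirit, such a chopper does little more than this hypothetical sketch: slice an 8bpp picture into 16x16 cells and drop the fully transparent ones so no hardware sprite is wasted. The Cell structure and the in-memory bitmap layout are inventions of mine, not the actual tool's format.

```cpp
#include <cstdint>
#include <vector>

struct Cell {
    int x, y;                     // top-left corner in the source picture
    std::uint8_t pixels[16 * 16]; // copied out, ready for VRAM conversion
};

std::vector<Cell> chop(const std::uint8_t* image, int width, int height)
{
    std::vector<Cell> cells;
    for (int cy = 0; cy + 16 <= height; cy += 16) {
        for (int cx = 0; cx + 16 <= width; cx += 16) {
            Cell c{cx, cy, {}};
            bool empty = true;
            for (int py = 0; py < 16; ++py)
                for (int px = 0; px < 16; ++px) {
                    std::uint8_t p = image[(cy + py) * width + (cx + px)];
                    c.pixels[py * 16 + px] = p;
                    if (p != 0) empty = false;   // color 0 = transparent
                }
            if (!empty) cells.push_back(c);      // skip blank areas
        }
    }
    return cells;
}
```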
Ok, granted, it doesn't look completely automated. But the idea is clearly there. And ultimately, it would run on a system that has 1/4 of the power of my Nintendo DS.
So, am I allowed to dream of porting some libgeds game to 16-bit engines? Well, with the engine refactoring that splits out script parsing, it is pretty tempting to see what we could do about it.
Let's start with the animations, then. What is peculiar about animation code is that it has to pause every now and then, whenever there is some delay between frames. In a high-level language, we'd likely use a switch construct branching to the code for frame T or frame T+1 depending on some argument passed to the function. But if we're generating machine code instead, we can do much better: we can remember the actual next animation instruction, rather than an index into an array of virtual instructions. No more conditionals and branch delays on that non-speculating old CPU. Just one jump.
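For comparison, here is what the high-level-language baseline looks like: resume by switching on a frame counter each time the routine is called. Sprite, set_frame and the frame constants are made-up names for illustration, not engine API.

```cpp
struct Sprite { int frame; int delay; };

void set_frame(Sprite& s, int vramSlot); // assumed engine call

// Switch-based resumption: every call re-tests the frame index to find
// "where we left off" before doing anything useful.
void animate_walk(Sprite& s)
{
    if (s.delay-- > 0) return;            // still waiting
    switch (s.frame) {
    case 0: set_frame(s, 0); s.delay = 8; s.frame = 1; break;
    case 1: set_frame(s, 1); s.delay = 8; s.frame = 2; break;
    case 2: set_frame(s, 2); s.delay = 8; s.frame = 0; break; // loop
    }
}
```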
Implementing "keep that state for N screen refreshes" is then looking a lot like software multi-threading: you have a call to some yield_animation micro-routine (and saving your current position into the generated animation code on the stack), which will pop that resume position into some CPU register (an internal scratch variable, in case you didn't know yet), and then return to the code that called animate_aladdin, letting it save the next animation position where it sees fit. Looping animation ? super-easy ! Have you seen how much boilerplate the current virtual-RISC-processor-for-animations of libgeds and AnimEDS must deal with instead ?
What else? State machines, of course. State machines are built with simple expressions used either to guard a transition (only allow it when some condition is met) or to define what to do when the transition occurs (besides changing states, that is: playing a sound, changing speed, etc.).
The collision system currently walks a list of GobExpressions, calling eval(guard_predicate) until one returns true, then proceeds with eval(action) and changes state. With generated machine code instead, that would all be packed into a sequence of predicate code that branches to the appropriate action code, or keeps testing until we hit the "okay, then nothing happens" terminator that returns to the collision system itself.
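For reference, the interpreted control flow boils down to something like this. GobExpression is reduced to an opaque eval() here; the real libgeds class is richer, so treat this as a sketch of the loop the generated code would replace, not of the actual API.

```cpp
#include <vector>

struct GobExpression { /* ... */ bool eval() const; };

struct Transition {
    GobExpression guard;   // "may this transition fire?"
    GobExpression action;  // side effects (sound, speed change, ...)
    int           nextState;
};

int on_collision(const std::vector<Transition>& transitions, int currentState)
{
    for (const Transition& t : transitions) {
        if (t.guard.eval()) {      // first guard that holds wins
            t.action.eval();       // perform the transition's effects
            return t.nextState;    // change state
        }
    }
    return currentState;           // "okay, then nothing happens"
}
```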
One day ... maybe. That would be much more interesting on a 16-bit machine than on the DS or in native x86_64 code, anyway.