Saturday, July 22, 2023

DS Ram Leakage in TestNScripts

I had thought that fixing 'slope unit-tests' in addition to 'wall unit tests' would mean my fixes to get the scorpeye behaving as intended. That would mean the 'summer code review' would be over and the 'summer game dev' could start. Well, that was hoping too much, because there's actually one more test that, for some reason, doesn't show up in ./testme --list but still exists and is important: TestNScripts. This one loads the real .cmd files of SchoolRush and checks parsing went fine. And it does not only check one of them: it will reload many scripts in the same "engine", the way the game does.


while (ntests < n) {
   TestBench tc;
   for (uint i=0; i < REDO; i++) {
       printf("== R%i/%i redo%i ", ntests, n, i);
       tc.ResetEngineResources();
       ParseScript(tc, scripts[rand()%nscripts], __FUNCTION__);
       tc.CheckEngine();
   }
   ntests++;
   tc.Over();
}

No assertion failure here, instead one of the trickiest things to track: memory exhaustion. So it's time for me to learn how to use my tracking tools again. I even left documentation for my future self back in the past (thanks, past self ^^). Once again, the problem arise also in 'default'... looks like I had been over-optimistic in September 2021 when I merged sprites overlay into default.

Well, at least if gives me hope this time: on the 'newmap branch', the forgotten test terminates with 'Out of DS memory'. I managed to get a log of what is allocated at that time, but it would require filtering what has leaked from last ParseScript, and what belongs to the current, interrupted because there's no more memory, running ParseScript. Hopefully, when the bug occurs on default, it gives me a nice 'halt because of leakage' and not a 'out of DS memory' condition. 

  • [done] understand why the new branch gets so high on memory consumption
  • [todo] make sure .spr and .cmd files in fakeroot/ are working fine (despite they now need bilou.spr and school.spr)
  • [todo] that should be the job of UnitTest.mk

edit: dummy! dummy! dummy! ... the 'DS Ram Usage' statistics in those tests is a lie! when it says 95% used, it doesn't mean you'd use 95% of the NDS 4MiB of RAM to play the script. It means that the dummy-allocator-that-never-recycle-memory used 95% of its storage to process the script. Should parsing become more complex (e.g. because we're loading one more file, or creating more intermediate structures), the memory used by the script will grow while the true RAM usage on the NDS would stay roughly the same.

Eeet ouaip. Je me suis donc morfondu un bon morceau de l'été sur le fait que mon moteur de jeu faisait péter sa consommation de mémoire. En tout cas, ça a sérieusement refroidi mon appétit pour le dévelopement de jeu après les heures de boulot. Trouver le problème, ça demandait de se plonger dans une montagne de données et les outils d'il-y-a-longtemps pour comprendre où était la fuite... Sauf qu'en vrai il n'y avait pas de fuite. Tout ça, c'était juste un ou deux messages d'erreur mal nommés qui me faisaient croire que la lecture des scripts prenait plus de 4MB de mémoire (tout ce que la DS a, en somme), sauf que l'environnement de test pour faire tourner les niveaux de mes jeux DS sur PC n'est pas prévu pour qu'on joue, mais pour qu'on trouve les erreurs. 

Donc contrairement à un système de gestion de mémoire ordinaire, quand on lui "rend" un bloc de mémoire il ne le réutilise jamais. La mémoire qu'il fournit à la demande est toujours de la mémoire qui n'a jamais servi pour une autre tâche dans le niveau, et c'est elle qui finissait par faire défaut. Vous me permettrez de croire que si vous avez envie d'en savoir plus, vous êtes aussi en mesure de lire l'Anglais?

If I want to have an estimate of the memory it would take on the NDS, I'd have to keep track of individual allocs and frees and see whether they reach some maximum at a new alloc. (2023-10-23)

What is that one-time allocator ? well, maybe you've heard of bottom-up allocator, where you just keep track of one position in memory: the top-of-used-area, and everything above the top is unused memory. Of course, as soon as you free things out-of-order compared to how you allocated them, that allocator is no longer enough. You would like at least a list of freed-blocks-that-could-further-reduce-top in case the block at the top is eventually freed. Well, the one-time allocator does not even do that. Any byte of one-time memory can be allocated exactly only once. If freed, it is "painted" as free and will remain like that until the ongoing test terminates. 

That is silly as far as memory management is concerned, but it means if some block has not benn freed when the test terminates, you know what role this memory had when it failed. There can't be any tricky-as-hell case where "yeah, that memory had been used for A and then freed, but then it has been re-allocated for B. So maybe B is wrong or maybe something still had a reference of when it was used for A". Sometimes it truly helps. When I'm not misguided by my past self with lying error messages.

1 comment:

PypeBros said...

By the way, the simpler `TestScripts` fails too. investigation ongoing.