I previously messed around with the sources of "Tetris Attack" to gain experience with DS development. Among the techniques Sten used was a multi-palette display, which uses an extra bank of VRAM instead of the "regular" PALETTE_BG and PALETTE_SPRITE.
DISPLAY_BG0_ACTIVE | /* the background */
// ... more settings
The video mode initialisation we have here is fairly standard (note the old-time code with VRAM addresses still exposed to the homebrew coder :P), except for the DISPLAY_BG_EXT_PALETTE flag. You can turn extended palettes on/off separately for sprites and backgrounds, and for the top and bottom screens, but if you enable them for backgrounds, they are used for *all 256-color backgrounds*. Period.
// 64K, can hold palettes for all four BG layers
// allocate for cpu access
Decompress((void*)(0x6880000 + 256*16*2), sprites_pal_bin);
// an alternate palette with colors dimmed so that we can
// easily dim the whole playfield when the game is over...
CreateShadedPalette((u16*)(0x6880000 + 256*17*2),
(u16*)(0x6880000 + 256*16*2));
The memory address 0x6880000 sits within the 'LCD-mapped' region and is where VRAM bank 'E' appears. At that point the bank is "offscreen", so to speak: just ready for setup by the CPU. Each palette is 256x2 = 512 bytes long, and each background has access to its own 16 palettes at a fixed offset. In this game, we had the backdrop on BG0 (thus its palettes start at 0x6880000 through 0x6881E00 = 0x6880000 + 512*15), and the "blocks" (or apples) on BG1, which has its first palette at offset 512*16 (0x6882000) and its second palette at offset 512*17 (0x6882200). That leaves a lot of room unused (15 unused palettes for BG0 and 14 for BG1), but if you want the layers to use different palettes (e.g. because they were converted independently from each other), you don't really have a choice.
Decompress((void*)(0x6880000 + 256*16*2*2), font_pal_bin);
// more background...
Decompress((void*)(0x6880000 + 256*16*3*2), singlebackground_pal_bin);
Score and status are displayed with a tiled layer too, implying yet another palette is needed. He could have used a macro such as
(void*)(0x6880000 + 256*(16*bg+slot)*2), or (possibly) used bank F/G, forcing BG0 and BG2 to share the same 16 slots, while ensuring that the slots in use by BG2 are never needed by BG0 and vice versa.
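Spelled out, such a helper might look like the sketch below. The names (EXT_PAL_BASE, ext_bg_palette_addr, EXT_BG_PALETTE) are made up for this post and are not part of libnds or of Sten's sources; the only facts it relies on are the ones used above: bank E at 0x6880000, 16 palettes per background, 256 colors of 2 bytes each.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical, chosen for illustration only. */
#define EXT_PAL_BASE     0x6880000u /* bank E while mapped for CPU access */
#define PAL_COLORS       256u       /* colors per palette */
#define PALS_PER_BG      16u        /* palettes available to each layer */
#define BYTES_PER_COLOR  2u         /* one 16-bit value per color */

/* Address of palette `slot` (0..15) belonging to layer `bg` (0..3). */
static inline uint32_t ext_bg_palette_addr(unsigned bg, unsigned slot)
{
    assert(bg < 4 && slot < PALS_PER_BG);
    return EXT_PAL_BASE
         + PAL_COLORS * (PALS_PER_BG * bg + slot) * BYTES_PER_COLOR;
}

/* Macro form, usable wherever the raw (void*)(0x6880000 + ...) appears. */
#define EXT_BG_PALETTE(bg, slot) \
    ((void *)(uintptr_t)ext_bg_palette_addr((bg), (slot)))
```

The loads above would then read, for instance, Decompress(EXT_BG_PALETTE(1, 0), sprites_pal_bin) and Decompress(EXT_BG_PALETTE(2, 0), font_pal_bin), which says "first palette of BG1" and "first palette of BG2" without any mental arithmetic.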
// then allocate it for extended palette use
Enough setup: we let the video hardware access those fancy colours ^_^