Tuesday, January 12, 2010

blog backup scripts

OK, Blogger provides an .xml blog export for backup purposes, which happens to export virtually everything (including template and settings) into an RSS-style XML feed. But what about your blog images, which IMHO account for a good 50% of this blog's spirit?

I gave blog2print a try, but it couldn't grab more than a few JPG photos. So I went for my own custom, regular-expression-powered, post-processing Perl script. Here comes the output: all the meaningful pictures gathered in a single folder, sorted by location in the XML file so that they can later be included in e.g. the output of a blogger2TeX tool ...
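For the curious, here is a rough idea of what such a regex-powered scan can look like. This is a minimal Python sketch, not the actual Perl script: the `IMAGE_URL` pattern, the `scan_for_pictures` function and the `<line>-<match>-<basename>` naming scheme are all assumptions made for illustration (the naming is only guessed from the file names in the picture folder).

```python
import re
from posixpath import basename
from urllib.parse import urlparse

# Hypothetical pattern: any http(s) URL ending in a common image extension.
# '&' is excluded so that a URL stops before an escaped quote such as &quot;.
IMAGE_URL = re.compile(r'https?://[^\s"\'<>&]+?\.(?:jpe?g|png|gif)', re.IGNORECASE)

def scan_for_pictures(xml_text):
    """Return (local_name, url) pairs in order of appearance in the export."""
    found = []
    for line_no, line in enumerate(xml_text.splitlines(), start=1):
        for match_no, match in enumerate(IMAGE_URL.finditer(line), start=1):
            url = match.group(0)
            # Prefix with the position in the file so names sort by location
            # (assumed naming scheme, not taken from the original script).
            name = f"{line_no}-{match_no}-{basename(urlparse(url).path)}"
            found.append((name, url))
    return found
```

Each returned pair gives a local file name and the URL to download it from; fetching the files is left out on purpose.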

That makes 512 unique pictures (after removing duplicates with fdupes, thanks to Cyril for the tip) for ~350 posts (including 24 unpublished drafts).
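If fdupes isn't at hand, the same idea (group files whose content is identical) can be sketched in a few lines of Python. This is not how fdupes itself is implemented, just a simplified stand-in that hashes every file with SHA-256; the `find_duplicates` name is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(folder):
    """Return lists of file names in `folder` that share identical content."""
    by_digest = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            # Files with the same digest are treated as duplicates.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.name)
    return [names for names in by_digest.values() if len(names) > 1]
```

Calling `find_duplicates("pictures")` returns one list per set of identical files, much like each line of `fdupes -1 .` output; deciding which copy to keep is up to you.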

2 comments:

Cyril said...

Not bad, but you could avoid having the same file several times thanks to fdupes:

blog2tex/pictures> fdupes -1 .
./249-1-511427719_small.jpg ./264-3-511427719_small.jpg
./1053-1-colors-experiments.png ./915-1-colors-experiments.png
./156-5-blocedit_003.png ./675-5-blocedit_003.png
./834-2-anim.gif ./837-9-anim.gif
./36-1-runme-reloaded.png ./420-1-runme-reloaded.png ./423-1-runme-reloaded.png
./729-4-link-ph.png ./519-3-link-ph.png ./927-5-link-ph.png
./369-1-tongue-code.png ./846-1-tongue-code.png ./456-1-tongue-code.png
./954-1-medit-mockup.png ./837-5-medit-mockup.png
./252-1-555387425_small.jpg ./264-4-555387425_small.jpg
./243-1-557347036_small.jpg ./264-5-557347036_small.jpg
./300-2-DS-level-scrolling.png ./75-5-DS-level-scrolling.png
./411-1-bilou-zik.png ./837-7-bilou-zik.png ./843-1-bilou-zik.png
./207-3-ruler-charging.png ./207-11-ruler-charging.png
./909-1-v01-rumbling.png ./837-1-v01-rumbling.png
./75-3-gedsdemo-20090519b.png ./489-3-gedsdemo-20090519b.png
./153-1-keenwalks.gif ./93-2-keenwalks.gif
./264-1-DS_display.png ./246-1-DS_display.png
./132-15-trip-3.png ./516-5-trip-3.png

PypeBros said...

Yeah, obviously, the "scan-for-pictures.pl" script assumes that the .xml file produced by Blogger has already been pre-processed to separate the settings, the template, the content and the comments. I did that by hand last year. This year, I'll have to automate things a bit more.
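A possible starting point for that automation, as a Python sketch rather than a finished tool. It assumes the export is an Atom feed in which each entry carries a `category` element whose scheme is `http://schemas.google.com/g/2005#kind` and whose term ends in `#post`, `#comment`, `#settings` or `#template`; that layout is an assumption to check against a real export file, and `split_by_kind` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed scheme used by the export to tag what each entry is.
KIND_SCHEME = "http://schemas.google.com/g/2005#kind"

def split_by_kind(xml_text):
    """Group the feed's entries by kind (e.g. 'post', 'comment', 'settings')."""
    groups = defaultdict(list)
    root = ET.fromstring(xml_text)
    for entry in root.iter(ATOM + "entry"):
        kind = "unknown"
        for category in entry.findall(ATOM + "category"):
            if category.get("scheme") == KIND_SCHEME:
                # Keep only the part after '#', e.g. '...kind#post' gives 'post'.
                kind = category.get("term", "").rpartition("#")[2] or "unknown"
        groups[kind].append(entry)
    return groups
```

Each group can then be written to its own file with `ET.tostring(entry)`, so that the picture scan only sees the entries under the `post` key.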