I have some photos-takeout-20230630T070258Z-001.zip ... presumably
created when Google announced they would do cannot-remember-what to
their albums service.
- it has multiple folders for the same blog, with date tags.
- it has 3030 files total. There are 30 pairs of duplicates.
- the duplicates are only ever found between files in the same folder, like 2007-02-04-Bilou_s Book/biokid-4-superjump.png and biokid-4-superjump(1).png in that same folder
- all the files have the same date as the archive. Proper timestamps will have to be looked up in the companion .json files
- there's obviously a maximum filename length, most noticeable for screenshots (40 characters?)
- The date in the folder name seems to be album creation time. For this blog, I have 2007, 2011, 2021 and 2023
- There isn't any file with a timestamp earlier than 2007. So where are the pictures from 2006? Like http://photos1.blogger.com/blogger/5640/3676/320/indigo-hints.png or http://photos1.blogger.com/blogger/5640/3676/320/blending-example.png? Apparently in another castle.
- The oldest entries for which images are available are http://sylvainhb.blogspot.com/2007/03/vous-courriez-jen-suis-fort-aise-eh.html and http://sylvainhb.blogspot.com/2007/02/pour-faire-un-aarbreeeuu.html (unfortunately, the picture-heavy http://sylvainhb.blogspot.com/2007/02/un-arbre-cachant-la-fort.html is missing)
- whole chunks (like everything from 2014 Q4) are missing
- timestamps are now only in milliseconds-since-epoch format
- time span between oldest and latest timestamps is 6284 days... about 17.2 years
- 5640 files for this blog (plus .json descriptors) and 2513 duplicate sets... that means nearly every picture has a duplicate :-/
- it seems to have some of the missing files mentioned earlier, like bladorstack
- still no indigo-hints.png or blending-example.png, unfortunately T_T
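
The timestamp arithmetic above is easy to double-check. Here is a minimal sketch, assuming the timestamps come out of the .json descriptors as milliseconds-since-epoch, either as integers or as numeric strings. The function names are mine, not something the takeout provides, and I have left out the JSON parsing itself because I would rather not guess at the field names.

```python
from datetime import datetime, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000


def ms_to_datetime(ms):
    """Convert milliseconds-since-epoch (int or numeric string) to a UTC datetime."""
    return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)


def span_in_days(oldest_ms, latest_ms):
    """Whole days between two milliseconds-since-epoch timestamps."""
    return (int(latest_ms) - int(oldest_ms)) // MS_PER_DAY


def days_to_years(days):
    """Approximate years, using 365.25 days per year, rounded to one decimal."""
    return round(days / 365.25, 1)
```

With this, `days_to_years(6284)` gives 17.2, which matches the figure in the list.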
I'll have to dig into that more. I'll have to merge that with the "blogpress" tools and possibly make myself a "blogger taken out dashboard".
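
As a first building block for that, here is a rough sketch of how duplicate sets like the ones counted above could be found again: group files by a hash of their content and keep the groups with more than one member. This is not necessarily how the original counts were obtained. The function names are hypothetical, and skipping the .json descriptors is my assumption about what should count as a picture.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's content, read in chunks to keep memory use low."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_sets(root, skip_suffixes=(".json",)):
    """Return lists of paths under root that share identical content."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        # Assumption: the .json descriptors are metadata, not pictures.
        if path.is_file() and path.suffix.lower() not in skip_suffixes:
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


def same_folder(paths):
    """True when every file of a duplicate set lives in the same folder."""
    return len({p.parent for p in paths}) == 1
```

Running `find_duplicate_sets()` on the extracted archive and then `same_folder()` on each set would show whether the duplicates really are always foo.png next to foo(1).png.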