My fairy suggested a very nice Christmas gift for my (not so lil' anymore) nephews: she owns a thermal book binder and went "oh, but you could print some story of yours and bind it into a book with
my machine..." My audio processors catalyzed that into "how about stripping some meaningful text out of your blog and printing that on 8x8" pages?" and I immediately thanked her for that marvelous idea.
Of course, that first required a major upgrade of my "blogpressing" tools. The "images scanner" got complemented with list-post.pl, which extracts posts having a certain tag and formats them into an HTML document. From there, I could use Open Office to import the document, export it into ODT format and start adjusting image sizes and other formatting annoyances to fit the material into two illustrated texts of ~50 pages each. The fight to get that printed out of my fairy's HP all-in-one printer is for another post.
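For the curious, the tag-filtering step can be sketched in a few lines. This is not list-post.pl itself (that one is Perl and reads the actual blog data); it is only a Python illustration of the idea. The function name `posts_to_html` and the post layout (dicts with `title`, `tags` and `body` keys) are assumptions made up for the example.

```python
from html import escape


def posts_to_html(posts, tag, title="Selected posts"):
    """Keep the posts carrying `tag` and join them into one HTML document.

    `posts` is a hypothetical list of dicts with "title", "tags" and "body"
    keys; "body" is assumed to already contain HTML markup.
    """
    selected = [post for post in posts if tag in post["tags"]]
    parts = [
        '<html><head><meta charset="utf-8">',
        f"<title>{escape(title)}</title></head><body>",
    ]
    for post in selected:
        # Titles are plain text, so they get escaped; bodies are kept as-is.
        parts.append(f"<h2>{escape(post['title'])}</h2>")
        parts.append(post["body"])
    parts.append("</body></html>")
    return "\n".join(parts)
```

The resulting string can be written to a file and imported into a word processor or handed to an ebook converter.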
Suffice it to say that I decided I'd avoid printing for my own reading needs and try to take advantage of my Cybook instead.
I had no luck with the graphical front-end of Calibre this time, so I dug around the web a bit and figured out that I could use the command-line approach:
ebook-convert tagtionary.html test.epub --breadth-first --max-levels=8 --margin-left=2 --margin-right=2 --verbose
Even then, Calibre gave me a hard time. I guess running Lucid Lynx in 2013 is the root of all my problems, so I'll have to upgrade sooner than wished. Btw: non-ASCII characters in URLs abort the HTML-to-epub conversion with a mysterious "ascii codec can't decode" exception (and no mention of the offending URL/file), and so does %-escaping in filenames. I had to manually interrupt the conversion after about half an hour of conversion attempts, with a 3GB resident set and up to 7GB of virtual address space.
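Since the exception does not name the culprit, a pre-flight check on the HTML can save some guessing. The following is a minimal sketch, assuming the links of interest live in `href` and `src` attributes; the names `LinkCollector` and `suspicious_links` are made up for the example and have nothing to do with Calibre's own code. It flags any link containing a non-ASCII character or a %-escape.

```python
import re
from html.parser import HTMLParser

# A %-escape is a percent sign followed by two hex digits, e.g. %20.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


class LinkCollector(HTMLParser):
    """Collect every href/src attribute value found in the document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def suspicious_links(html_text):
    """Return (url, reason) pairs for links likely to upset the conversion."""
    collector = LinkCollector()
    collector.feed(html_text)
    flagged = []
    for url in collector.links:
        if not url.isascii():
            flagged.append((url, "non-ASCII character"))
        elif PERCENT_ESCAPE.search(url):
            flagged.append((url, "%-escape"))
    return flagged
```

Running it over the input file before the conversion gives a list of URLs to rename or rewrite by hand.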
It got to the "Creating EPUB Output" stage, and most pages said "No large tree found", then "splitting on page-break". All fine. A few pages had "large tree #0", with a split point defined. (I have no idea why "Split point: {http://www.w3.org/1999/xhtml}h3 /*/*[2]/*[663]" is mentioned there). Even the largest file "english.html" got happily split into 6 parts. Then for some curious reason, "mybrew.html" entered an endless series of "splitting... split tree still too large: 464KB."
From there on, it consumed more and more memory, obviously leaking all the prior attempts.
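To avoid babysitting a runaway conversion, the command can be launched with a memory cap and a timeout. This is a minimal, Unix-only sketch using Python's standard `resource` and `subprocess` modules; the helper name `run_capped` and the limits in the usage comment are assumptions, and I have not checked how gracefully ebook-convert reacts when it hits the cap.

```python
import resource
import subprocess


def run_capped(cmd, max_bytes, timeout_s):
    """Run `cmd` with an address-space cap and a wall-clock timeout.

    Returns the exit code, or None if the timeout expired (the child is
    killed in that case). Unix only, since it relies on setrlimit.
    """

    def limit_memory():
        # Runs in the child just before exec; never exceed the hard limit.
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

    try:
        done = subprocess.run(cmd, preexec_fn=limit_memory, timeout=timeout_s)
        return done.returncode
    except subprocess.TimeoutExpired:
        return None


# Hypothetical usage with the command from this post (4GB cap, 10 minutes):
# run_capped(["ebook-convert", "tagtionary.html", "test.epub",
#             "--breadth-first", "--max-levels=8"], 4 * 1024**3, 600)
```

With such a cap, a leaking process should fail on its own allocations rather than drag the whole machine into swap.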
(edit: after dropping the offending mybrew.html, I managed to get the 54MB epub file. Checking on Odyssey ASAP).
(edit++: the Calibre distributed in the latest LTS handled mybrew.html out of the (virtual)box =:)