I know a bit too much about all of this, but to avoid doxxing myself I'm not going to reveal why this post gave me flashbacks.
Anyway, what I'm most confused about with this post is why the author tried to OCR individual letters rather than just rendering the page and OCRing that. Maybe they got bitten by hyphenation? Although that isn't really that hard to solve. I'm guessing it has more to do with the author trying to use Python to mess with the data directly rather than feeding it through the JS that does the layout.
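To illustrate why hyphenation isn't that hard to deal with after full-page OCR, here is a minimal sketch. It assumes the OCR output is plain text with newlines preserved. The function name and sample text are made up for illustration and are not from the linked post.

```python
import re

# A word character, a hyphen at the very end of a line, then a lowercase
# letter starting the next line: treat this as a word split by hyphenation.
_HYPHEN_BREAK = re.compile(r"(\w)-\n([a-z])")


def join_hyphenated_lines(ocr_text: str) -> str:
    """Rejoin words that were split across lines by end-of-line hyphens."""
    return _HYPHEN_BREAK.sub(r"\1\2", ocr_text)


# Hypothetical OCR output, invented for this example.
sample = "The obfus-\ncated glyphs were ren-\ndered as a page.\nA well-known\ntrick."
print(join_hyphenated_lines(sample))
```

The obvious weakness is a real compound that happens to break at its hyphen ("well-" at the end of one line, "known" on the next), which this would merge into one word. A dictionary check on the joined word would catch most of those.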
And the wacky SVG paths aren't just for obfuscation. They're also for identification. Since they're unique per call, it's theoretically possible to trace content pirated through this method back to the original request, at least as long as the data wasn't cleaned. I don't recall offhand if that has been used in practice or if it was simply an additional justification.
A fun fact about Amazon's web readers is that the one he found is only a portion of what's there. The data discussed is for regular novel-style books where the text can move around. There's another built into that library for fixed-layout books, such as textbooks and picture books. And then there's yet another where the server renders the pages onto images and then just transmits the images. And that's just the browser-compatible readers.
Super fascinating! Thanks for sharing.
On the OCR: maybe they wanted to map characters specifically to preserve formatting, whereas OCR might lose position in the DOM? Still it’s a good question.
I always knew these sites used tricks to make it hard to download stuff (even if you're willing to do it manually using the network tab of your browser), but had no idea it was to this extent!
I would also love to go fully legit on ebook and audiobook purchases, but Amazon locking everyone into Kindle exclusivity makes it hard. For example, I pay for DRM-free audiobooks on Libro.fm and Downpour, and e-books on Kobo when they have DRM-free versions, and always purchase directly from the author when that's an option, but so often I've started a series and discovered that everything after book 2 or 3 is Kindle exclusive. Usually I just drop it and move on to something new, but there's always a part of me that feels like if they're not willing to offer me a legitimate way to pay for the book without draconian DRM restrictions, then obtaining it via "other" methods is a little easier to justify.
The world needs more people like this. If they are screwing us over, we should fight back.
All hope for a future with a free and open internet lies in the Hackers.
Amazon goes very hard on DRM on ebooks generally, including their ebook readers. If this method of getting the books from the web works well enough, I expect it to get patched out.
Personally, I view it this way. If I preferred their ereaders (I don't) or didn't consider mobile or desktop a painfully inferior reading experience (I do), they'd actually provide a solid service. Except that by using it I would lock myself into, and pay into, a system that will get worse on a timescale of years at best. For bonus points, they also unilaterally control the so-called purchases.
A further pain point is that Kindle specifically, which enforces exclusivity, is the only option used by most self-published authors.
I've brought this up before, but I'll probably keep talking it up at the risk of sounding like a shill because I'm so happy with it: BookFusion
It's a BYOB cloud reading service. It's wonderful.
I buy most of my books through Kobo, but also grab occasional DRM-free reads from Humble Bundle, StoryBundle, Project Gutenberg, Standard Ebooks, etc.
For Kobo books, I strip the DRM in Calibre. For all of my books, regardless of the source, I use Calibre to set up their metadata, and then I upload them into BookFusion using its Calibre plugin.
I have my entire library, in the cloud, ready to read from any device. Their web reader is excellent. Their iOS app is excellent. Their Android app is excellent. I'm someone who reads on a different device depending on my setting, so I genuinely do hop between all three of them on a very regular basis. The reading experience and syncing are honestly smoother than I ever had with the Kindle or Kobo platforms.
They also support pretty much every book type out there, including first-class treatment of comics and PDFs. (No audiobooks yet, though.)
So, when I say my entire library, I mean that my regular text-only books live under the same roof as all of my graphic novels, and neither feels like a second-class citizen. Also, I can't tell you how much I love that my DRM-free books get the same treatment as everything else. They're completely integrated into my library, instead of feeling tacked on or only available on one device without cloud syncing (like they did on Kindle and Kobo).
There are only two real caveats I can give the service.
The first hardly counts as one, but you do have to pay for it. I pay ~100 USD annually for the plan with the most storage, and, according to my stats, I read over 150 hours on the platform in my first year. Less than a dollar per hour is fantastic value, in my opinion.
The second caveat is an actual one: you can't just plonk it on any ereader device. Currently, the only way to get it on eink devices is to get one that's running Android and install the Android app. Once in the app, you can turn on eink optimizations to make it run well on that type of screen (and, once you do this, it works wonderfully).
Nevertheless, that means that, at present, it won't run on a Kindle or a Kobo. I use a BOOX device as my designated BookFusion reader.
They are currently planning for KOReader integration. I don't think it's a guaranteed feature by any means, but if they do move forward with that, it's possible you could read your BookFusion library through KOReader on a Kobo or Kindle. If you're interested in that, I wouldn't put all your eggs in the BookFusion basket just yet though. Definitely wait it out and see.
But if you're looking to break out of the Kindle ecosystem right now and into one you're in charge of, I can't recommend BookFusion enough.
(No, this post isn't sponsored. I just really, genuinely love BookFusion.)
I don't think BookFusion would do me much good. I use KOReader across all my devices, whether it's phone, Kindle or Kobo. Something about modding/jailbreaking devices and unlocking hidden potential scratches an itch for me.
I recently set up a Docker container of Calibre-Web to do something similar. I can access the library directly in KOReader using the OPDS catalog feature.
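For anyone wondering what an OPDS catalog actually is: it's an Atom feed where each book entry carries download links marked with an "acquisition" relation. Here is a rough sketch of reading one with the Python standard library. The sample feed, its titles and its hrefs are invented for illustration and are not Calibre-Web's real URLs.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Link relation OPDS uses for downloadable copies; variants such as
# ".../acquisition/open-access" share this prefix.
ACQUISITION_REL = "http://opds-spec.org/acquisition"

# Hypothetical feed, made up for this example.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example library</title>
  <entry>
    <title>Example Book One</title>
    <link rel="http://opds-spec.org/acquisition"
          type="application/epub+zip" href="/books/1.epub"/>
    <link rel="http://opds-spec.org/image/thumbnail"
          type="image/jpeg" href="/covers/1.jpg"/>
  </entry>
  <entry>
    <title>Example Book Two</title>
    <link rel="http://opds-spec.org/acquisition/open-access"
          type="application/epub+zip" href="/books/2.epub"/>
  </entry>
</feed>"""


def list_acquisitions(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, href) pairs for every acquisition link in the feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.iter(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="(untitled)")
        for link in entry.findall(f"{ATOM}link"):
            # Skip covers, thumbnails and navigation links.
            if link.get("rel", "").startswith(ACQUISITION_REL):
                results.append((title, link.get("href", "")))
    return results


for title, href in list_acquisitions(SAMPLE_FEED):
    print(title, href)
```

A reader app does essentially this against the server's feed URL, then fetches whichever href you pick.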
How do you like your BOOX? I've eyed the Palma from time to time.
Interesting that ebooks are so difficult while audiobooks are a breeze to download and strip the DRM from.
The link is talking about Kindle ebooks specifically. Non-Kindle ebooks are much easier to DRM-strip, as kfwyre alludes to with their Kobo-purchased ebooks.
Yeah, even Kindle isn't difficult with Calibre, KFX, and the DeDRM tool. No need to get fancy yet.