22 votes

Is there a good S3-compatible datastore for a hobbyist?

Tags: s3, ask.advice

I've read nice things about Amazon's S3. There are some compatible implementations from other major vendors like Google and Cloudflare. There are projects that automatically back up and replicate a sqlite database using S3. Some people have backed up Google Photos to S3.

But I've never used any of them. What would be a good way to get started? Amazon or another vendor? (And does this make sense at all?)

18 comments

  1. [2]
    whs
    My team has been using MinIO as S3 mock. Quite simple to install and use even in CI.
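A minimal sketch of why this works as a mock: S3 clients generally let you override the endpoint, so the same code can talk to a local MinIO instead of AWS. This assumes the `boto3` library and a MinIO server on its default port 9000 with the default `minioadmin` credentials; the helper names are made up for illustration.

```python
def minio_client_kwargs(endpoint="http://localhost:9000",
                        access_key="minioadmin",
                        secret_key="minioadmin"):
    """Build keyword arguments for boto3.client("s3", ...) aimed at a local MinIO.

    The endpoint and credentials are assumptions (MinIO's out-of-the-box
    defaults); change them to match your own setup.
    """
    return {
        "endpoint_url": endpoint,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        # Set explicitly so the client does not depend on local AWS config.
        "region_name": "us-east-1",
    }


def make_client(**overrides):
    """Return an S3 client pointed at MinIO (needs `pip install boto3`)."""
    import boto3  # imported lazily so the helper above works without it
    return boto3.client("s3", **minio_client_kwargs(**overrides))


# Usage against a running MinIO (bucket and key names are placeholders):
#   s3 = make_client()
#   s3.create_bucket(Bucket="test")
#   s3.put_object(Bucket="test", Key="hello.txt", Body=b"hi")
```

Pointing the same client at a hosted provider later should mostly be a matter of swapping the endpoint and credentials.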

    18 votes
    1. Bwerf
      I use minio for my home network. Easy to get up and running, but I haven't done anything advanced.

  2. [5]
    Greg
    Tl;dr: Backblaze or Cloudflare are good options assuming you want a hosted service.


    Whatever you end up using, make sure to check what it costs to get your data back out again - that’s where they often get you. S3 itself is $0.01-0.02/GB/month for actual storage, for example, but then egress for that GB is $0.09 every time anyone needs to access it. Large amounts of data, or moderate amounts to large numbers of people, add up fast.
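To make that concrete, here is a rough back-of-the-envelope sketch using the approximate S3 figures above. The function name is made up, and real pricing has tiers, request charges, and free allowances that this ignores.

```python
def monthly_cost(stored_gb, downloaded_gb, storage_rate=0.02, egress_rate=0.09):
    """Estimate a month's bill in dollars: storage plus egress.

    Default rates are the rough S3 figures quoted above ($/GB); they are
    illustrative, not a current price list.
    """
    storage = stored_gb * storage_rate
    egress = downloaded_gb * egress_rate
    return round(storage + egress, 2)


# 100 GB stored, nobody downloads anything
print(monthly_cost(100, 0))         # 2.0
# the same 100 GB, downloaded in full ten times over the month
print(monthly_cost(100, 100 * 10))  # 92.0
```

The storage line barely moves the total; the downloads are what you end up paying for.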

    There are a ton of S3-compatible stores out there, so I won’t claim to know the details on every single one, but I’m with Backblaze for personal use (NAS offsite backups) and they’ve been consistently great - cheap, tech focused, and the reliability of being a massive player who only really does storage. Egress is pretty good at $0.01/GB, but since they’re backup focused (i.e. mostly writes, very occasional big reads) there’s also the “fedex a hard drive” option to avoid that entirely, which I appreciate.

    At work we’ve historically used S3 as there’s a lot of data we create and process with AWS without it needing to leave, but the lock-in implications are getting worrying - nobody wants a five figure exit bill just to shift to another cloud. We’re trialling Cloudflare R2 with moderate success so far: the major advantage is totally free egress, and knowing what their bandwidth capacity is like I do actually trust them to stick with that, but it only implements the basic S3 API (no versioning, no lifecycle policies, worse Terraform support, newer and less proven reliability). For hobbyist use that stuff likely matters a lot less, so it may well be a good shout.
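For a sense of scale on the exit bill, a small sketch using the per-GB egress figures mentioned in this comment ($0.09 for S3, $0.01 for Backblaze, free for R2). The names are hypothetical and the rates are illustrative only.

```python
# Egress rates in $/GB as quoted in this comment; treat them as illustrative.
EGRESS_PER_GB = {"S3": 0.09, "Backblaze": 0.01, "Cloudflare R2": 0.0}


def exit_bill(total_gb, provider):
    """Dollars to download everything once, e.g. when migrating away."""
    return round(total_gb * EGRESS_PER_GB[provider], 2)


def gb_for_bill(dollars, provider):
    """How many GB you can pull out before the bill reaches `dollars`."""
    rate = EGRESS_PER_GB[provider]
    return float("inf") if rate == 0 else dollars / rate


# Moving 120 TB (120,000 GB) out of each provider once:
for name in EGRESS_PER_GB:
    print(name, exit_bill(120_000, name))
```

At the quoted S3 rate, a five-figure bill starts at roughly 111 TB, which is not a lot for a company that generates data for a living.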

    15 votes
    1. [4]
      skybrian
      (edited)
      Thanks, I knew Backblaze was a backup service but didn’t realize that they also provide S3-compatible storage.

      Since it’s much cheaper than Amazon, I was wondering what the catch is. Apparently your data is stored in one region. There are only a few regions. Replication is possible but quite limited. That does seem very reasonable for most purposes, but maybe not a great fit for edge servers like Deno Deploy?

      (Caring about multi-region replication is admittedly very weird for a hobbyist. Also, litestream is single-writer so maybe one region is fine?)

      I see that Litestream has replica guides for a list of S3-compatible services, and that seems like a good place to look for popular services to use with sqlite.

      I don’t really know what I want, but I’m thinking of using this for building websites, so maybe integrating with a CDN is good.

      1 vote
      1. [3]
        Greg
        Definitely smart to be wary of underpricing, although I will say that S3 isn’t necessarily the best benchmark: their storage charges are nothing to write home about, and then the egress is just a naked cash grab - it’s literally hundreds if not thousands of times more than cost, depending on which colo provider you ask.

        In terms of replication, I’m the one with offsite backups for a personal NAS here so I’m certainly not going to judge you for thinking about it on a hobby project! What I will say is that it sounds like you’re probably more in the market for a CDN to point at your storage bucket rather than replicating that bucket across regions.

        Assuming you’re thinking about it for performance, Cloudflare are arguably the kings/queens/gender neutral heads of state in the space. They’re relative newbies to storage, but networking is their bread and butter and they’ve got strong edge compute options to go with it. Whichever provider you go for, a CDN is tuned for performance (vs object stores that are tuned for durability) and the optimisations they can do on the fly with real-time knowledge of the traffic, the state of the network, and the state of the edge storage will generally outperform any simple preset replication strategy you can think of. It also means you pay for durability on a single copy and then just for what you use at the edge, rather than full duplication multiple times.

        If you did want it for disaster recovery I’d use a zero-egress provider as your primary store and then sync it to a different cloud provider entirely - it’s literally cheaper to sync another provider to S3 than it is to sync between AWS regions, and you get much better redundancy too. If you could get away with a short downtime in the 0.00…% chance of a total loss of your primary provider, you can use the archive tier on your secondary provider for fractions of a cent per GB.

        1 vote
        1. [2]
          skybrian
          For Deno Deploy, the kind of performance I'm thinking about is cold-start latency. Someone visits the website, the server starts up and loads some kind of smallish database into memory, then it decides what to do and maybe serves up a larger amount of data, possibly modifying as it sends it. On updates, the smallish database needs to get replicated. Deno Deploy has no local state currently, so it needs to happen somewhere else.

          I have a decent setup using Neon for a Postgres database that starts up on demand, so maybe I should stick with that. And Deno has a KV store in closed beta.

          The part where I'm over-thinking it is that Deno Deploy has edge nodes in different regions around the world and I wonder what latency would be like for them. I could make it fast for me by putting the database in the western US. :-)

          Cloudflare certainly has impressive offerings but the free tier doesn't include the more interesting stuff and I haven't tried them yet.

          The other thing I should really do someday is back up Google Photos somewhere. There is an rclone tool but it has lots of limitations.

          1 vote
          1. Greg
            That makes total sense! If I’m understanding correctly, you could use a CDN URL for the Litestream restore call that runs when a Deno Deploy instance boots? That way their servers would get the same speed and locality advantages that end users would, without having to guess in advance which of their regions you’ll end up using most.

            Of course, if you wanted your storage to be on the same physical network as the servers it’d be best to have them both from a single provider, and Deno Deploy’s region list definitely looks familiar. Litestream supports GCS natively too, so might well fit together quite nicely for what you’re doing - although I will say that Neon looks very cool too, and it’s one I hadn’t come across so cheers for another bit of interesting tech to read up on on the train!

            1 vote
  3. bugsmith
    Scaleway offers 75 GB of storage for free and includes 75 GB of egress per month.

    5 votes
  4. Oxalis
    If your data payload is under 25GB with 25GB monthly egress, you could try the free tier of https://storj.io
    They have their own durable storage backend and provide a free S3 compatible API so you can use whatever S3 tooling you want.

    I've been using their free tier for testing restic backup and it's been solid for over a year.

    If on-site is your thing, then MinIO, as others have said, is the de facto choice. I believe you can use it with one physical disk, but you won't get any of the storage benefits that S3 and similar provide.

    4 votes
  5. metadaemon
    I chose backblaze storage for a side project. Seems nice and it has a free tier.

    3 votes
  6. [3]
    mxuribe
    @skybrian I have also heard of Wasabi storage: https://wasabi.com/cloud-storage-pricing/#cost-estimates

    NOTE: I have never used them myself, so I have no experience with them. I only heard about them some time ago from technology podcasters...and the above link shows a pricing comparison vs other providers. At a quick glance, it appears that they do not charge for egress, which sounds great. But, again, I have no idea how reliable or performant they are (or not).

    3 votes
    1. [2]
      csos95
      I looked into using Wasabi a few years ago for storing images on my mastodon instance and I did a lot of searching through their documentation to figure out the specifics because "no egress fees" seemed too good to be true for one monthly price and it turned out it was.

      Egress is limited to however much data you have stored for the month.
      So any use case where files are read more than they are written would not be allowed, and you'd need to use the alternative pricing model.

      There's also a 90 day minimum for any data stored.
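As a sketch of how those two limits play out, here is a tiny check of one month's usage against them. The rules are as I described them above, not taken from the provider's current terms, and the function name is invented.

```python
def wasabi_style_check(stored_gb, egress_gb, days_stored, min_days=90):
    """Apply the two limits described above to one month's usage.

    Returns (egress_ok, billable_days).
    """
    # Egress is capped at the amount of data stored for the month.
    egress_ok = egress_gb <= stored_gb
    # Data deleted early is still billed for the minimum period.
    billable_days = max(days_stored, min_days)
    return egress_ok, billable_days


# 50 GB of images served 200 GB worth of times, then deleted after 10 days:
print(wasabi_style_check(50, 200, 10))
```

An image host for a social instance is read far more than it is written, so it fails the first check almost immediately.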

      5 votes
      1. mxuribe
        "Egress is limited to however much data you have stored for the month ... There's also a 90 day minimum for any data stored ..."

        Ah-ha, so that's how they get you! Thanks for sharing!

        1 vote
  7. chromakode
    +1 to Minio, using it for a side project. Easy to spin up a docker container.

    I'd also mention that both AWS and GCS provide a free tier. For hobbyist use this could be more than you need. For S3 it's 5GB free for the first year; for GCS it's 5GB free indefinitely.
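If it helps to compare, here is a quick sketch that checks a planned usage against the free allowances people have quoted in this thread. The numbers are only what was stated here (egress limits for S3 and GCS were not given, so they are marked unknown) and may well be out of date.

```python
# (storage GB, monthly egress GB) free allowances as stated in this thread.
# None means the thread gave no figure; check the provider's pricing page.
FREE_TIERS = {
    "Scaleway": (75, 75),
    "Storj": (25, 25),
    "AWS S3 (first year)": (5, None),
    "GCS": (5, None),
}


def fits_free_tier(stored_gb, egress_gb):
    """Map each tier to True/False, or None when egress limits are unknown."""
    result = {}
    for name, (max_store, max_egress) in FREE_TIERS.items():
        if stored_gb > max_store:
            result[name] = False
        elif max_egress is None:
            result[name] = None  # storage fits, egress allowance unknown
        else:
            result[name] = egress_gb <= max_egress
    return result


# 20 GB stored with 30 GB downloaded per month:
print(fits_free_tier(20, 30))
```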

    2 votes
  8. [2]
    bembo
    If you're interested in self-hosting, Garage looks quite good.

    1 vote
    1. Akir
      There are lots of options for self-hosting. I think Ceph gets a lot of use by businesses, and IIRC you can even set up Apache to work that way if you're a sadist who likes to deep-dive into manual configurations.

  9. m1k3
    I think storj is what you're looking for.

  10. supported
    I use CloudFlare, I do like it.