Web3 Adventures: Distributed Storage with Filecoin

My latest foray into Web3 is a result of a Udemy course I decided to do on Web3 which centers on Ethereum. Thing is, Ethereum isn’t a good model for storage, so I paused and went looking at how web3 storage works.

Filecoin (web3.storage) demo component built with b8rjs. The UI blurs the API token as soon as you paste it in, which saves me some bother when taking screen caps.

Now, the Web3 library provides “out of box” support for Bzz — or The Swarm — a bee-themed distributed storage system. Much as I love just how well the whole Bee theme works in terms of naming stuff, problem with Bzz is that you need to run your own instance which isn’t an enormous hurdle (basically, you can run one using Docker) but it’s not viable for quick-and-dirty demos. So I took a look at Filecoin, which is branding itself as the web3 storage option (starting with its excellent domain name, web3.storage).

The first thing I noticed was that Filecoin’s standard library isn’t provided in a terribly convenient form for my purposes (i.e. it’s not available as a standard library via CDN), which means it needs to be consumed via require and bundling. There are three standard ways to provide JavaScript libraries:

  1. Old-fashioned (consumed via <script> tag) — has worked since the 90s.
  2. Common JS (consumed via require) — common since around 2014.
  3. ESM (consumed via import) — supported by all browsers since 2018

Now, you can’t really provide Common JS modules conveniently without using some kind of transpiler, and all the decent transpilers are happy to produce ESM-compatible libraries, so there’s really no excuse. Also, I should note, require isn’t actually Javascript, but a hack.

Fortunately, you can just interact with Filecoin directly over HTTP using its REST API, so the only tricky part was remembering how to create the correct headers using the Fetch API.

Anyway, you can play with the demo component on the b8rjs.com website. But to do anything you’ll need an API Key, which you can get for free at web3.storage.

So, here’s the code I wrote that gets a list of the files you’ve uploaded to Filecoin:

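// Ask web3.storage for the list of uploads tied to this API key
const response = await fetch('https://api.web3.storage/user/uploads', {
  method: 'GET',
  headers: {
    authorization: `Bearer ${component.data.apiKey}`
  },
})
// On a non-2xx status, store null instead of trying to parse an error body
component.data.uploads = response.ok ? await response.json() : null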

And here’s my code for uploading a file to Filecoin:

const outcome = await fetch('https://api.web3.storage/upload', {
  method: 'POST',
  headers: {
    authorization: `Bearer ${component.data.apiKey}`,
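    // encodeURIComponent rather than the deprecated escape, which emits non-standard %uXXXX sequences for non-Latin-1 characters
    'X-NAME': encodeURIComponent(fileInput.files[0].name)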
  },
  body: fileInput.files[0]
})
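
For reuse outside the component, the upload call could be wrapped along these lines. This is only a sketch: buildUploadRequest and uploadFile are names made up for illustration (they are not part of b8r or any web3.storage library), and it assumes the upload response body is JSON. The file name is percent-encoded with encodeURIComponent, the standard non-deprecated way to do that, and a stand-in for fetch can be passed in so the helper can be tried without a real API key.

```javascript
// Hypothetical helper: builds the URL and fetch options for an upload.
// Endpoint and header names are the ones used in this post.
function buildUploadRequest(apiKey, file) {
  return {
    url: 'https://api.web3.storage/upload',
    options: {
      method: 'POST',
      headers: {
        authorization: `Bearer ${apiKey}`,
        // Header values should be plain ASCII, so the name is percent-encoded
        'X-NAME': encodeURIComponent(file.name),
      },
      body: file,
    },
  }
}

// Hypothetical helper: performs the upload and fails loudly on a non-2xx status.
// fetchImpl defaults to the global fetch; pass a fake one for testing.
async function uploadFile(apiKey, file, fetchImpl = fetch) {
  const { url, options } = buildUploadRequest(apiKey, file)
  const response = await fetchImpl(url, options)
  if (!response.ok) {
    throw new Error(`upload failed with status ${response.status}`)
  }
  // Assumption: the service answers with a JSON body
  return response.json()
}
```

In the component, something like `const result = await uploadFile(component.data.apiKey, fileInput.files[0])` would then take the place of the inline fetch call.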

So, within 2h or so I had an example b8r component that can list, display, and upload files on Filecoin.

A file record ends up looking like this:

{
  "_id": "315318962298078333",
  "type": "Upload",
  "name": "submarine",
  "created": "2022-08-31T00:50:44.742+00:00",
  "updated": "2022-08-31T00:50:44.742+00:00",
  "cid": "bafybeih5tmwvia5vacfiteh2p6yyrssbayowugngyl64ghdikc72beqnhm",
  "dagSize": 3063854,
  "pins": [],
  "deals": []
}

And you can view the file by assembling a url from the cid. In this case the url ends up looking like this:

https://bafybeih5tmwvia5vacfiteh2p6yyrssbayowugngyl64ghdikc72beqnhm.ipfs.w3s.link/

And it “just works”:

A test animation of the bio-mechanical alien submarine from Manta 2010.
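
Assembling the url from a file record can be sketched in a couple of lines. The helper name gatewayUrl is invented for illustration, and the subdomain pattern is simply copied from the url shown above rather than taken from any official specification.

```javascript
// Hypothetical helper: builds a subdomain-style gateway url for a cid,
// following the pattern of the url shown in this post.
function gatewayUrl(cid, gatewayHost = 'ipfs.w3s.link') {
  return `https://${cid}.${gatewayHost}/`
}

// The relevant fields of the file record shown earlier
const record = {
  name: 'submarine',
  cid: 'bafybeih5tmwvia5vacfiteh2p6yyrssbayowugngyl64ghdikc72beqnhm',
}

console.log(gatewayUrl(record.cid))
```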

Does this make a lick of sense?

When you sign up for an API Key, you get, in theory, a TiB of free storage (today I learned that TiB and GiB just mean binary terabytes and gigabytes vs. the rounded-down decimal approximations), and can fill in a form to ask for more (haven’t tried that yet). But that’s for introductory pricing.
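
To put numbers on the TiB vs. TB distinction, here is a quick sketch. The constant and function names are just illustrative; the definitions (powers of 2 for the binary units, powers of 10 for the decimal ones) are the standard ones.

```javascript
// Sizes in bytes
const TB = 10 ** 12   // decimal terabyte
const TiB = 2 ** 40   // binary terabyte (tebibyte)
const GB = 10 ** 9    // decimal gigabyte
const GiB = 2 ** 30   // binary gigabyte (gibibyte)

// How much larger (in percent) the binary unit is than its decimal cousin
function percentLarger(binary, decimal) {
  return (binary / decimal - 1) * 100
}

console.log(`TiB is ${percentLarger(TiB, TB).toFixed(1)}% larger than TB`)
console.log(`GiB is ${percentLarger(GiB, GB).toFixed(1)}% larger than GB`)
```

So a free TiB works out to roughly a tenth more than a decimal terabyte.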

When you upload a file, it gets replicated among various nodes with the goal of being stored redundantly on at least five. This is obviously better than what happens with Ethereum contracts where every time you store data in a contract the transaction (i.e. the data) is replicated to the entire network. Now there are lots of reasons not to store all your data on the Ethereum network starting with (a) it’s public and (b) transactions are by intention about 15s long, but it’s pretty clear that Ethereum contracts are intended to be (a) small and (b) simple.

More importantly, who is paying for Filecoin storage and is there any profit in providing it? This is such an important question that Filecoin devotes a page to discussing exactly this. As far as I can tell, the idea is that you’ll buy datacap with FIL obtained from a notary and that storage providers are compensated for providing storage at 10x the value of the storage they provide (i.e. more than covering the cost of 5x+ redundancy).

So, and I’m just making this up as I go along, suppose I compute the cost of storing 1TB of data by taking 7TB of storage (I can buy an 8TB Seagate drive for $200) and running it for 5y. At 10W that’s around 440 kWh, which @$0.25 per kWh gives me $110 for electricity.

Adding the $200 for the drive gives me ~$300 as what I’m suggesting would be a ballpark cost of storing 1TB of data with 5x redundancy for 5y at scale.
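
The back-of-envelope arithmetic above can be written out as a small sketch. The function name and its parameters are made up for illustration, the inputs are the ballpark figures from this post, and a 365-day year is assumed.

```javascript
const HOURS_PER_YEAR = 24 * 365 // assumption: ignoring leap days

// Hypothetical helper: hardware cost plus electricity for running one drive
function storageCost({ driveCost, watts, years, dollarsPerKWh }) {
  const kWh = (watts * HOURS_PER_YEAR * years) / 1000
  const electricity = kWh * dollarsPerKWh
  return { kWh, electricity, total: driveCost + electricity }
}

const estimate = storageCost({
  driveCost: 200,      // 8TB drive, in dollars
  watts: 10,           // power draw
  years: 5,
  dollarsPerKWh: 0.25,
})

console.log(estimate) // { kWh: 438, electricity: 109.5, total: 309.5 }
```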

Storing 1TB of data for 5y costs $1400 on Amazon S3 and $1200 on Google Cloud Storage. That’s with no actual traffic, just storage.

So theoretically, the equilibrium cost of storage using this distributed storage model should be a slight markup on the cost of storing data at scale (not including redundancy or availability because that’s no longer your problem), because you can buy FIL or you can provide storage and earn FIL at better than the cost of consuming storage (i.e. it’s cheaper to use storage by providing it than just by consuming it). In other words, if you feel storage costs are too high, just become a storage provider as cheaply as you can and earn storage credits for reliable storage at the cost of providing unreliable storage.

To put this in concrete terms, if you’re the kind of outfit that has high enough storage requirements to install your own racks somewhere, you can convert that into distributed, highly-available, and essentially bullet-proof storage simply by sharing your storage at whatever it happens to cost you multiplied by, say, 7, and this washes out to a quarter or less of the cost of using cloud storage in data-centers.
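
As a rough check on that comparison, here is the ratio worked out from the ballpark figures quoted in this post (about $300 for the do-it-yourself estimate, $1400 and $1200 for the cloud providers, all for 1TB over 5y). The names are illustrative only.

```javascript
// Ballpark dollar figures from this post: 1TB stored for 5 years
const selfHosted = 300
const cloud = { 'Amazon S3': 1400, 'Google Cloud Storage': 1200 }

// Hypothetical helper: own cost as a fraction of each provider's cost
function costRatios(own, others) {
  return Object.fromEntries(
    Object.entries(others).map(([name, cost]) => [name, own / cost])
  )
}

const ratios = costRatios(selfHosted, cloud)
for (const [name, ratio] of Object.entries(ratios)) {
  console.log(`${name}: ${(ratio * 100).toFixed(0)}% of the cloud price`)
}
```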

So, yes. This makes a whole lot of sense—at least for files you don’t mind being public.