Dropbox: Deduplication with Privacy

There’s been a bit of a scare regarding Dropbox related to the possible use of deduplication to determine who has copies of “illegal” files and then the use of warrants to identify infringing Dropbox users and basically hose them.

The problem

When you store a file on Dropbox it will be hashed (more-or-less uniquely identified by scanning its content) and then the hash and the file’s size will be used to determine if the file already exists on Dropbox’s server (i.e. if your ripped copy of Avatar matches someone else’s it will have the same hash value and the exact same file size). If so, rather than uploading the file your account will simply get a new file entry pointing at the existing file. “Upload” is instant, Dropbox saves money on storage, everybody wins.
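
To make the mechanism concrete, here is a minimal sketch of this kind of deduplication. It is an illustration only, not Dropbox’s actual implementation: the choice of SHA-256, the in-memory dictionaries, and the names DedupStore, upload and owners_of are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one stored blob per (hash, size) pair.

    Hypothetical illustration; not Dropbox's real design.
    """

    def __init__(self):
        self.blobs = {}       # (sha256 hex digest, size) -> file bytes
        self.user_files = {}  # (user, path) -> (sha256 hex digest, size)

    @staticmethod
    def fingerprint(data: bytes):
        # Identify a file by its content hash plus its size.
        return (hashlib.sha256(data).hexdigest(), len(data))

    def upload(self, user: str, path: str, data: bytes) -> bool:
        """Record the file for this user.

        Returns True when the content was already stored, i.e. the
        "upload" was instant and no new bytes needed to be kept.
        """
        key = self.fingerprint(data)
        already_stored = key in self.blobs
        if not already_stored:
            self.blobs[key] = data
        # The user's entry points directly at the shared blob.
        self.user_files[(user, path)] = key
        return already_stored

    def owners_of(self, data: bytes):
        """List every user whose file table points at this content."""
        key = self.fingerprint(data)
        return sorted({user for (user, _), k in self.user_files.items() if k == key})
```

In this sketch the second person to upload the same bytes gets True back from upload, and owners_of is a single lookup over the user file table, which is exactly the query that makes the scheme worrying.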

But, suppose James Cameron uploads a ripped copy of Avatar to Dropbox and notices that this 3GB MP4 file uploaded instantly. He now knows someone else has such a file on Dropbox which is reasonable cause to suspect that piracy is happening and, in theory, he can require Dropbox to tell him everyone who has a copy of that file in their account.

Hence the scare.

The obvious solution to this problem is to not knowingly store illegally duplicated files in your Dropbox account or to encrypt them using your own unique key if you do.

But it’s quite possible that any of us might accidentally put an illegal file (or perhaps a file normal people consider “fair use” but the MPAA, say, might not consider legal) in our Dropbox account. E.g. I might rip Avatar using Handbrake so I can watch it on my iPhone, and this might create an identical file to your handbraked copy of Avatar, and according to the MPAA we might both be horrible criminals who deserve the gas chamber, and given that Congress only cares about people who provide large campaign donations…

A possible solution

I’ve proposed this solution on both Hacker News and Dropbox’s forums. It’s not perfect; maybe someone can refine it.

I imagine Dropbox has a list of files with unique ids, sizes, and hash values, and every user has a list of files with their own personal path (where they think it is and what they think it’s called) along with the unique id of the actual underlying file. This is the heart of the problem.

Instead of storing the unique id of the underlying file in the user’s file table, Dropbox needs to store a number offset by a hash value generated client-side from the user’s password and the user’s name for the file (i.e. something that will be different for each user and each file and not replicable with data stored in Dropbox’s own database).
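
A rough sketch of the offset idea follows, under loud assumptions: the 64-bit id space (ID_SPACE), the use of HMAC-SHA256 keyed by the password, and the function names are all made up for illustration. A real design would presumably want a proper key-derivation function rather than using the password directly.

```python
import hashlib
import hmac

# Assumed size of the file-id space; purely illustrative.
ID_SPACE = 2 ** 64


def client_offset(password: str, user_path: str) -> int:
    """Computed on the client: a per-user, per-file offset.

    Derived from the password and the user's own name for the file,
    neither of which the server would store in usable form.
    """
    digest = hmac.new(
        password.encode("utf-8"),
        user_path.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return int.from_bytes(digest, "big") % ID_SPACE


def mask_file_id(file_id: int, password: str, user_path: str) -> int:
    """Value stored in the user's file table in place of the real id."""
    return (file_id + client_offset(password, user_path)) % ID_SPACE


def unmask_file_id(stored_ref: int, password: str, user_path: str) -> int:
    """Client-side recovery of the real id from the stored value."""
    return (stored_ref - client_offset(password, user_path)) % ID_SPACE
```

Two users holding the same underlying file would then have different values in their file tables, so a query for "everyone pointing at file N" no longer works from the database alone, while each client can still recover the real id when it asks for the file.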

Note that if the user’s password is changed then every file id will need to be changed accordingly, which is definitely a downside. (And if you forget your password then your files cannot be recovered.)

Also note that presumably someone like the MPAA could simply obtain a warrant and wait for people to access an “illegal” file, but this is surely going to be a much slower and more difficult process than simply doing a query on the entire database and sending out threatening letters to everyone in the result list.

Thing is, this isn’t technically complex to implement and could be a user preference. Would you prefer privacy with the risk of losing all your files if your password is lost? Given that you will probably have multiple backups of all your Dropbox files, it’s actually not a big problem. (In fact, consider the case where you are forced to reset your Dropbox password and thus Dropbox forgets you own all your files: re-uploading them from one of your computers will be instantaneous for all the files you previously had uploaded, owing to deduplication.)

Edit: another problem with my proposed solution is that you can lose track of files (e.g. you can’t maintain an accurate reference count). This is probably not as big an issue as it might seem since Dropbox already retains files for a month after a non-paying user deletes them and forever for paying users. Presumably it retains copies of files left by users who stop using the service.

Final Note: I have no affiliation with Dropbox (although I do use the product) and have no stake in it. If you’d like to try Dropbox and give me more space to store potentially illegal files, please use this link.

It’s not a sale, it’s a license

From the Wall Street Journal:

Under most recording contracts, artists are entitled to 50% of revenue from licensed uses of their music. That usually means soundtracks for movies, TV shows and ads. Sales, on the other hand generate royalties for the artist at a much lower rate—generally in the low teens, and rarely more than 20%.

And also:

…the Ninth Circuit held that iTunes downloads (even the DRM-free variety) are encumbered by enough restrictions that they can’t be considered sales at all.

Fair enough. Now may I observe that since a “license” doesn’t physically exist, and in particular need not travel from the vendor to wherever I happen to be, my purchases shouldn’t be subject to sales tax.

Also, if my computer explodes my licenses remain in effect and I should be able to obtain new copies of the actual tracks for free or at most a nominal price.

When will the law catch up to, say, 1975?

HTC Getting Desperate?

HTC has countersued Apple over cellphone patents claiming Apple infringes on several patents, including some it managed to acquire from a known patent troll. But a quick look at the documents reveals just how lame the substance of this suit is.

There are five patents involved.

The first patent mentioned (as well as the last, 7716505, which is not yet up on Google) — the only patents HTC created itself — deals with power management in smart phones, specifically HTC claims to have invented some clever new way to manage the power of the “phone” part separately from the power of the “smart” part. My guess is that this is all bleeding obvious and Apple has prior art up the wazoo (as do lots of other companies). Simply from the patent itself they don’t seem to be doing anything more sophisticated or different from what any halfway decent laptop manufacturer does when deciding whether to dim a screen, spin down a hard disk, or put the system to sleep.

The other three patents (5541988, 6058183, and 6320957) all have to do with representing a user telephone directory in a convenient form and allowing people to dial by clicking on things. HTC even claims Apple is forcing third parties to infringe on the patent with its human interface guidelines. Not only is all this stuff bleeding obvious — we’re basically talking about someone trying to patent the idea of a database, but only if it consists of phone book entries — but the earliest of the three patents was issued in 1996, several years after Apple shipped several products with non-trivial telephony support (i.e. the Newton, which had a dynamic phone directory and a tone dialer built in; and the AV Macs, which could function as PABXs and fax machines via the GeoPort). How do you think Apple displayed user phone directories and let people dial numbers? Also bear in mind that PIMs (such as ACT! and Now Contact!) were one of the hot product categories in the early 90s, and the idea that this patent added anything none of these other (shipping) products thought of is pretty far-fetched.

It seems to me that the Patent Office ought to be able to fine people for wasting its time.

Good v. Evil

It’s not like Apple is a patent-troll shell company that needs these suits for income. It’s not like the government is going to shut down everyone’s Android phone already in the market. It’s purely an anti-competitive suit. Which, for consumers, just limits innovation.

Business Insider, Apple’s wimpy patent suit is proof that it’s terrified of Google

I have news for you, Business Insider. Patents are anti-competitive. That’s why they exist.

There are several questions raised by Apple’s lawsuit against HTC. Is it right or wrong? Can Apple win? Why is Apple doing it?

I have no clue about the last two questions, but like everyone else, I have strong opinions on the first.

John Gruber thinks it’s all about Steve Jobs being emotional:

“We can sit by and watch competitors steal our patented inventions, or we can do something about it. We’ve decided to do something about it,” said Steve Jobs, Apple’s CEO. “We think competition is healthy, but competitors should create their own original technology, not steal ours.”

That’s not the language of a licensing dispute or the beginning of a polite negotiation. That’s the language of a man aggrieved.

I think he may be right. After all, from an emotional standpoint it seems to me that Apple is, again, being ripped off by competitors. While what Apple’s imitators are doing seems wrong (and after all, the “emotions” we’re talking about come directly from our sense of “justice”) it seems to be perfectly legal, right? Well, surely if it seems wrong and it may in fact be illegal, doesn’t Apple have the right to try and find out?

Fortune (Steve Jobs: A Man Aggrieved) quotes Paul Graham as saying that Apple is in danger of becoming Evil:

“Apple is inching ever closer to evil,” writes Y Combinator’s Paul Graham, using the word in Google’s low-bar don’t-be-evil sense, “and I worry that there’s no one within the company who can stand up to Jobs and tell him so.”

Now, the funny thing here is that I don’t think Apple is anywhere near Evil in any Moral sense (and what other sense is there?), but it may be being Evil in using a broken patent system in an attempt to pursue its idea of natural justice. But how is that different from using the Tax Code to throw Al Capone in prison?

I’ll finish off with words from Wil Shipley (who himself has been victimized by Apple — note, I don’t think Paul Thurrott speaks for Delicious Monster):

If Apple becomes a company that uses its might to quash competition instead of using its brains, it’s going to find the brainiest people will slowly stop working there. You know this, you watched it happen at Microsoft. Enforcing patents isn’t a good long-term play: it’s the beginning of the end of the creative Apple we both love.

It’s a nice sentiment, but seems to be completely without factual basis. Did the brainiest people slowly leave Microsoft? If so, was it because Microsoft used its might to quash competition? It seems to me that Microsoft’s problem has never been a lack of brains, but a lack of taste. AT&T was, for many decades, a powerhouse of innovation, built and maintained by quashing rivals and enforcing patent laws. I seem to recall similar admonitions that Steve Jobs’s obsession with secrecy would damage Apple’s culture of creativity.

Free VP8 vs. Ogg/Theora

Ogg Theora seems to be really good at handling super-saturated images of vegetation

The Free Software Foundation has suggested to Google that it might want to make VP8 “irrevocably royalty free” (whatever that means—will it cover future developments of VP8?). Of course at the same time they argue that Ogg/Theora is as-good-or-better than H264 and of course it’s both open source and royalty free.

I’m not sure if I buy the quality argument (for a start, the article linked is almost entirely about image quality and not, for example, about compression time/effort or playback time/effort; next, Ogg/Theora may be very good at handling supersaturated images of vegetation growing near a river, but how does it do on more realistically saturated fleshtones? Most lossy codecs sacrifice blue detail in particular, so this may be a boneheaded comparison), but certainly Ogg/Theora is at least in the ballpark, and it’s free, so what’s the advantage of having an “irrevocably royalty free” VP8 as an alternative?

There are two major reasons why the world hasn’t embraced Ogg/Theora.

First of all, there’s lack of hardware support. Silicon that will decode H264 (not that desktop PCs or even laptops actually make much use of this hardware) with virtually no effort (for which read battery power) is available free with a box of cereal. No such hardware exists for VP8 or Ogg/Theora.

Let’s look ahead five years or so. We’d presumably like for there to be silicon that decodes our favorite video format free in boxes of cereal. We already have that for H264; is having such a chip available for VP8 going to make Ogg/Theora hardware support more or less likely?

Second, there’s the question of whether Ogg/Theora is actually unencumbered. H264 is, in essence, a known quantity. It’s backed by a huge bunch of megacorporations who have pooled their patents; even if someone comes along who claims to have a vital patent covering some obscure nook of H264’s implementation, the participants in the H264 patent pool have lots of money, lawyers, and patents with which to defend themselves (and they’re juicier targets than we are).

Ogg/Theora hasn’t been sued because no-one is making significant (any?) money with it, which in turn means there’s no reason for someone who thinks they could sue over it (a) to do so, or (b) even to let it be known they think they could do so (which is condemnation enough of our patent system).

If this were 1990 and we were talking about image file formats, “GIF” would be unencumbered. The GIF lawsuit only happened when someone figured there’d be money in it, and that was only true when lots of people became invested in using GIFs.

Here, we assume, VP8 has a huge advantage over Ogg/Theora because, presumably, Google has money, lawyers, and a patent portfolio. But unless VP8 is open sourced, I really don’t see the point. We’ll just be trading one master for another in the long run, since either VP8 won’t continue to evolve and will need to be replaced in fairly short order or it will and then it’s up to Google to figure out whether the new version is covered under “irrevocably”.

Addendum: really nice article on the whole Flash / VP8 / Theora / H264 / HTML5 thing (courtesy of daringfireball as usual). And yup, Theora’s claims to superior quality are a Sad Joke.