Real World Crypto 2014, Day 1 "separating the easy from the hard is challenging"

I'm sitting in my hotel in New York City, exhausted and energized from the first day of Real World Crypto, which is being held at City College in Harlem. I'm delighted to be spending a few days on an island full of interesting people, and a little embarrassed to realize the last time I blogged was from a hotel room at the OSEHRA conference last fall.
City College New York
RWC is a conference that aims to bring cryptography researchers together with the developers implementing crypto in real-world applications, and to get them working together more closely. I think it's been very successful at that goal so far. Here are some of the highlights as I remember them from day 1 - please be tolerant of any errors in my reporting, as I am far from an expert in cryptography and may not do the ideas justice.

In the morning I had the privilege of breakfast with Jacob Kaplan-Moss, who founded the Django project and is the Director of Security for Heroku. We had a fast-paced and wide-ranging discussion about how to reduce the effort for small healthcare companies to become HIPAA compliant, and why that is important for the spread of patient-centered care innovations. He's also building a Raspberry Pi-powered submarine which keeps shorting out, and we had some fun chatting about different types of information radiators in office environments - I'm still advocating for someone to connect a smell machine to a code quality metric so that checking in bad code causes the building to stink.

Later in the morning Dr. Hoeteck Wee gave a very interesting report on efforts to formally verify the security of TLS, the protocol deployed to secure almost all communications on the internet and in mobile phone apps. Because of the ubiquity of this protocol, this work is very important, even if it comes somewhat after the fact now that TLS is already widely deployed. After all, empirical evidence does not equal proof. The definition of security used was data confidentiality, integrity, and entity authentication. Dr. Wee's conclusion was that TLS is secure even though it violates some basic cryptographic principles, such as strict separation of the key exchange from the record protocol. He also concluded that the TLS protocol is fragile, and that the security it has is accidental: it depends on the client sending a Finished message that was intended for authentication, not for key exchange. After the talk a distinguished audience member commented that modularity is expensive - it costs 2 round trips, which is .5 seconds of latency when communicating via satellite.

Brian Warner gave a fascinating account of crypto in Firefox Sync, which has been through 3 complete redesigns. There was a huge amount to learn from this talk, and the biggest lesson was that bad UI (even when it's an accident) will totally ruin a beautifully designed protocol for all your customers. Firefox Sync was originally designed for syncing data between multiple devices while keeping your data secure even against a compromised server - yet over half of the users had only one device and actually wanted backup rather than sync. The server-proof design meant that when a user with only one device lost that device, there was no way to recover their decryption key, and so the data on the server was no good to them. I lost the link to Brian's full slides, but his blog post is here with many details about the new SRP-based design. I thought it was especially noteworthy that most of the protocol could actually be run in the clear without needing SSL/TLS, although the initial handshake does need TLS.

At lunchtime I had the honor of finally meeting Zooko Wilcox-O'Hearn in person, who I first met online when I was working on Ubuntu One years ago. We've crossed paths on twitter and IRC over the years, but never stood in the same room before. Zooko is now the CEO of https://leastauthority.com/ which provides highly secure and private storage based on Tahoe-LAFS (which he creates and gives away as free and open source software). I respect Zooko a great deal not only because of the products that he builds, but the integrity with which he reasons about what problems are important for the health of society and thus what problems he should work on. I was also delighted to meet Andrew Isaacson who was sitting at the same table, who is one of the founders of the Noisebridge hackerspace in San Francisco, and is currently very passionate about building secure multi-party chat (on top of his day job).

After lunch there were three sessions on Multi-Party Computation (MPC), a cryptographic technique that makes it possible to perform computations on data without any of the servers involved in the computation actually knowing the data. This is done with math tricks which separate the data into chunks and apply transforms, then distribute the transformed and unreadable data to a set of servers, which perform some predefined operations and send the results back. When the results are combined and run through the matching transforms, the answer appears. It's a fascinating technique with many possible applications, and the principal thing limiting its use has been how slow it is to run. Of the three talks, the one that stood out to me was John Launchbury's, in which he showed how his team was able to accelerate shared computation to usable speeds by stepping back and re-thinking what data actually needed to be computed. John gave examples of working implementations of secure mail filtering (the server filters your mail without being able to read it) and secure voice conferencing (built on a modified Mumble server). One potential application of these techniques is research against sets of medical data obtained from multiple databases (such as multiple hospitals). You want the researcher to be able to compute an answer that pulls from multiple data sets, but never actually combine the data sets in a way where a breach would compromise all of the hospitals. At one point a very useful definition was given: secure multi-party computation protocols emulate an incorruptible trusted third party in a world without trusted third parties. This is useful because you can design your system to rely on a trusted third party and then swap in MPC.
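To make the "chunks and transforms" idea concrete, here is a minimal sketch of one of the simplest MPC building blocks, additive secret sharing. This is my own toy illustration, not code from any of the talks: the inputs (patient counts from two hypothetical hospitals), the modulus, and the three-server setup are all made up for the example. Each input is split into random shares that sum to the true value; each server only ever sees one share of each input, yet the servers can jointly compute a sum.

```python
import secrets

P = 2**61 - 1  # a large prime modulus (illustrative choice)

def share(value, n):
    """Split `value` into n additive shares that sum to it mod P.
    Any n-1 shares look uniformly random and reveal nothing."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Combine all shares to recover the hidden value."""
    return sum(shares) % P

# Two (hypothetical) hospitals each split a patient count into 3 shares,
# sending one share to each of 3 independent servers.
a_shares = share(120, 3)
b_shares = share(85, 3)

# Each server locally adds the two shares it holds - it learns nothing
# about either input from its own share...
sum_shares = [(a + b) % P for a, b in zip(a_shares, b_shares)]

# ...but combining the per-server results reveals only the total.
assert reconstruct(sum_shares) == 205
```

Addition comes essentially for free in this scheme; multiplication (and anything built on it) requires interaction between the servers, which is where the performance challenges discussed in the talks come from.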

There was also a very interesting talk on mitigating server breaches using MPC. The idea is to split cryptographic keys among multiple servers - perhaps running in multiple hosting providers and on multiple operating systems, with a frequent key refresh - so that the only meaningful breach is a breach of all systems simultaneously. This appears to be quite applicable for systems where you are encrypting data to be stored inside a database: by not storing the crypto key on the DB server or in the memory of the web servers, you have a system that is significantly harder to compromise. The comparison was made that MPC systems are a software way to achieve many of the properties of an HSM, which made me realize that I'm very overdue for a deep dive on http://aws.amazon.com/cloudhsm/. This talk mentioned the claim that in 2012, $7 billion was lost in HIPAA-covered security breaches, and I would be very glad to see these techniques made available in open source form where they could be as broadly deployed as Linux. Unfortunately it sounded like this was a crypto core rather than a product, and it was being built as a proprietary project where only richer companies would be able to enjoy better security. It's a good reminder of how important it is to fund infrastructure work like this that underpins our very ability to securely communicate.
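The splitting-and-refresh idea can be sketched with the simplest possible scheme: XOR shares. This is a toy of my own construction, not the system from the talk (which computes with the key shares via MPC rather than ever reassembling them); it only illustrates why refresh matters - an attacker who steals one share before a refresh and a different share after it still has nothing.

```python
import secrets

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_key(key, n):
    """Split a key into n shares; XOR of all n shares recovers the key,
    and any subset of fewer than n shares is uniformly random."""
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    last = key
    for s in shares:
        last = xor_bytes(last, s)
    shares.append(last)
    return shares

def refresh(shares):
    """Re-randomize the shares without changing the underlying key:
    XOR a fresh random pad into one share and out of another."""
    pad = secrets.token_bytes(len(shares[0]))
    return [xor_bytes(shares[0], pad), xor_bytes(shares[1], pad)] + shares[2:]

key = secrets.token_bytes(32)          # e.g. an AES-256 key
shares = split_key(key, 3)             # one share per server
shares = refresh(shares)               # periodic proactive refresh

# Only a party holding ALL current shares can rebuild the key.
recovered = shares[0]
for s in shares[1:]:
    recovered = xor_bytes(recovered, s)
assert recovered == key
```

The real systems described in the talk go a step further: the key is never reconstructed anywhere, and cryptographic operations are performed jointly on the shares themselves.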

Next was a talk on multi-linear maps, which are an extension of bilinear maps. The explanation given was that Diffie-Hellman is such a crypto gold mine because you can take values a_i and hide them "in the exponent" as g^(a_i), and some tasks are still easy in this representation (any linear/affine function), while others are hard, such as quadratic functions. Bilinear maps are similar in that quadratics are easy but cubics are hard - multi-linear maps are the result of asking "why stop at 2?" and extending the concept to k-linear. There was definitely a point at which things break down, but I didn't manage to catch what the practical limit on dimensions was.
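The "linear is easy, quadratic is hard" property is easy to see with a toy example. The sketch below is my own illustration with deliberately small, insecure parameters: given only the hidden values g^a and g^b, anyone can compute hidings of linear combinations of a and b, but computing g^(a*b) from them is the computational Diffie-Hellman problem - the "quadratic" case that a bilinear map would make easy.

```python
# Toy demonstration of hiding values in the exponent. The modulus and
# generator are illustrative only - far too small/structured for real use.
p = 2**127 - 1   # a Mersenne prime, chosen for readability
g = 3

a, b, c = 41, 58, 7          # secret exponents (and a public constant c)
ga = pow(g, a, p)            # the public "hiding" of a
gb = pow(g, b, p)            # the public "hiding" of b

# Linear/affine operations on hidden values are easy, using only ga, gb:
assert ga * gb % p == pow(g, a + b, p)     # addition of hidden values
assert pow(ga, c, p) == pow(g, c * a, p)   # scaling by a known constant

# But producing pow(g, a * b, p) from ga and gb alone - a quadratic in
# the hidden values - is believed hard (computational Diffie-Hellman).
```

A bilinear map e(g^a, g^b) = e(g, g)^(a*b) makes exactly that quadratic step easy while cubics stay hard, and a k-linear map pushes the easy/hard boundary out to degree k.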

Finally we heard from Dr. Shai Halevi from IBM Research on some very interesting work on encrypted search. This work has been funded by IARPA in order to let the intelligence community search very large data sets without giving the searchers unrestricted access to them. It was intriguing to see that these techniques could potentially be used for such things as warrant enforcement, where they could dramatically reduce the amount of data that is disclosed in a warrant-based search. Dr. Halevi made the amusing observation that the significance of statistical leakage is entirely dependent on the application - kind of like saying that the effect of a bullet depends on whether it hits anyone. He also called for a theory of leakage composition so that we might be able to reason more usefully about differential privacy.

It was an amazing first day, and I'm still digesting much of what was presented. I'm particularly grateful to Pat Deegan PhD & Associates for funding my travel to this conference, and for recognizing how important it is to participate in discussions about modern crypto and its application in the real world. It's been worth the money so far: I'm buzzing with new product ideas and have several areas for further research. You can be sure I'll be applying what I've learned to our products at PDA to keep the patient data we are entrusted with as secure as possible.

P.S. I decided today to stop slacking and finally start working on a book about achieving HIPAA compliance in a small healthcare startup. Want to help keep me motivated? Sign up here for occasional updates and preview content from the book.