A few words about Ubuntu One servers

I promised Matt Griffin I would talk a bit about Ubuntu One servers and some of the work that we've done to keep up with all the new users who have signed up in the 6 months since Ubuntu 9.10 came out, as well as our preparations for the growth that we expect with the launch of the music store and phone sync features in Ubuntu 10.04. I'll start by describing the different moving parts that make up the server side of Ubuntu One.

Ubuntu One has many parts. All the parts on the client side are free software, and about half the parts on the server side are free software. There are two major components that are currently closed source - the Django web servers that implement the web interface for https://one.ubuntu.com and the Twisted servers that implement the server side of the file syncing protocol (https://launchpad.net/ubuntuone-storage-protocol). The Django web servers include some code related to integration with the music store partner that we are contractually not allowed to release. They also include some code that we've been pleased to be able to factor into libraries and release on their own (such as wsgi-oops and desktopcouch).

Aside from the file syncing protocol, the other two major channels to Ubuntu One services are the SyncML and CouchDB protocols. SyncML is used to support syncing of contacts from mobile phones, and that server code is open source (http://funambol.com/). The CouchDB replication protocol is used to support replication of bookmarks, Gwibber messages, Tomboy notes (sort of), Evolution contacts, and just about any other application that cares to integrate with desktopcouch. If you are an app developer, the quickly project and the desktopcouch library have some really cool recipes for easily cloud-enabling your application. All the CouchDB server side code is open source as well (http://couchdb.apache.org).

Out of all the stuff in Ubuntu One that I find interesting, I'm most proud of the way we are using CouchDB, because this technology does so much to preserve users' autonomy over their data while also providing the convenience of replicating data through what could be called a personal cloud. If Ubuntu One goes away forever, all the data you have in CouchDB continues to work just fine, and all the applications integrated with desktopcouch continue to work just fine - you could even easily set up a separate CouchDB cloud and point all your machines to replicate to it instead of the Ubuntu One servers. For people who don't feel like setting all that up, the apps will work out of the box with an optional Ubuntu One account. This ability for application developers to make use of a local data store that can automatically replicate if the users decide to enable Ubuntu One is something that I am convinced has huge potential for making users' lives better without locking them into Ubuntu One or any other service provider.
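To make the "point your machines at a different CouchDB" idea a bit more concrete, here is a minimal sketch of the request body that CouchDB's standard _replicate endpoint accepts. The database name "contacts" and the host couch.example.com are made up for illustration, and this is plain CouchDB rather than a description of how desktopcouch or the Ubuntu One servers set up their pairing. Authentication is left out entirely.

```python
import json


def replication_request(source_db, target_url, continuous=True):
    """Build the JSON body for a POST to CouchDB's /_replicate endpoint.

    source_db and target_url are whatever you choose; the values used
    below are hypothetical examples, not real Ubuntu One endpoints.
    """
    return {
        "source": source_db,       # local database name (or a full URL)
        "target": target_url,      # remote database to push changes to
        "continuous": continuous,  # keep replicating as new changes arrive
    }


if __name__ == "__main__":
    # Hypothetical local database and hypothetical self-hosted server.
    body = replication_request("contacts", "https://couch.example.com/contacts")
    print(json.dumps(body, indent=2))
```

You would POST that body, with a Content-Type of application/json, to the /_replicate URL of your local CouchDB. Swapping the target URL is all it takes to replicate somewhere else, which is the point of the paragraph above.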

Finally, we have many somewhat boring servers running the standard things you run on any moderate-to-large web application: apache2, RabbitMQ, postgresql, squid, memcached, haproxy, iptables, nagios, etc. We've gone to some lengths to make sure that there are redundant paths to access the webserver farm, even though an average page load may touch apache->squid->haproxy->django->memcached->postgresql. For every kind of server that we run, we try to have several smaller servers running rather than a single big server, so that we can scale horizontally if at all possible, and do upgrades without taking the entire service out. And 'service' is not a very good description, since we can update the phone sync servers without taking down file sync, bookmarks, the music store, or Tomboy notes. We have not yet split across multiple data centers, but are drawing up plans so we will be ready when the time comes.

There are two big changes that we've made on the back end that are not very visible to users but are still important. As of today, the biggest database we have is the one that helps keep track of the files you have stored in Ubuntu One, and that is now split across multiple 'shards', meaning that the data is partitioned so that even if a database goes down only some of the users are affected, not all users. This also lets us decrease our MTTR, or mean time to recovery, and improves the performance of both the web site and the desktop file syncing client. We're also putting the finishing touches on partitioning the CouchDB system, which has a great many small CouchDB databases replicated from each user's desktop. Partitioning or sharding here accomplishes the same goals - don't allow the whole service to go down even if a server fails, make backups easier and faster, and improve performance by scaling horizontally.
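For anyone who hasn't run into sharding before, here is a small sketch of the general idea. It assumes users are assigned to shards by hashing their ID, which is only one common way to do it and not necessarily how our databases are actually partitioned. The shard count and the user IDs are made up.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical number of database shards


def shard_for_user(user_id, shard_count=SHARD_COUNT):
    """Map a user ID to a shard index using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is the same
    # on every machine and every run.
    return int.from_bytes(digest[:8], "big") % shard_count


def affected_users(user_ids, failed_shard, shard_count=SHARD_COUNT):
    """Return only the users whose data lives on the failed shard."""
    return [u for u in user_ids
            if shard_for_user(u, shard_count) == failed_shard]


if __name__ == "__main__":
    users = ["user%d" % i for i in range(1000)]  # made-up user IDs
    down = affected_users(users, failed_shard=0)
    print("%d of %d users affected" % (len(down), len(users)))
```

With a scheme like this, losing one shard touches only the users who hash to it, and a restore only has to deal with that one shard's worth of data, which is where the shorter recovery time comes from.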

Another change that went live today is the new 'dashboard'. If you have an Ubuntu One account and log in to the website, rather than immediately being directed to the files view, you are now shown a dashboard that provides more of an overview of what you have stored in Ubuntu One. Hopefully this new dashboard is more informative. It is also significantly 'lighter' and cheaper to render than the entire files view. Here is a screenshot:

We are continuing to develop features in public and might have a few more surprises coming before the Ubuntu 10.04 launch. If you want to try out the very latest code, we deploy new versions every hour to http://edge.one.ubuntu.com, and are always interested in feedback on new features that you see there.

I hope this was useful - if anyone has questions about Ubuntu One, I'll do my best to answer in the comments or perhaps write a new blog post if lots of people want to know about the same thing.