The importance that Sun Microsystems itself assigns to Flash can be judged from the appearance of Sun founder Andy Bechtolsheim on stage this morning for a keynote about Flash. Sun will have a new product in that space with enormous capacity soon. Bechtolsheim summarized the traditional benefits offered by Flash and its increasing role as a mediator between RAM or caches and the hard disk.
Flash also popped up in many other talks that I'll cover at the proper moment in this wrap-up article. I'll present my notes on:
- Differing paths to blinding speed (and other hardware discussions)
- How to fumble your way to winning the presidency
- MySQL seeking its place in the cloud
- Some large-scale projects
- Follow your community
One of Wednesday's corporate keynotes, by Vijay Karamcheti, focused on the use of Flash in Virident Systems' data storage appliance. Karamcheti apologized sarcastically that he would sully his keynote with actual technical content, and I can affirm that his talk had less of a shiny veneer and more of a hardwood underpinning than the previous day's keynote by Bruce Armstrong of Kickfire.
What I found interesting is the differing paths the two appliances took in pursuit of performance far above what could be reached by any general-purpose hardware. Virident ripped out the guts of the storage media, whereas Kickfire focused on speeding up queries.
I mentioned that the newer server Flash installations generally put Flash between RAM and disk as an extra layer of cache. In contrast, Virident ripped out the traditional filesystem, with block storage, SCSI drives, and so forth, replacing it with a four-tier storage hierarchy:
- NOR Flash (closest to the processor)
- NAND Flash
- Phase Change Memory (PCM)
- Disk storage
They claim that this architecture permits in-memory access times using persistent storage.
Update from Virident, April 26: Staff tell me that their current GreenCloud architecture is based on placing NOR Flash memory between DRAM and disk. A future system will include NAND Flash and PCM. The Flash solution is a form of storage class memory, a concept they describe in a white paper. The large Flash cache can often hold an entire database. Not only can data be delivered from the cache more quickly, but it can be in smaller chunks than the disk blocks that are transferred on disk access.
Kickfire's biggest gain was achieved by compiling queries into hardware. A custom chip accepts strings of SQL (the MySQL flavor, in fact) and executes them against data stored in the appliance.
Flash came up one more time in the talks I attended, a session by Trond Norbye about memcached. When a server has a Flash device, memcached maintains a two-tier cache, with the keys and the most frequently used data in main memory and the less frequently used items in Flash. memcached has been enhanced with two threads devoted to paging data in and out between main memory and the Flash device.
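To make the two-tier idea concrete, here is a minimal sketch of such a cache in Python. This is purely illustrative, not Norbye's implementation: the class and its names are hypothetical, an in-memory dict stands in for the Flash device, and the real system does its paging on dedicated threads rather than inline as here.

```python
from collections import OrderedDict

class TwoTierCache:
    """Illustrative two-tier cache: hot items live in a small in-memory
    tier; less frequently used items are demoted to a slower 'flash'
    tier. As in the memcached scheme described above, the keys for both
    tiers stay in main memory."""

    def __init__(self, memory_slots=2):
        self.memory = OrderedDict()   # hot tier (RAM), kept in LRU order
        self.flash = {}               # cold tier (stand-in for a Flash device)
        self.memory_slots = memory_slots

    def set(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)  # mark as most recently used
        self._evict()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)   # refresh recency
            return self.memory[key]
        if key in self.flash:
            value = self.flash.pop(key)    # page the item back into RAM
            self.set(key, value)
            return value
        return None

    def _evict(self):
        # Demote least recently used items to the flash tier; the real
        # system delegates this paging to two dedicated threads.
        while len(self.memory) > self.memory_slots:
            old_key, old_value = self.memory.popitem(last=False)
            self.flash[old_key] = old_value
```

The essential property is the one Norbye described: a miss in the hot tier falls through to the slower-but-larger tier instead of going all the way back to the database.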
Conference attendees who showed their loyalty by staying to the end of the conference were rewarded not only with cheap ice cream but with a really cool presentation by half a dozen Democratic Party volunteers and Blue State Digital staff who headed the information technology efforts of the Barack Obama campaign. They described how they set up MySQL databases, collected data from a wide range of party sources in very crude formats, ran analytics, and more.
What was most impressive about this presentation was not the unrelenting determination these guys invested in snaring their data, nor the continual ingenuity they displayed in overcoming MySQL's foibles or their own. What was most impressive was their candor in admitting how much they needed to learn. There was no attempt to hide mistakes.
At the beginning, for instance, they stored data in MyISAM tables when InnoDB would have been much better. Only when table locking made backups collide with background jobs did they switch. Other novice errors included inserting test data into a production database and forgetting to remove it; the result (amusing in hindsight) was to throw off auto-incrementing and slow down simple COUNT(*) operations to the point where timely data was impossible to get. They also fed imprecise data to the analytics team because they took it from a slave that was slower than the master and thus lagged behind.
I don't want to make fun of these wonderful examples of public activists. I'm sure I would have done worse in their circumstances. Every potentially fatal misstep was fixed by a clever recovery. The key take-home is that when all was said and done, their efforts won the election. A good dose of luck was multiplied by intrepid decisions and lots of sweat. As today is the day traditionally celebrated as Shakespeare's birthday, I am moved to say to them:
...you have shown to-day your valiant strain,
And fortune led you well.
I also kept thinking during the panel, these are the guys entrusted with a presidential campaign? Why wasn't it run by the equivalents of Percona, the consulting firm that had the highest profile at this conference and whose staff wrote the latest edition of High Performance MySQL?
My answer is that nobody really thought of doing these things before, so the task fell to those with cool enough backgrounds to think of doing them. Come the 2010 elections, all political parties will be doing them, and they will indeed be run by the equivalents of Percona consultants.
Amazon.com has offered MySQL databases in its EC2 cloud service for some time. Cloud services fill the bill quite nicely both for storing data and for running database engines. But the keynote panel on cloud computing kept pretty low to the ground, each panelist mostly rushing to explain cloud basics and show their benefits and drawbacks. I won't summarize the points because they're widely available. I particularly like the blogs on the O'Reilly site by author George Reese, and the book he just released named Cloud Application Architectures.
Although openness has some meaning in cloud computing, I was annoyed that the panel played with the idea of open APIs. What could they mean by asking Amazon.com to open its API? Anyone is free to re-implement an API. There used to be "look and feel" lawsuits, but they became passé even before disco. Perhaps what they want is for Amazon.com to submit its API to a (not yet existent) cloud standards body. Such issues are addressed in my aforementioned blog.
Brian Aker and Eric Day presented Gearman, which started as a load balancer front-end for MySQL and evolved into a simple but flexible job handler. In its simplest deployment, Gearman can be used to carry out operations that take a long time when performed in the database. Instead of running aggregate functions such as COUNT or AVERAGE on the MySQL server, forcing other queries to wait, you can feed the data to a set of Gearman servers and run them in parallel. Gearman maintains a pool of worker threads and is smart enough to perform automatic failure recovery if a thread dies prematurely.
Gearman combines well with MapReduce or Hadoop. In this scenario, you feed a query through Gearman to MapReduce client processes, fan the work out to worker processes, and aggregate the results in another Gearman service.
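The pattern Aker and Day described can be sketched without a Gearman server at all. The following is not the Gearman API; it just illustrates, with a local thread pool standing in for a pool of Gearman workers, how an aggregate such as AVERAGE can be split into partial results computed in parallel and combined at the end. All function names here are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_aggregate(chunk):
    # Each "worker" returns a partial (count, sum) for its chunk,
    # rather than the final answer.
    return (len(chunk), sum(chunk))

def parallel_average(data, workers=4):
    # Split the data into roughly equal chunks, one per worker.
    chunks = [data[i::workers] for i in range(workers)]
    # In a real deployment, each chunk would be submitted as a Gearman
    # job; here a thread pool plays that role.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, chunks))
    # The final combining step is what the second Gearman service
    # would do in the MapReduce scenario above.
    total_count = sum(c for c, _ in partials)
    total_sum = sum(s for _, s in partials)
    return total_sum / total_count
```

The point of the design is that the expensive scan never runs on the MySQL server itself, so other queries don't queue behind it.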
Some of the other fun uses for Gearman that Aker and Day suggested were "Make your own S3" and "Make your own CDN."
Another talk covered a curious little use of cloud computing called PBMS, for optimizing BLOB storage and retrieval. Barry Leslie started by laying out twin scenarios that sites want to avoid: storing BLOBs right in the database (which requires a lot of bandwidth and doesn't make the best use of the database) and storing BLOBs in the filesystem with a filename in the database (because the actual location can get out of sync with the recorded location).
PBMS is sort of a storage engine, but uploads BLOBs to Amazon S3 instead of a database or local file. The S3 key and related location information are stored in the database. Retrievals start by getting the key from the database, then use HTTP to retrieve the BLOB.
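The shape of the pattern is easy to show in a few lines. This is a sketch, not the PBMS API: dicts stand in for Amazon S3 and for the database table, and every name below is invented for illustration.

```python
import hashlib

object_store = {}   # stand-in for Amazon S3
database = {}       # stand-in for a table: record_id -> BLOB key

def store_blob(record_id, blob):
    # Derive a key for the object store; only this key, not the BLOB
    # itself, is written to the database row.
    key = hashlib.sha1(blob).hexdigest()
    object_store[key] = blob      # "upload" to S3
    database[record_id] = key     # record the key in the database
    return key

def fetch_blob(record_id):
    key = database[record_id]     # step 1: get the key from the database
    return object_store[key]      # step 2: an HTTP GET against S3 in real life
```

Because the database stores the lookup key rather than a filesystem path, there is no separate location that can drift out of sync with what the database records.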
Percona, which I already mentioned, pumped up the geek factor at the conference by running what was essentially a parallel conference from 9 AM to 10 PM each day in a large room off the mezzanine. Although I thought several of the regular conference talks were quite detailed and code-heavy, Percona founder Peter Zaitsev said he wanted even more hard-core technical content.
I have to admit that some of the Percona talks were beyond me, mostly because I missed the context of what the speaker was trying to achieve. But one presentation I found interesting was by Dathan Vance Pattishall about a site that hosts MySpace and Facebook applications. Both social networking services are quite sloppy about the demands they place on the servers that interact with them, in different ways.
MySpace applications, it turns out, use ten times as many front-end servers as equivalent Facebook applications, because MySpace user pages poll the server at the drop of a hat. For instance, MySpace constantly refreshes each user's home page to get status updates, and these refreshes poll each server whose application the user has loaded. MySpace requires a 1-5 MB transfer for something as common and basic as a Friend graph.
As for Facebook, typical FBML processing takes 5 seconds, of which only 200 milliseconds are spent on the server. The rest involves translating results into something viewable by the user.
Robert Krzykawski and Anders Karlsson summarized the pros and cons of various high availability options for MySQL:
- Fault-tolerant hardware is a top-notch solution but extremely expensive.
- MySQL replication is very easy. But it's asynchronous, so it might lose data (as we saw earlier with the Obama campaign), and the asynchronous copying makes failback notoriously difficult. Krzykawski pointed out that on his site, replaying the InnoDB log made recovery from a failure slow, precluding the solution from truly being called HA.
- DRBD and AVS are a step up from replication in terms of HA. They can be roughly compared to mirroring RAID, the difference being that they operate across a network. DRBD runs on Linux and AVS on Solaris. They're more complex than replication, requiring installation and administration of another service.
- ZFS on Solaris offers HA similar to DRBD and AVS. It's done by taking incremental snapshots to achieve synchronous replication.
- Shared storage involves sharing a partition instead of using replication. It's costlier than the simpler solutions and introduces a single point of failure, so it must be used with a SAN, which in turn introduces more costs.
- MySQL Cluster is another simple, self-contained solution that offers good performance. But it requires a lot of server hardware.
I didn't have to take many notes during Sheeri K. Cabral's talk because her advice to community members is exactly what O'Reilly tells our authors in order to promote their work and their careers. It also resonates with the advice of Ubuntu community manager Jono Bacon, who is writing a book called The Art of Community and maintains a sister web site:
- If you care about a project, volunteer for it.
- All kinds of volunteerism can be valuable. If you're not a crack programmer or master of another skill, fold T-shirts or hold a sign.
- Blog, tweet, or make whatever other curious utterances you feel comfortable making to promote the project.
- Don't participate for a reward; do it because you want the project to get better.
- Complaints as well as praise can benefit the project, when you find a legitimate problem. (I'm not sure that Cabral actually made this point, but Bacon does, and I'm sure Cabral would agree.)
Every project nowadays has a community around it--which is why O'Reilly advises its authors to busy themselves with their communities--but open source projects are particularly bound up with them. That's why it's significant that Sun manager Karen Padir promised the MySQL team would respond more quickly to community contributions. And this also ties together the whole conference with a statement Cabral made in addressing the Oracle purchase.
Cabral directly took on the rumors and fears of the MySQL community. On the question of whether Oracle would maintain MySQL, she expressed confidence, because Oracle respects and pays close attention to its own Oracle community.
This part of her speech did not persuade me at first, because I figured Oracle could nurture one community (around Oracle) while neglecting another (around MySQL). A conversation with Cabral today clarified her logic and revealed that she was concerned with a slightly different issue: the health of a community and its effect on the desirability of a product.
Cabral pointed out that Oracle is the choice of many enterprises because of the richness of its third-party add-ons. If you want to ensure something complex such as HIPAA compliance, you can get a plug-in to do it and rest easy. MySQL has a third-party aftermarket too (which I talked about in Tuesday's blog), along with cool tools donated by the community--and of course entire storage engines built by volunteer teams--but still nothing on the scale Oracle offers.
To really succeed in the enterprise, Cabral told me, MySQL must catch up. And the main source of plug-ins and enhancements is its community. Being nice to one's community isn't merely being nice; it's guaranteeing one's future. If you nurture your community, they nurture you.
Now we can go back and interpret Cabral's praise for the way Oracle handles its own community. Because it understands the value of community, it can promote that same understanding--should it choose to--for MySQL. And then the virtuous cycle can begin.