The O'Reilly conference staff tell me we have nearly 2,000 registered attendees (up from the first estimate I got when I first posted this on Tuesday) for this year's MySQL conference, a very respectable number in an economy that's still sagging. I sense a bigger enterprise theme this year. The pride of putting up a PHP- or Rails-backed web site lies in the past; now people are concerned with scaling into the clouds (figuratively and literally) and ensuring absolute reliability. The schedule is bursting with advanced sessions in performance analysis and various forms of distributed processing.
I came in on a plane yesterday and promptly took a long nap; I didn't go back on the grid until five thirty this morning and therefore had only a short time to process the purchase of Sun by Oracle. Apparently, the MySQL team didn't have much longer, because the first thing I did was visit the MySQL site to hear their reaction; the only reference to the purchase there was a reposting of the joint Oracle/Sun press release.
A perusal of news articles and blogs reveals that everybody has wildly different interpretations and predictions concerning the purchase. For every opinion one can find an expert with a diametrically opposed view. The only opinion I will offer is that MySQL is not a big enough part of the Sun package to be a major factor in Oracle's outlay. It may have been a strange Larry Ellison prank to announce the purchase on the first day of the MySQL conference, but it could also just be a coincidence and only the gods are laughing. What's important is that MySQL is more reliable, more popular, and more firmly based in its community than ever.
Evolution in the MySQL ecosystem
MySQL has been big enough for years to generate an aftermarket. For instance, before MySQL AB bought up a clustering solution, a number of companies filled that space, and some such as Continuent continue to do so. Zmanda offers enhanced backups, and I could go on and on.
One of the hotter organizations circling the MySQL nucleus is the open source Sphinx search engine. Directly targeting the full-text capability that the MyISAM FULLTEXT index merely glances at, Sphinx has come up quickly in a space for which Lucene is known.
In addition to choice within MySQL (offered by different storage engines) and choice surrounding MySQL (offered by these other open source and proprietary offerings) there are several other open source database engines. PostgreSQL is fairly well represented at this conference.
I was chatting with PostgreSQL consultant Selena Deckelmann about where the database engine stood in relation to the enterprise features such as replication and clustering, which are not as interesting to programmers as PostgreSQL's rich SQL feature set but play such a big role at this conference. She explained that multiple solutions exist for PostgreSQL in these areas, but as separate projects.
The resurgence of InnoDB
Two or three years ago when I was discussing the outline for the second edition of High Performance MySQL with its authors, word was out that the days of InnoDB were numbered. How could it thrive after its purchase by Oracle? And after years of use, InnoDB's bottlenecks and inefficiencies were well-known to experts like my authors, and extensively documented in their blogs.
In a demonstration of open source software's nimble adaptability, a welter of new transactional storage engines started churning, developed by teams inside and outside MySQL. The buzz was all about which storage engine would be the new InnoDB: PBXT? Falcon? solidDB? Each had its technical points.
Well, InnoDB has come roaring back, and the people whom I pumped for opinions at the show say that the others are relegated to the role of promising future replacements. Although some of the other storage engines demonstrate impressive benchmark numbers, none has reached the point of being a viable all-around replacement for InnoDB. When you consider all the needs of enterprises--logging, hot copy backups, replication and clustering, etc.--InnoDB has an unchallengeable lead.
And as the keynote today highlighted, the InnoDB developers have made incredible performance advances. Baron Schwartz, lead author of the second edition of High Performance MySQL, says InnoDB is still the top choice, which is why his company Percona chose it as the basis for their ExtraDB storage engine.
Development proceeds apace on some other storage engines, though, and someday enterprises will undoubtedly have choices. I sat in on a talk by Paul McCullagh about the alternative that most bets are on, PBXT. He showed the changes he made to greatly improve scalability on benchmarks and summarized the challenges faced by database engine designers today:
- Multicore chips
Raw processor speed is no longer increasing, and McCullagh said the fastest processor offered now by Intel is actually slower than its fastest 2004 chip.
- Solid state drives
I was surprised McCullagh put so much emphasis on the use of Flash for server storage, because my impression is that its use still hovers between the experimental and the cutting-edge. But it completely changes all the equations for storage, because all seeks take the same time and reads can be very fast. At the same time, non-sequential writes can be slow and the management of wear and tear gives rise to sophisticated data chunking techniques.
- Increasing RAM
Unlike processor speeds, this spec still tends upward. To take advantage of RAM, large caches should be used and should be pre-loaded with large reads.
Some of the responses to handle this evolution include segmenting data so that writes tend to go to the same pages in memory, working directly on cached data to avoid in-memory copying, and simplifying data structures because locality of reads no longer makes a difference in performance on solid state drives.
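To make two of these ideas concrete, here is a minimal Python sketch of a cache that is pre-loaded with a few large reads and then hands out pages without copying them. It is only an illustration under my own assumptions, not code from PBXT or any other storage engine: the `PageCache` class, the 4 KB page size, and the 64-page read-ahead are all hypothetical.

```python
import io

PAGE_SIZE = 4096          # hypothetical page size, chosen for illustration
READ_AHEAD_PAGES = 64     # pages fetched per large read (assumed value)


class PageCache:
    """Toy page cache: warmed with large reads, served without copying."""

    def __init__(self, stream, page_size=PAGE_SIZE):
        self.stream = stream
        self.page_size = page_size
        self.pages = {}       # page number -> memoryview into a cached chunk
        self.read_calls = 0   # number of non-empty reads from the storage

    def preload(self, read_ahead_pages=READ_AHEAD_PAGES):
        """Fill the cache with a few large reads instead of many small ones."""
        chunk_size = self.page_size * read_ahead_pages
        self.stream.seek(0)
        page_no = 0
        while True:
            chunk = self.stream.read(chunk_size)
            if not chunk:
                break
            self.read_calls += 1
            view = memoryview(chunk)
            for offset in range(0, len(chunk), self.page_size):
                # Slicing a memoryview references the chunk; it does not copy.
                self.pages[page_no] = view[offset:offset + self.page_size]
                page_no += 1

    def get(self, page_no):
        """Return a zero-copy view of a cached page."""
        return self.pages[page_no]


# Stand-in for a data file: 1 MiB of sample bytes held in memory.
data = bytes(range(256)) * 4096
cache = PageCache(io.BytesIO(data))
cache.preload()
```

With these assumed sizes the 1 MiB of sample data is pulled in by four large reads rather than 256 page-sized ones, and `cache.get(n)` returns a `memoryview` onto the cached chunk instead of a fresh copy of the page.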
Threading the documentation
Technologies spawn web postings like salmon--even unpopular technologies. So when technologies get popular, the postings proliferate like minnows. You can never expect to find the best posting that responds to a question you have.
The MySQL team was one of the more innovative organizations in letting readers contribute to the docs. Anyone could leave a comment on a doc page, and if it was apt the team would incorporate it into the page. This was pretty agile documentation in the days before wikis became popular.
At today's keynote, Karen Padir asked the audience whether they'd like the hundreds of pages of MySQL documentation to be put under the GPL; applause seemed to indicate approval. But I talked later to Russell Dyer, author of both editions of MySQL in a Nutshell, and we agreed that the license is not particularly relevant (even leaving aside the difficulties applying GPL to documentation). After all, what value does the MySQL documentation have apart from MySQL? A real advance would be an easier way for visitors to the MySQL site to edit or add to documentation.
An open license would allow authors to incorporate MySQL reference material into other print books, but it has occurred to me that the main way the GPL could make a difference is to facilitate forks of MySQL. What happens if Drizzle breaks away or if a group of developers have a falling out with Oracle (so far everybody is being very collegial about the merger)? I don't want MySQL to travel the course of BSD, but rumors are naturally brewing. If it must, different communities can share documentation.
Russell himself is trying to net more information by putting his book online at MySQueaL Resources, along with a huge database of other web pages such as the Planet MySQL aggregator and even commercial offerings such as Safari Books Online. By crawling these sites and indexing the text himself, he tries to provide a search site that comprehensively covers MySQL issues while excluding irrelevant and low-quality information. The site is powered by MySQL and Perl.