Running time: 00:17:14
MySQL has had a long and sometimes strange journey: from an independent database project to a commercial company, then to Sun, and now possibly to a new home again. Brian Aker is the director of technology for MySQL with Sun Microsystems and is probably as familiar as anyone with the life history and current status of the popular open-source database. He'll be speaking at the MySQL Conference in late April on a number of topics, and he's joined us here today. Thanks for joining us.
Brian Aker: No problem.
James Turner: So, for those who aren't familiar, why don't you start with the life history of MySQL up to this point?
BA: Well, MySQL began a little over a decade ago when Monty Widenius decided to take the database he had at the time, which was called Unireg, add a client-server protocol to it, and bring SQL to it. Before that, it probably more closely resembled Berkeley DB than anything else. Since then, it has grown over about ten years, adding more and more advanced features along [the way], going from a database that initially had a somewhat open license to eventually being fully GPL.
JT: So MySQL is now under Sun's wing, since Sun acquired it. Obviously you've heard the rumors, as I'm sure everyone else has, and you probably can't comment too much. But if we assume that a certain large New York-based company were to acquire Sun, do you think MySQL would fit well into that world?
BA: Well, IBM actually has a project to run the storage engine they use for DB2 directly under MySQL. At the same time, a couple of years ago MySQL gained a bit of popularity on their, I think -- what is it? The iSeries? I always forget the name. Basically, their mainframes. MySQL picked up a bit of popularity in that area. So I would actually think it would fit pretty well. MySQL has traditionally been more than just a UNIX-based database or a Windows-based database; it fits on just about anything you can imagine.
JT: Now, IBM already owns -- I'm going blank here -- I think it's Ingres that they purchased a few years ago. It's either Ingres or Informix.
BA: It's Informix.
JT: Okay. It was one of those "I" databases. So would you see any synergy there?
BA: I really wouldn't know. I haven't seen Informix for -- oh, it's been a very long time since I've actually taken a look at it. With the work IBM did with DB2 and MySQL, obviously, there's some synergy there. But really, I think I last used Informix somewhere back around 1997 so I haven't really had any experience with it for a long time.
JT: I get the feeling it's almost like "All your databases are belong to us" at this point, and it's going to acquire them all. Another thing that has happened since we talked to you at OSCON is that Monty has left, as I understand it.
JT: Can you talk a little bit about what's been going on in the community and what schisms may exist at this point?
BA: Well, the most interesting thing going on in the community right now has been a kind of reawakening around the issue of user contributions. Mainly, we've seen much more of a push toward accepting user contributions. There have been some attempts on the MySQL side to find a way to bring in contributions. When we created the Drizzle microkernel, which is a fork of the main kernel, one of the first things we decided to do was find a way to bring in more community involvement. I think last month we had something like 67 outside contributors to the project. Even in December, I think we had something like 48 or so. My numbers are probably off by one or two, but we've seen a lot more interest from companies in providing developers, and a stronger interest in helping shape the future of the database.
JT: Do you think most open source projects eventually get this sort of sheep-farmers-versus-cattle-farmers conflict between the people who want higher performance and the people who want more features?
BA: There's always an inevitable -- "no open source project is complete until it has a built-in mail reader" was, I think, the old joke. So there is always an issue of pushing in more features and so forth. I think that inevitably leads to some trees that solve this problem and some trees that solve that problem. In my own world, that was one of the reasons why, when we started refactoring MySQL, we decided to go with a microkernel architecture, so that we could take any piece of -- you don't want any kind of authentication? Fine. We won't load an authentication module. You need PAM authentication? Fine. We'll load a PAM authentication module. So to me, the natural evolution of open source is actually toward more microkernel-type architectures. Some of what I've seen recently about refactoring OpenOffice and other projects shows, I think, that long-term open source projects that evolve toward a more modular design tend to last longer.
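The microkernel idea described here, where the core loads an authentication module only if the configuration asks for one, can be sketched in a few lines of Python. This is a hypothetical illustration: the `Server` class, the `AVAILABLE_AUTH` registry, and `StaticPasswordAuth` (standing in for something like a PAM module) are invented names, not Drizzle's actual plugin API, which is written in C++.

```python
from typing import Callable, Dict, Optional, Protocol


class AuthPlugin(Protocol):
    """Interface the core expects from any authentication module."""

    def authenticate(self, user: str, password: str) -> bool: ...


class StaticPasswordAuth:
    """Hypothetical stand-in for an external module such as PAM."""

    def __init__(self, users: Dict[str, str]):
        self.users = users

    def authenticate(self, user: str, password: str) -> bool:
        return self.users.get(user) == password


# Registry of plugin factories; the core only knows the AuthPlugin interface.
AVAILABLE_AUTH: Dict[str, Callable[[], AuthPlugin]] = {
    "static": lambda: StaticPasswordAuth({"admin": "secret"}),
}


class Server:
    """Minimal core that loads only the modules named in its config."""

    def __init__(self, config: Dict[str, str]):
        name = config.get("auth")
        # No "auth" entry means no authentication module is loaded at all.
        self.auth: Optional[AuthPlugin] = AVAILABLE_AUTH[name]() if name else None

    def login(self, user: str, password: str) -> bool:
        if self.auth is None:
            return True  # authentication was not configured, so it is skipped
        return self.auth.authenticate(user, password)
```

For example, `Server({})` starts with no authentication module, while `Server({"auth": "static"})` loads one and rejects bad passwords. The design choice is that the core never imports a specific module directly; it only looks names up in a registry.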
JT: Right. Well, that's certainly the direction the Linux kernel took. So I assume that means that we're not going to get "SELECT * FROM GMAIL WHERE SENDER =" any time soon?
BA: [Laughter] It wouldn't surprise me at all if somebody had at least put something together like that as an example.
JT: Right. Because I know there have been extensions that allow you to do pseudo tables out of operating system stuff.
BA: Oh, yeah. We've had pseudo tables out of operating system stuff. We've had tables that come out of, say, Amazon's S3 service. A couple of years ago I saw an example where somebody was selecting data out of a Google spreadsheet. Inevitably, people do all kinds of little niche projects, either just to say, "Hey look, Ma, [see] what I can do," or because they have a specific need or want and they go off and build it.
JT: Since you mentioned Amazon, and it's kind of a hot topic: if you look at Google's App Engine, and at what Amazon offers for native storage (setting aside that on Amazon you could drop a VM with a database on it if you wanted), both seem to take the attitude that hashing is a better solution for most people than a full relational database. There seems to be a big push toward using relational databases only where they really make sense. Is this a natural pendulum swing, or is there more to it than that?
BA: Oh, I think this is actually a very natural swing, and I generally agree with it. If we look at what's been done recently, we've seen models move much closer toward object relational models. And in object relational models, for the most part, all you really need is some kind of hash index. I think it's inevitable that we'll see other types of databases show up. There's HBase; there's Hypertable. We've seen memcachedb created. What all of these projects are looking for is, "Hey, I've got some data. I have a key. I have some way of looking it up; let me just insert that data and let me get the data back." To me, relational databases have always been pretty awesome for metadata, series data and so forth. But I think it's inevitable that we'll see other types of storage come about. And key lookup is pretty much a bread-and-butter thing for many types of web applications.
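The "I have a key, let me insert the data and get it back" pattern can be sketched as a toy Python key-value store in which a hash of the key picks a node and a dictionary (itself a hash index) does the lookup. This is an illustrative assumption, not how HBase, Hypertable, or memcachedb are implemented; it uses plain modulo placement, whereas many real memcached clients use consistent hashing so that adding a node moves fewer keys.

```python
import hashlib
from typing import Dict, List, Optional


class PartitionedKVStore:
    """Toy key-value store: a hash picks the node, a dict does the lookup."""

    def __init__(self, num_nodes: int):
        # Each "node" is an in-memory dict standing in for a separate server.
        self.nodes: List[Dict[str, bytes]] = [{} for _ in range(num_nodes)]

    def _node_for(self, key: str) -> Dict[str, bytes]:
        # A stable hash (unlike Python's salted hash()) keeps placement repeatable.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:4], "big") % len(self.nodes)]

    def put(self, key: str, value: bytes) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._node_for(key).get(key)

    def delete(self, key: str) -> None:
        self._node_for(key).pop(key, None)
```

Every `get` or `put` touches exactly one node and needs no query planner, which is the property that makes key lookup cheap to scale out compared with relational queries.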
JT: And if you look at things like Force.com, the model they've been pushing is a RESTful or SOAP-based interface into what is, under the covers, a relational database. Can you do relational databases in the cloud easily?
BA: I think you can do relational databases in the cloud, but like anything, you pay for it in performance. Either you pay in the fact that the cloud database can only store smaller sets of data, or you pay in the fact that doing joins across multiple nodes is fairly expensive. That's just the trade-off. We relax relational integrity; we relax ACID in order to get higher performance, and that's been kind of a natural progression. As bandwidth increases and various pieces of technology improve, we can do more and more of that. But in the end, the faster you want the vehicle to run -- or, I guess you'd say, the faster you want your database and your data lookups to run -- the more you start throwing things out, because it turns out sometimes you don't need them.
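To see why joins across multiple nodes get expensive, here is a simplified Python sketch of a repartition (shuffle) join. It assumes two tables spread across in-memory "nodes" by a simple `key % num_nodes` rule, with orders placed by `order_id` and customers by `customer_id`; every order whose matching customer lives on another node must be shipped across the network before the local join can run. The function names, table names, and layout are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def partition(rows: List[Row], key: str, num_nodes: int) -> List[List[Row]]:
    """Place each row on node (row[key] % num_nodes); assumes integer keys."""
    nodes: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[key] % num_nodes].append(row)
    return nodes


def distributed_join(
    orders_nodes: List[List[Row]],
    customers_nodes: List[List[Row]],
    num_nodes: int,
) -> Tuple[List[Tuple[int, str]], int]:
    """Join orders to customers on customer_id; return (results, rows shipped)."""
    shipped = 0
    # Shuffle step: move each order to the node that owns its customer_id.
    repartitioned: List[List[Row]] = [[] for _ in range(num_nodes)]
    for node_id, rows in enumerate(orders_nodes):
        for row in rows:
            target = row["customer_id"] % num_nodes
            if target != node_id:
                shipped += 1  # this row has to cross the network
            repartitioned[target].append(row)

    # Local step: each node builds a hash index on its customers and probes it.
    results: List[Tuple[int, str]] = []
    for node_id in range(num_nodes):
        index = {c["customer_id"]: c for c in customers_nodes[node_id]}
        for order in repartitioned[node_id]:
            customer = index.get(order["customer_id"])
            if customer is not None:
                results.append((order["order_id"], customer["name"]))
    return results, shipped
```

Co-locating both tables on the join key would avoid the shuffle, but a table can only be partitioned one way at a time, which is one reason distributed designs favor access patterns that stay on a single node.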
And if you look at truly RESTful databases right now, the most interesting one I've found is CouchDB, just because there's a natural need to do RESTful lookups on datasets. So we see all of these extensions to databases right now, and all of these new forms of databases out there. It's nice to see that people are opening their minds up a little to the idea that there's more than one way to store data.
JT: You obviously have a big stake in the MySQL Conference. What are the big things people should look forward to there this year?
BA: The big things I would look forward to are -- well, one, many people right now have been showing interest in our main project, Drizzle, which is a reworking of MySQL toward a microkernel design and toward much faster performance on multicore architectures. There's been significant interest in that. I'm also really interested to see what's happening in the world of InnoDB. Oracle just released a new plug-in that fixed a number of the performance issues related to InnoDB's support for multicore architectures. At the same time, we have Percona with XtraDB, which is focused a little more on providing gearhead-level tuning parameters. We see PBXT. There's still a lot of interest in the next generation, or simply in how we're evolving the basic storage engine. So those are some of the highlights I think we'll see at the conference.
JT: How is the relationship between MySQL and the folks over at Oracle these days? That was kind of an awkward moment, when a core piece of technology moved off to what could be seen as a competitor.
BA: I actually think it's turned out just fine. Oracle has continued to release InnoDB. We continue to see performance enhancements to it. They continue to fix bugs and support things. If there was any fear of Oracle buying it and killing the technology, we're so many years past that now that the fear should have been shed by this point. And as I mentioned earlier, other folks have been extending InnoDB: Google has, and, like I said, Percona has. The InnoDB code base seems to be pretty alive and pretty healthy. So I wouldn't really be all that concerned about it.
JT: One of the concerns a lot of people have about the new internet is that much of the rich information, be it in Flash applications or locked up in databases, is both hard to search and transient in a way that makes it hard to archive. How do you see databases relating to both search and a kind of long-term internet memory?
BA: So when we talk about databases -- well, let's extend this. As far as relational databases supporting what we call full-text or inverted indexes, I think we've moved past trying to shove that into a relational database at this point. We see open source projects like Sphinx. We've had Lucene for a very long time. There are new types of storage that we traditionally wouldn't have called databases, and we see them getting better and better. So as far as search goes, I think we'll see far more interest there.
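An inverted index, the structure behind full-text engines such as Sphinx and Lucene, maps each term to the documents that contain it, so a query intersects a few small sets instead of scanning every row. The Python sketch below is a deliberately minimal illustration under stated assumptions (lowercase alphanumeric tokens, boolean AND queries, no ranking or stemming); it is not how either project is actually implemented.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase, then keep runs of letters and digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

For instance, indexing `{1: "MySQL storage engines", 2: "Sphinx full text search", 3: "Lucene full text indexing"}` and searching for `"full text"` finds documents 2 and 3 by intersecting two posting sets, without reading document 1 at all.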
What I find interesting, by the way, is that the cost of the hardware required to find data on the internet, or I should say to harvest data off the internet, has come down. With commodity hardware and commodity storage nowadays, there's a lot of room for crawling the internet. And while the total amount of data has expanded, there's still that core set of data, the really valuable stuff you want to search on. I could easily see somebody popping up with a new search engine at some point that, instead of going after the 80 percent part of the 80/20 rule, goes after the 20 percent of the data and builds really incredible searches on it. So I think we're probably still on the verge of someone being able to create another search engine soon.
JT: Right. One example I was thinking about specifically, because I've been talking to some of the folks at Science Commons, is the problem of vast databases of, for instance, genomes. Most of those are stored in traditional databases, and the challenge is making sure there is some preservation of and easy access to that data.
BA: Yeah. I mean there's lots of data that's -- we still haven't really done a -- well, for one thing, we don't expose databases to the internet. It just hasn't been the common case so far to say, "Here's the database," and expose it. Instead, what we've been doing is exposing APIs to the system, as you see with Facebook or Yahoo. I did this early on at Slashdot with the search system there, finding ways to push data out through APIs. The most common approach so far has been REST. But when it comes down to the nitty-gritty of detailed data, we really just don't have a good means of actually sharing data. Many people thought XML was going to save the day, but I think it's now obvious to everyone that it didn't. It's a great data exchange format, but it didn't solve these kinds of problems.
JT: I guess that's the whole thrust of the semantic web: trying to bring some organization of meaning to this stuff.
BA: Yeah. The semantic web continues to be something that gets thrown around a lot, but so far it has never become anything concrete that anybody can really use or that will shape our lives. RESTful APIs have been the closest thing so far. So I haven't seen anything emerge that makes me think we've had any kind of convergence on a final solution there.
JT: Brian Aker has been with us today. He is the Director of Technology for MySQL with Sun Microsystems and he will be on numerous panels at the MySQL Conference coming up on April 20th. We look forward to seeing you there.