In the '90s, when the web was new and interactive forms were exciting, Perl and CGI were a natural fit for dynamic websites. mod_perl's release in 1996 sped up Perl programs by keeping them persistent, but grew to be a way to write Apache httpd modules in Perl. There was no single, simple way to run and deploy persistent Perl web applications in the easy upload-and-go fashion epitomized by PHP. Many people saw that. Byrne Reese and Aaron Stone did something about it. Their project, mod_perlite, is one of Five Features Perl 5 Needs Now.
Why do you say that CGI is dead?
Byrne: That's partially designed to be an inflammatory and provocative statement. I don't think CGI is dead. I think CGI is a technology and protocol that is well-suited to a certain set of problems. It's just not suitable to large scale, easy to deploy Perl applications.
I would like to say that CGI, over time, has become somewhat intricately linked and associated with Perl and it has become hard to disassociate the two. In fact when people talk about CGI, they are almost always talking about a Perl application that they have written that they're trying to deploy on a web server. But CGI was designed under the premise that you can execute bash, shell, and C-shell, whatever shell scripts you have. And CGI is perfectly suited for executing them. I just don't think that is the predominant use case anymore.
To some extent, I think it's hurt the growth and adoption of Perl. Furthermore, solutions that have emerged to try and address some of the limitations of CGI have grown too complex to solve the 80 percent of use cases out there, which can be stated simply as, "I need a stateless and fast container to run my Perl script."
Aaron: With just upload and run simplicity.
Aaron: That's something that competing languages -- well, not always competing, but other languages out there in the ecosystem -- have: the "upload and run" capability that is really tremendous for the long tail of individual, personal and small business deployments, not to mention the experimental deployments within a company where you have the secret engineering web server where engineers have the liberty to test out new applications without having to devote so much time that their managers might notice they're doing something besides what they've been assigned. "Upload and run" simplicity is important for those skunk works projects where a lot of new ideas can be explored quickly and easily with Perl.
Byrne: Aaron is pointing out, I think, an idea of skunk works projects within the context of a company. But for a lot of engineers CGI just presents too many hurdles and requires too much access on a machine to get up and running quickly, which is certainly true to some extent for Movable Type.
When I was the Product Manager of Movable Type, I was working on a project to improve the performance of Movable Type during which we went to great lengths to look at how the app itself was contributing to performance overall. But I was also interested in looking well beyond the app itself and looking at all of the vectors that contribute to poor performance and to some extent, declining adoption of Movable Type by the individual.
One of the reasons WordPress has done so well in the competitive landscape of blogging is that it's PHP. PHP lends two major criteria to its adoption -- first and foremost is ease of installation.
If PHP is installed on a server, that's the last thing you ever have to do to make sure a PHP script runs. You can drop a PHP file in any directory on the web server and you can serve images out of any directory on the web server. That's it. You're done. You don't need to go to a system administrator to configure your machine properly to serve images out of a directory configured to also run a PHP script, something you frequently need to do with CGI. The barrier to entry for hacking has been greatly lowered and reduced by PHP.
Aaron: Hacking in this context referring to engineers hacking the skunk works projects mentioned earlier.
Byrne: That's one of the things this project hopes to solve. In fact, PHP was honestly the inspiration behind mod_perlite. We would've called it mod_perl, as is the custom for these types of modules, if that name hadn't already been taken.
Aaron: Apparently, a very robust, enterprise-capable solution is under this mod_perl name. It's really an excellent tool and Byrne and I both have the experience of running pretty much the entire stack off of mod_perl. There are a number of organizations that are running very intensive applications on mod_perl. It's really an incredibly useful, powerful and excellent environment to run an application in. But for upload and go, it leaves too much power readily accessible to the application runner.
You end up being in a situation where you have to ask yourself, "Do I trust the project -- say the open source project that I just downloaded this application from -- not to have any memory leaks, not to have any special configuration requirements that I then have to get the access to go and set up in each conf file for Apache? Do I need upload access to an .htaccess file in order to set local variables using the PerlSetVar directive in order to get my database configuration string in there because the application won't read out of a config file?" Those are the types of situations where, for the upload and go use case, mod_perl does not excel at all.
Byrne: It's one of the reasons why shared hosting providers have, for the most part, banned mod_perl from their shared hosting accounts. One bad apple can really spoil the entire barrel. If I have one amateur developer or simply someone who is running a very popular web application, they can take down my entire server and all of the customers on that server. That's a risk to me as a business. The answer to that for these shared hosting providers is simple: forbid mod_perl, and that's very unfortunate. That's very unfortunate for any Perl application that is striving for wide adoption in the marketplace.
Tell me if I'm putting words in your mouth, but are you aiming at like the 80 to 95 percent of people who won't ever run the next Amazon.com or Vox.com but say, "I have a small application and I want to play it on my $5.00 a month host"?
Aaron: You got it.
Byrne: The truth is, if you endeavor and have the ambition to build any large scale application, you're going to outgrow very quickly an architecture that mod_perlite or even mod_php is appropriate for. You're going to need to expand your network, and you're going to need to architect your software fundamentally differently. But for people who just want to get started, you need something much easier to get started on.
Aaron: On the note of really large applications, take a look at the largest PHP sites out there -- and I'll have to think for a moment to really think who's the biggest.
Let's say Facebook.
Aaron: What they're doing in their architecture is using PHP as the front-end to everything. Their internal architecture, from the bits I've gleaned and gathered from architectural presentations by some of their engineers, is a service oriented architecture where the front-end PHP application makes calls into all of their back-end applications. They have their search applications, their image lookup application, a file lookup application, a ratings, ranks and the social network graph application, the who are your friends application and so on and so forth. The database itself in fact is a service. Everything is going to happen by a service call from the front end to the back end. In those situations, it exactly points out that when you get to the high end, every application, every programming environment is going to give you similar tools for scaling up. The mod_php solution, the mod_perl solution, a Python solution, everything at the front-end is going to end up looking the same, which is to say: calls to back-end services.
If you have one hit per second at your peak times, you're still talking about roughly 86,400 hits a day. Most applications aren't going to reach that.
Aaron: Exactly. Which is to say that in those situations where you're talking about a smaller scale application, we're scaling back and not over-architecting for the small scale. Even if you're only getting one hit per day, and let me really dig a little on the CGI situation, if you're getting one hit per second, you can say, "Oh, my server can handle that because it takes me eight tenths of a second to render a page," spawning the process and so on. But if everything's already in memory, everything's ready to rock and roll and you just need to process that request and get the response out, then you can do it in a quarter of a second. Yes, you have the similar scalability at the one-hit per second, but you have a much better customer experience where you're serving the pages in a quarter of a second instead of almost a whole second.
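The arithmetic in this exchange can be checked with a small sketch. The 0.8 second and 0.25 second figures are the illustrative page times Aaron quotes, not measurements, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def hits_per_day(hits_per_second):
    """Daily request count for a steady request rate."""
    return hits_per_second * SECONDS_PER_DAY


def busy_fraction(hits_per_second, seconds_per_request):
    """Fraction of one worker's time spent serving requests."""
    return hits_per_second * seconds_per_request


if __name__ == "__main__":
    print(hits_per_day(1))           # one hit per second, all day
    print(busy_fraction(1, 0.8))     # CGI-style: spawn, compile, render
    print(busy_fraction(1, 0.25))    # persistent: render only
```

At one hit per second both setups keep up, but the first leaves a single worker busy 80 percent of the time while the second leaves it busy 25 percent of the time, which is the headroom and response-time difference Aaron is describing.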
You can do more with the machine besides that too.
Byrne: That's a very important point. People who are running these applications are not just running Apache. They are also running MySQL [or] PostgreSQL in addition on the same machine. The point is that on any server there are always other applications and services competing for resources. When a system comes under load, the weakest link in the chain can have ripple effects through the entire system, where that relatively slow CGI application is now not only taking a little longer to load, but also consuming a lot more system memory and system CPU.
All of that has ripple effects through the entire system because now your database takes longer to process each query. That one hit per second application can all of a sudden become a one hit every two seconds application because your system's bottlenecks are finally being tested due to a lack of memory or CPU.
I think you've convinced me to put a stake through CGI. How does mod_perlite address this? Does it keep a Perl interpreter persistent in memory in a similar way as mod_perl? Is it more like the FastCGI protocol, or is it something different?
Byrne: The model is actually found in mod_php. In fact, the hypothesis that started Aaron and me on this project was that mod_php had solved a lot of the problems CGI faces today. As an Apache module, it simply marshals an incoming request by mapping that request to a file on the file system, sucking the file into memory, piping it through an interpreter and then returning the result to the browser. What if you just took that C code -- that module -- and swapped out the bindings to a PHP interpreter and put in a Perl interpreter instead?
Technically all of the pieces should be there. The hypothesis was, maybe it really is as simple as moving the hoses connecting to one device and just connecting them to another. Maybe it'll just work. Aaron could probably talk much more expertly about where the rubber meets the road in that respect, but that is where it came from. It was not about "mod_perl is broken" or anything like that. It was simply stemming from that hypothesis that this problem has been solved before. Surely we could reuse a lot of what has already been solved and apply it to Perl.
Aaron: Where the rubber meets the road, or what I ended up doing was starting by reading mod_php's interface to Apache. I then said, "Okay. This is pretty straightforward but deeply interlinked with other parts of the PHP system." I said, "Okay. This is not going to work as an actual cut and paste, but what it can do is serve as a direct inspiration." Then I looked at mod_example.
I looked at mod_cgi and some of these very straightforward examples, especially these very straightforward Apache modules that come with Apache that are there to show you how the module system works. Then I grabbed the Extending and Embedding Perl book and I took a look at the example embedded Perl front-end code and said, "Oh, this is going to be easy." Plug the two together and I have a situation now where it does in fact work. The request comes in; it gets mapped to a file in the file system; the file gets picked up and handed off to the Perl interpreter; the Perl interpreter does its magic and then writes the response back out to the client.
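The four steps Aaron lists (map the request to a file, read it, hand it to an interpreter, write the response back) can be sketched as follows. This is an analogy in Python, not mod_perlite's C code: Python's exec stands in for the embedded Perl interpreter, and the names resolve_script and handle_request are hypothetical.

```python
import contextlib
import io
import os
import tempfile


def resolve_script(docroot, url_path):
    """Map a URL path to a file under docroot, refusing paths that escape it."""
    root = os.path.realpath(docroot)
    candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    if os.path.commonpath([root, candidate]) != root:
        raise PermissionError("path escapes docroot")
    if not os.path.isfile(candidate):
        raise FileNotFoundError(url_path)
    return candidate


def handle_request(docroot, url_path):
    """Read the mapped script, run it, and return what it printed."""
    script = resolve_script(docroot, url_path)
    with open(script, encoding="utf-8") as fh:
        source = fh.read()
    out = io.StringIO()
    # A brand-new namespace per request: nothing leaks between requests.
    namespace = {"__name__": "__main__"}
    with contextlib.redirect_stdout(out):
        exec(compile(source, script, "exec"), namespace)
    return out.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docroot:
        with open(os.path.join(docroot, "hello.py"), "w", encoding="utf-8") as fh:
            fh.write('print("Hello from an uploaded script")\n')
        print(handle_request(docroot, "/hello.py"), end="")
```

The fresh namespace on every call is the part that corresponds to the per-request reinitialization discussed next: it keeps requests isolated, at the cost of redoing setup work each time.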
What I found in that process though -- and this is going to get into what are the big architectural hurdles that are going to face Perl -- the Perl interpreter itself has to get reinitialized on every call.
This is something that is both fundamental to the power of Perl but also really points out the philosophical difference between something like a PHP interpreter and a Perl interpreter. The deep philosophical difference here is that the PHP interpreter does not allow access to itself by the user where the user in this case is the person writing the code. The Perl interpreter, on the other hand, trusts that the programmer is truly an engineer and understands his environment, understands what he's doing and hands over the full power of its internals to Perl space for the Perl engineer to use. Even more of its internals are made directly available for C modules, XS modules which you can pull in through Perl, whereas you cannot pull in any C based module into PHP from PHP. There's no way to do it -- and a lot of rather tight controls for what limited ability there is to do that. There's also no safe mode in Perl. Brad Fitzpatrick and Artur Bergman were working on an XS module because they were looking at how to make Perl available in Google's AppEngine. They looked at it and they said, "Oh my God, look at all of these protocols that are directly accessible to manipulate the system environment and you can't turn them off."
You can actually write a very small XS module which does.
Aaron: Which is what they did.
Which enforces your point.
Aaron: Which goes to the point of the philosophy of the Perl interpreter is the programmer knows what they're doing and is entrusted to exercise that power.
The PHP interpreter says, "The programmer's an idiot and we're not going to trust him with anything." Again, for the upload and go case, that's actually what you want. You want to treat everybody like they might be a little hostile. That's the use case that we're looking at. The question was then, "Okay, how do we take the Perl interpreter and have it treat the code that it's about to run as possibly a little hostile, maybe not that friendly, but also something that we definitely want to be executing?"
I'm waiting for the punch line here.
Aaron: I was hoping that would be a good one. The situation is that after every call in mod_perlite, we're reinitializing the interpreter and running it again. It still saves some processing time because it's already in memory -- the library's already loaded -- and it just gives the library a call to go ahead and flush and go.
It's not as dramatic of a speed-up as we were hoping for because there is a lot of work that has to go on in Perl's guts when it's reinitializing the interpreter's memory space, kind of rebuilding it from scratch. Some of the points that we come up against are challenges to Perl. We turn to the Perl community and the P5P group and ask: are these features that you'd be willing to take into core? Would you look at a situation where you want the embedded Perl interpreter to have the ability to be tightened down and reused a little more flexibly?
What are the implications here for modules? Suppose I want to use a database in my code. I want to run that through mod_perlite in my program. Do I need to load the DBI every new time because you flushed the memory in the interpreter for every request?
Aaron: Unfortunately, that is the case. That's another situation where we've not been able to tackle the key problem. Code caching, if it is possible in Perl, is not something I was able to figure out.
That's because the code and the interpreter environment that the code is running in become intimately entwined once the Perl machine gets up and running. In the PHP world this is not the case. The interpreter can grab already parsed code that's already in an abstract syntax tree or in their own -- I'm not sure if they're using bytecode, but whatever it happens to be -- it's already preparsed, ready to go, and it can run through that code. It does not have to provide the same memory environment that it was already running at, whereas the Perl interpreter does.
Byrne: This is something that we identified early on as something for the roadmap. Even PHP has constructs to preload files upon every request and things of that nature. I remember talking to Aaron early on. It would be really great -- especially because Movable Type uses a lot of different third party modules -- if we could somehow at the web server level or in .htaccess level be able to say "These are the Perl modules that I as a systems administrator essentially bless. After you initialize the Perl interpreter, go ahead and use these modules and pull them into memory space." That way, you would have some flexibility over what modules you trust and will bring in the memory space and replicate for each Apache process so that you don't incur the cost of loading them over and over and over again.
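The roadmap idea Byrne describes, an administrator-approved list of modules loaded once per server process, might look roughly like this. It is again a Python analogy rather than anything mod_perlite implements: BLESSED_MODULES is a hypothetical allowlist, and json and sqlite3 are stand-ins for whatever modules an administrator would bless.

```python
import importlib

# Hypothetical allowlist an administrator would set in server configuration.
BLESSED_MODULES = ("json", "sqlite3")


def preload(names):
    """Import each blessed module once, when the worker process starts."""
    return {name: importlib.import_module(name) for name in names}


PRELOADED = preload(BLESSED_MODULES)


def run_with_preloads(source):
    """Run request code in a fresh namespace seeded with the preloaded modules."""
    namespace = {"__name__": "__main__", **PRELOADED}
    exec(source, namespace)
    return namespace


if __name__ == "__main__":
    ns = run_with_preloads('result = json.dumps({"ok": True})')
    print(ns["result"])
```

Per-request state still starts empty, but the cost of loading the blessed modules is paid once per process rather than once per request.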
Do you embed the system Perl or do you ship your own source code to Perl and compile it your own way?
Aaron: It's using the system libperl.
I wonder if you could rely on having a libperl built with threading involved. If you could in the main Perl interpreter load all of your modules and things like that and then create a new thread and destroy the interpreter threads for each request, maybe get some memory sharing that way.
Aaron: That's an interesting approach. I'm going to write that down and check it out. Perl is currently compiled with threading disabled by default, but I think that is also something where we could issue a challenge to the community and say, "Hey, it would be really great for us if threading were the default because then we could rely on it in this way."
What would it take? Where do the resources need to be allocated? Where does the noise need to be made? Who are the people who need to be cajoled or asked nicely to complete their work, complete their code reviews or what have you in order to get this going?
Cloning an interpreter into a new thread is still expensive -- interpreters are big -- but if you don't have to reload all of your XS code and if you don't have to reparse a bunch of code, you can preload things as you might do with PerlLoadModule in mod_perl. You could save some
Aaron: That's one place. The other place, I wonder, is what it would take to be able to flush all the parts of the Perl interpreter's memory. Off the cuff, I would say it probably is just not architecturally feasible to do. I'm curious what would it take in order to take a live Perl interpreter and reset it back to zero without actually destroying and recreating it.
In Parrot we explicitly made that easy, because we'd learned from some of the mistakes of Perl 5.
Aaron: That's an exciting thing for Parrot as well. As Byrne noted earlier, calling CGI dead as an inflammatory statement raised some hackles and gets some attention. We leveraged the existence of this module and the effort of it for purposes of the academic exercises or thought exercise of saying "What do we need to fix in the other parts of the ecosystem in order to obsolete this project?" What if this project were totally unnecessary because its existence got everyone else to say, "Oh, we just need to add those features"?
Boom, all of the other parts of the Perl ecosystem take up the space and make these same features accessible to users. That would be a great outcome.
Would you call that success?
Aaron: If that's not the outcome, then we'll have to push forward on this and there's clearly a niche where mod_perlite can really provide a strong offering.
Byrne: In some respects, that's perhaps ideal because that means you wouldn't even have to worry about the adoption problem down the road. Because even if mod_perlite is successful in achieving its objective, it still has to solve the problem of hosting provider adoption and system adoption. If you can reduce the adoption problem down to yum update or apt-get, that's the ideal. But if the next step is to evangelize and go to Media Temple and go to Go Daddy and go to all of these hosting providers and service providers and convince them that yeah, this is an Apache module you need to download and install and it will solve a problem which perhaps they don't even know they have, so be it. But if mod_perlite helps shed light on the root cause as being somewhere else within the ecosystem, then yeah, I'd be happy for mod_perlite simply to be an exercise and a lightning rod to solve those problems.
What does your roadmap look like towards that end? What should we look for in the next few months?
Byrne: One of the things that we need to do is a brain dump and get what's in Aaron's head onto a wiki or into some format so that others know how and where they can help. What are the challenges and road blocks facing us and what do we need to bring people's brains to bear on? Next, it's about building a prototype: a set of applications that exercise the various parts of the Perl interpreter and make all of the various system calls that we can use to prove that this works and is stable. Then it's about evangelism and bringing the solution to market and getting people and hosting providers and service providers to adopt it. Or alternatively, to actually talk directly to the Apache foundation. If mod_cgi is bundled with Apache, then why not just mod_perlite? Why not mod_php? Just hypothetically.
It stands to reason that if you're going to bundle and package these modules, why not go the extra mile and help bundle and package some additional modules that are so common in the marketplace that perhaps there's some efficiencies to be gained there? Just go ahead and embrace and bundle it.
Projects like XAMPP include modules like that.
Aaron: That's not a bad idea. Then, to follow on from the last piece of what Byrne said, about what the Apache foundation would be open to including in Apache: it'd be pretty interesting to approach them with mod_perlite and say, "Hey, look. This is three C files and uses the system's Perl and you don't need to distribute more than an extra 10K of code." Then you've got this great environment ready to go. I would expect the first ten replies to be no, but ....
Byrne: Honestly, the complexity of shipping something like this could be such that every response will be no. That's where I think your idea of approaching more holistic application distributors like XAMPP, people who are distributing and bundling entire solutions as opposed to just single components, is a good one. At the end of the day, for these distributors the web server is just a component in a system, not the system itself.
Then there are those who are bundling virtual machines, be it Jump Box or Amazon Web Services and EC2, people who are bundling entire systems -- we need to talk to them to make sure these solutions are prebundled and preconfigured in those environments. That might be kind of an end run around trying to get Apache to adopt it -- which would be great, but honestly, I'm a little skeptical that Apache would choose to bundle mod_perlite.
Aaron: Look at it again as an academic exercise perhaps to say, "What would it take? What are the criteria? What are the hurdles that would need to be jumped? What are the changes in the rest of the ecosystem that would need to take place in order for the concept to be feasible?"
Byrne: First and foremost we must get something that works and then ship it. Then we must focus on getting some buy-in and getting something that's been tested and vetted by a larger community because until it works, it really is just a theoretical question of "would you ship this" because no one is going to ship anything that will put a system at risk. First, we have to ship it, make sure it's solid and make sure it works, and then we can really talk about a distribution strategy.
How far are you from the "let's ship something that works" stage?
Aaron: I ran into some hurdles in not quite understanding the Apache request model. The code as currently posted crashes on certain types of requests. That's just a matter of figuring out what the details in the corner cases are.
That's something where I just haven't looked at it in a while and would love to get my hands dirty with it again. Especially with some more people and more eyes to look at it and say, "Hey, it would be great if you did this..." Or, "what if you did that?" Again, to really flesh out where we are going, what we need and what needs to happen in the ecosystem for the project to succeed.
What can people do to help you?
Aaron: Check out the code. It's modperlite.org. Take a look at where the code is. It is extraordinarily simple. Really, really short. A couple hundred lines of code ties it all together. Then see, "Okay. Will this work for me? What do I need? Where does it need to go?" That's where we're at.
Byrne: Aaron and I will post a to-do list.
Byrne: And a state of the onion light, if you will.
State of the leek.
Byrne: Yes. Good. I like that. There is our new logo. We'll make known the technical hurdles facing us that we could really use some help solving. That will at least point people in the right direction. But as Aaron points out, one of the great things that Aaron has already been able to prove is that this is not a complex problem. This is not hundreds and hundreds, thousands and thousands of lines of code. This is a relatively simple Apache module that does a relatively straightforward and simple thing. If we're successful, which I think we will be, its simplicity will be one of the core reasons why it is successful.
Aaron: I might also say that the Apache foundation, Apache itself, could do well to ship a better mod_example. As well, the definitive books on Apache modules don't really address what it takes to write an Apache module in C. It was mostly reading others' code in order to understand what needed to be done. I might suggest that that's a great place where O'Reilly could revisit its current book offerings.
Apache could revisit some of its documentation and say, "Hey, what's needed here?" That may be something that I will do in the process of working through it, especially taking a look at what's changed in Apache versions in the module interface -- some subtle changes that some of the documentation says do this and some says do that. Maybe I'm impugning myself on not having enough time to just, say, read the code, but as far as I can tell, Apache modules don't seem to be something that people write everyday, for better or for worse, right? It's the second level out where people are writing code that runs in an Apache module.
Aaron: And that's where the effort is.
Do you target Apache 2 solely?
Aaron: At the moment, yes, just because we developed against the latest version of Apache.
Byrne: We're focusing on Apache 2 first because we need to prove that this works. We need to prove that we're actually solving the problem and not just creating another solution without any real tangible benefit.
If we can prove this is a better way to do Perl on a web server, then it's not going to just be about Apache 2 and Apache 1; it's going to be about lighttpd and about all of the other different web servers, even, I dare say, IIS. It's going to be about all of the major web servers and writing modules that will allow them to run Perl in this manner.