In December 2008, the Python developers released Python 3.0, a new version of the popular dynamic programming language. While Python creator Guido van Rossum had discussed ideas for Python 3000 (renamed Python 3.0) for several years, the community began to work on it in earnest in 2005. Guido generously agreed to discuss the development process and what this new release means for the future of the language.
First of all, congratulations on Python 3.0.
Guido van Rossum: Thanks a lot.
I know it's been a while, but was the process successful from your point-of-view?
Guido van Rossum: It doesn't even matter so much whether it's successful from my point of view as long as it's successful from the community's point of view. It's interesting. I really started the Python 3000 idea way back in 2000. I started the actual effort to do something about it -- to actually design it and agree on everything -- about three years ago, just around the time I started here at Google when I got more time for Python. For a long time I've been really pushing the project along, getting a lot of people excited. I gave yearly talks at PyCon, at EuroPython, other conferences. Over the last 12 months, I've been, not entirely but for a large part, just sitting back and enjoying the ride while the developer community was finishing up all of the final details and getting ready for the release.
Did you take a smaller role in development this time?
Guido van Rossum: I did it until a year ago and then for various reasons probably having to do also with moving to a different team at Google, I played a less active role. I didn't spend two or three days each week coding or designing or reviewing. I kept track of the email stream, but I reduced my effort and let the people in the core Python development group who were doing most of the work anyway deal with it themselves. They did a very good job as far as I can tell.
What's the degree of overlap between the development groups for Python 2.6 and Python 3?
Guido van Rossum: Oh, it's pretty much the same team. That's why we could pull off more or less synchronized releases. When we started the 3.0 idea for real three years ago, we didn't really have much of an idea how the development was going to be done because the first year we mostly wrote a lot of PEPs and discussed a lot of potential changes.
Once we started working gradually, and it may be that that happened around the time that Barry [Warsaw] committed to be the release manager, we decided that it was a very good idea to synchronize the releases of 2.6 and 3.0 because the two releases are really connected in various ways.
The most important connection is probably that 2.6 has backports of a lot of 3.0 features. In cases where backports are unrealistic, it has warnings about things that still work in 2.6 because they have to be compatible with previous 2.X versions but will stop working in 3.0.
Whose idea was that? It sounds like an obvious idea in retrospect, but so many good ideas are.
Guido van Rossum: I honestly don't remember. That's an idea we've had for a long time. It's possible that it happened at a Python conference about two years ago, a little less than two years which I think was the first time that people realized that the Python 3000 idea was serious and that there was going to be a new release and that it was going to be incompatible and that it was going to affect many existing third party Python projects.
I remember being sort of ambushed by a group of Twisted developers and a few PSF board members who said, "This is all great." They had just listened to my keynote where I was sketching in fairly broad lines still what Python 3.0 would change and what it wouldn't change. They were telling me, "That's all fine and we love that the language is evolving in that way, but you also need to think about how projects can deal with the transition."
The transition is not going to be so easy as previous transitions where there's always a backwards compatibility clause in the new release. It's very possible that during a bunch of brainstorm sessions like that we came up with the various parts of the transitional strategy. I don't know if you've read up on this, but there is the 2to3 source-to-source translation tool, which can only do so much because it's such a dynamic language and it really doesn't have enough smarts to figure out what the type of a particular variable is.
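To make the limitation concrete: a source-to-source tool sees only the text of a function, while the type of its argument is known only at run time. The following is a minimal sketch for Python 3; `split_lines` is a hypothetical helper, not something taken from 2to3 or from the interview.

```python
def split_lines(data):
    """Hypothetical helper: nothing in the source says what type `data` is."""
    # The separator is a text literal, so this only works when `data` is text.
    return data.split("\n")


# Called with text, the function works.
print(split_lines("a\nb"))

# Called with bytes, the very same source line fails in Python 3.
try:
    split_lines(b"a\nb")
except TypeError as exc:
    print("TypeError:", exc)
```

Whether the literal should have been converted to a bytes literal depends on what callers pass in, which is exactly what a purely textual translator cannot see.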
On the other hand, for the places where that tool cannot help you or cannot help you enough, we decided to add warnings to Python 2.6. Those warnings are actually off by default; you have to invoke it with a special command line flag, -3. Then it will spit out all of those warnings. I think it was around that time, when the third party developers got really seriously worried about how they would have to deal with the transition, that we figured out the details of how that would work -- at least, the specific design of a command line flag that's off by default. When you turn it on, you get lots of warnings. The implementation of specific warnings was done later in an ad hoc fashion.
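The "off by default, noisy when switched on" design can be imitated with the standard `warnings` module. This is only an analogy running on Python 3, not the implementation of the -3 flag; `old_api` and `count_warnings` are hypothetical names made up for the sketch.

```python
import warnings


def old_api():
    """Hypothetical function standing in for a construct that will go away."""
    warnings.warn("old_api() will stop working", DeprecationWarning, stacklevel=2)
    return 42


def count_warnings(action):
    """Call old_api() under the given filter action and count reported warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        old_api()
    return len(caught)


print(count_warnings("ignore"))  # warnings switched off: nothing is reported
print(count_warnings("always"))  # warnings switched on: the call is reported
```

The code being checked is the same in both runs; only the filter changes, which is the property that lets a project keep running unchanged until its developers choose to look at the warnings.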
There really was an earlier precedent for this, way back in the days. I think it was Python 2.2 or 2.1 even when I personally realized that there was a problem with the integer division. I added command line flags that would help you figure out at run time which division operators were acting on integers and which ones were acting on the other types of values for division. I think those flags are still present in 2.6 because the default situation has never changed until 3.0.
I think it was a -Q flag. You could turn on the new integer division, basically giving it the same semantics as Python 3.0 has, with a command line flag, which for most people would probably not be a good idea, but for people who are in an educational environment where they're just firing up the Python interpreter to do some simple exploration of programming, that's probably the right option. That's why that option existed at all. Then there was another option -- another variant of that option really -- that just warns you about which line of code had an integer division that was not expressed using the double slash operator.
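For reference, this is how the two division operators behave under Python 3 semantics. It is a small sketch; the variable names are illustrative only. In Python 2.2 and later, the same behavior for the single slash can be requested per module with `from __future__ import division`.

```python
# Python 3 semantics: "/" is true division, "//" is floor division.
true_quotient = 7 / 2     # 3.5, even though both operands are integers
floor_quotient = 7 // 2   # 3
negative_floor = -7 // 2  # -4, because floor division rounds toward negative infinity

print(true_quotient, floor_quotient, negative_floor)
```

Code that relied on the old integer behavior of the single slash has to be rewritten with the double slash, which is why a warning that points at each affected line is useful.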
I wrote a separate tool at the time that took all of those warnings and took your source code and actually, it wouldn't modify your source code in place, but it would actually spit out a unified diff-style file that you could use to patch your source code.
I had never thought about that before, but that's a brilliant idea.
Guido van Rossum: Actually, the 2to3 tool has a mode like that too where instead of modifying the code in place, which is what it normally does, you can also ask it to just print out unified diffs of what's going to change. If you want to, you can just review it on a file-by-file basis and decide, "Oh, yeah. It's doing the right thing." Or if you review it carefully, you might find that there's actually a case where the tool is not doing the right thing.
You can then take that patch file and manually redo the patch that you think should be applied. That way the tool can still help you with the bulk of the work. That was a precedent for what's happening in the Python 2.6 to 3.0 world.
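To show what a reviewable unified diff looks like, here is a sketch that uses the standard `difflib` module. It does not run 2to3 and is not how 2to3 produces its output; the "before" and "after" snippets and the file labels are invented for illustration.

```python
import difflib

# Invented example: a Python 2 print statement and its Python 3 equivalent.
before = ['name = "world"\n', 'print "hello", name\n']
after = ['name = "world"\n', 'print("hello", name)\n']

diff = list(
    difflib.unified_diff(
        before,
        after,
        fromfile="example.py (original)",
        tofile="example.py (converted)",
    )
)
print("".join(diff), end="")
```

Lines starting with "-" are removed and lines starting with "+" are added, so each proposed change can be accepted, rejected, or redone by hand before anything is written to disk.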
Do I understand correctly that you think there will be a Python 3.1 release in the next year?
Guido van Rossum: Probably. People are eager to add new features that didn't make it on time for 3.0. However, I've heard some people who said, "Well, 3.0 is incompatible so let's make 3.1 incompatible as well."
No way. No way. Going forward, 3.1 is going to be just as backwards compatible with 3.0 as 2.6 is with 2.5 or 2.5 with 2.4.
It sounds like then your policy is that within the major release family you maintain compatibility.
Guido van Rossum: Exactly.
I know you don't have a crystal ball, but at some point in the future might there be a Python 4? Or do you think 3 is your vision of what Python should be for the next foreseeable future?
Guido van Rossum: That depends on how far you take foreseeable future.
Let's say 10 years.
Guido van Rossum: It might well stay at 3 for 10 years. I expect it would stay at 3 at least for five years. That's the horizon where I have some confidence that I can predict what might happen or might have some control over it still.
Five years is a good long period of time though.
Guido van Rossum: Python has exhibited a significant but very gradual growth. Actually this or next week would technically be its 19th birthday. It's going on 20. I think that five years is not an unreasonable period of time to look ahead.
It seems like in the past year or two, there has been a real uptake of Python. I notice it a lot in the web programming space: Google's App Engine for example, but also projects such as Django. Do you credit those with some of the growth of Python, or is that part of a whole?
Guido van Rossum: I think that dynamic languages have finally gained some respect. It's something that has been going on for a long time. I remember 15 years ago being approached by people who said, "I want to introduce Python, but my boss won't let me use an unknown language." Five years ago, I would still get the same type of mail.
We don't see that so much anymore. I would actually credit all of the above, but I would also credit other dynamic languages, in particular, Ruby, possibly even efforts in the Java world like Groovy. Dynamic languages, people understand better what they are. There aren't so many people who use the wrong terminology like what was it -- for a while people would use terms like weak typing vs. strong typing. It gave dynamic languages a bad name somehow and Ruby, even though it's still the new kid on the block, by creating a lot of PR, caused a lot of people to think about the idea of dynamic languages and realized that there were others around, that Ruby wasn't the only dynamic language, that there was Perl, that there was Python. Both of which influenced Ruby. Both of which still are around.
The whole idea of dynamic languages is a viable tool for many application areas. Perhaps also combined with the ever increasing popularity of web applications where these languages are popular for -- I'm not entirely sure why dynamic languages are so popular for web development, but it probably has to do with being able to very rapidly iterate on your application.
One of the ideas I've heard people come up with was that in '93, '94 with the CGI explosion, Perl was just so great at string handling it didn't make sense to use anything else. I mean certainly not C. And certainly if you wanted to use the Bash shell, Korn shell, a C shell, you just had so much more work to do.
Guido van Rossum: The interesting thing is that for a long time CGI was the thing when doing dynamic websites. There was a time that dynamic parts of a website were special. I remember one of my failures at looking ahead was that when CGI first came around, I didn't think it was particularly interesting or important. When I first saw HTML and HTTP, I immediately started writing web clients and I wrote a complete web browser or maybe even two in Python. I wrote a lot of client libraries for dealing with the web. I also wrote web server code, but I didn't really embrace the idea of dynamic websites as they currently exist as all that exciting or different.
The CGI module has for a very long time been this strange orphan in the Python standard library that nobody touched. Now, of course, CGI is over. Everybody uses different approaches. I'm really glad, for example, that the WSGI standard came up in the Python world and that everyone who was doing anything with dynamic websites in Python is using that to couple the serving infrastructure with application infrastructure.
It's one of those features in Python I wonder why other languages haven't adopted. It makes so much sense.
Guido van Rossum: I think the Java world has a pretty standard approach as well.
For languages that don't have a default approach, why not just adopt WSGI wholeheartedly?
Guido van Rossum: I'd be all for that. There are probably many details in WSGI that are Python specific, but I imagine that the idea at large can easily be duplicated in Perl and Ruby. Ruby probably didn't feel the need for something like WSGI so much because there's essentially one Ruby-based web framework.
Well, Python, being a much older language, has a number of longstanding web frameworks with different properties. I think WSGI was created out of a desire specifically to decouple the choice of application framework from the choice of a web serving framework. For example, you had Zope in the Python world for I think at least ten years. Other things were coming up, TurboGears, Pylons, Django.... All of those things, when they first were conceived and created by their creators didn't have a WSGI approach and they all had different ad hoc solutions. They usually had two things: They had a development server which was just written in pure Python, often single threaded and really meant for debugging applications and nothing else. Then there were various rather painful ways of hooking it up to, for example, Apache. Everybody had to invent the details of how to do that themselves.
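The decoupling works because a WSGI application is just a callable with an agreed signature, so any compliant server can host it. Below is a minimal sketch for Python 3; `application` and `call_app` are illustrative names, and `call_app` merely stands in for the server side using the standard `wsgiref` test helpers.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI application: one callable, no framework required."""
    body = b"Hello from a WSGI application\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app):
    """Play the role of the server: build an environ, call the app, collect output."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a plausible request environment
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application)
print(status)
print(body.decode("utf-8"), end="")
```

The same `application` object could be handed unchanged to a real server, for example `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` for local debugging, or to a production server that speaks WSGI.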
Is there something special in the Python world that leads to a desire for unification? That's the sort of thing other languages and projects possibly like to reproduce if possible. If it's something special to Python, good for Python.
Guido van Rossum: I think that the occasional unification effort in the Python world actually comes from the fact that often at least the users have the perception that they don't get enough guidance on the choice of components or frameworks. In certain areas it's very easy to come up with your own framework, and frameworks often are opinionated so that different frameworks compete. The people who are wedded to a particular framework or who are part of the core developer and user groups for a framework love their own framework. Someone coming from the outside who says, "I want to use Python and I want to do web development," now has a problem because depending on who their friends are, they'll get very different recommendations for what framework to use.
If they don't have anyone who is particularly opinionated or influential, they try to do their own research. You Google for "Python web framework" and realize that there are 50 different choices. I forget when it was, but it was probably three or four years ago someone spent a lot of time in trying to research the different Python web frameworks and look at them in more detail to see how similar they were and if there was a clear winner: the PyWebOff. I don't know if the PyWebOff had a direct effect, but it seems to me that that was around the time that Django and TurboGears started to emerge.
A lot of other offerings that were very popular at that time or that were at least in the running at the time haven't been heard of much. Some offerings probably merged into TurboGears, which is a project that tries to find good components and then incorporate them rather than create its own set of components from scratch. Django, on the other hand, creates all of its own components just like Zope used to do.
Zope has evolved into a content management system which is a bigger beast than what the typical web 2.0 developer needs. In the last year, we've seen another one so this competition still is not over. But I think it's good that there's been a bit of a shakeout. If people want to do web development with Python nowadays, they don't complain that there are too many options and they don't know which one to choose because there is only a small number of options that sort of are on everybody's radar.
There's not an embarrassment of riches; there's a sufficiency of wealth.
Guido van Rossum: Exactly. Yeah. There are maybe two or three big options and then there are a couple of more specialized choices. We've gone through a thing like that before. In the late 90s we had a similar issue with GUI frameworks. In the end, I don't think there is any GUI toolkit that really has taken over at the expense of all of the others in the Python world which possibly just is because people are less interested in GUI development and more interested in web development nowadays.
The last serious business project I worked on in Python was about 2002. We chose wxPython for that.
Guido van Rossum: Which is still around and still has a very decent reputation.
We were very pleased. With that and py2exe, we were able to create a nice client side GUI application. I was very pleased with that. You just said you think there's more work going on in the web than in GUI or client side programming with Python. Is that inevitable?
Guido van Rossum: My own practical experience is that in recent times, I have not had either the need or the desire to do any GUI development at all. Probably not at all in the last three years. I've done plenty of web development. In general there is a lot of focus change where people don't develop as many desktop apps anymore.
On the desktop you have browsers. You have a couple of other specialized apps like on the Mac there's iTunes and iPhoto and I suppose Windows has similar things. Maybe there's even something for Linux. But there's not a whole lot. If you have a small project and you want some kind of user interface, it's much more likely that your project is already equipped with a web server or is distributed in a way that the only thing that makes sense is actually a web server that can act as a status dashboard or a data entry tool or whatever. There is less and less need for GUI apps. App Engine, of course, is totally a web project. Maybe that's not fair to look at it. Google is mostly a web company, but the only GUI work done in the context of the App Engine is actually the launcher. We have one guy basically who has created that launcher, and it's a fairly simple user interface. That is one of the things that still makes sense to have as a desktop app because the whole purpose is to be developing in an environment where you don't necessarily have web access and where your application isn't deployed to your web server yet.
Guido van Rossum: The implied answer is that it's not really on my personal radar. There are probably people in other parts of the Python community who are actively thinking about that or working on it. I think that Python is a general enough language that it doesn't have to come from the Python community. A different community that is interested in a certain type of apps may just decide to adopt Python, just like they may decide to adopt some other language -- just like C was perhaps originally designed for a specific kind of application in a particular environment, but it grew way beyond that.
That is very much the case with Python.
That's a sign of a successful language.
Guido van Rossum: That makes it a general purpose programming language.
When you start surprising its creator and original developers by what you're doing with it, that's a good sign.
Guido van Rossum: Definitely.
I've seen some discussion on the Python 3 lists about Unicode handling and file systems. Is that something that needs a fix soonish or is that just people's expectations need some tweaking?
Guido van Rossum: That's hard to say. It is an issue at the moment because not everybody in the world has totally adopted the Unicode religion yet. For once, I hope that that particular religion will be adopted widely. The interesting thing is that if you look at the major operating systems, both Windows and Mac OS X, they actually have committed to Unicode in different ways. In the Windows file system everything is UTF-16. On the Mac file system, everything is UTF-8. There are questions about normalization because Unicode is ambiguous on the different ways of writing characters with accents in certain languages, but by and large, it's all Unicode. If users type file names that happen to contain characters in their national language, which they're likely to do if they're just naming their photos or their text documents or whatever, those things will automatically be encoded in the system-wide standard file system encoding. Python will not have any problem reading those things.
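The normalization question mentioned here can be seen directly in Python 3 with the standard `unicodedata` module. This is a small sketch with illustrative variable names: the same accented letter can be written as one code point or as a base letter plus a combining accent, and the two spellings compare unequal until they are normalized.

```python
import unicodedata

composed = "\u00e9"     # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent

same_before = composed == decomposed
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_after_nfd = unicodedata.normalize("NFD", composed) == decomposed

print(same_before, same_after_nfc, same_after_nfd)
```

Two file names that look identical on screen can therefore differ as strings, so a program that compares names coming from different sources may want to normalize both sides first.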
The last bastion of resistance to that way of thinking is actually the rest of the Unix world, where traditionally the file system does not think about encodings; it's just bytes. People in Russia use an eight-bit encoding that includes Cyrillic as well as the Latin alphabet. I imagine people in Turkey do the same thing, but it contains the Turkish alphabet which only has two or three special characters.
They have funny capitalization rules.
Guido van Rossum: Yeah. People in Japan and in China can choose to use Unicode, but they also can choose to use other older and still widely used encodings for a file name. Because Linux has a lot of different vendors, it seems that there's no one there who really sort of can make a decision and put their foot down and draw a line in the sand and say, "We're going to use UTF-8 for the file system encoding."
I expect that maybe five years from now that will have happened somehow, either by one forward-thinking vendor making that choice or just setting the system defaults differently. It could be Ubuntu. They have a lot of international users and developers.
I expect that in five or ten years, it will be no problem even on Linux systems. In the meantime, there are different philosophies. I'm not sure if the minority is just very loud or if the minority is actually not so small. There are definitely people who say, "I want to be able to use my non-UTF-8 encoding on my file system and I want to be able to sort of cooperate with other users on the same computer who use different encoding for their file names." Well, yes, it's going to be difficult for a Python program to deal with that. On the one hand, it is possible to write a Python 3.0 program that does deal with it. All the file system APIs have a version that takes bytes and returns bytes.
Exactly what that means on Windows is unclear because UTF-16 as bytes is a really unpleasant, uncomfortable encoding. On the other hand, UTF-8 is completely not what the system uses natively. So on Windows, I would recommend never to use the bytes API. On the other hand, on the Mac, the bytes API is fine because you always get UTF-8, except of course in rare cases where you've mounted an external file system that was created on a different system.
My expectation is that most users will write code that either only runs in situations where UTF-8 is the standard or at least in situations where the encoding is picked once and everybody sticks to that same encoding.
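The "takes bytes and returns bytes" behavior can be sketched with `os.listdir`, which returns text names for a text path and bytes names for a bytes path. The file name below is invented for the example, and `os.fsencode` is used only as a convenient way to obtain the bytes form of the directory path; it was added to the standard library after 3.0.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file with an invented name inside a scratch directory.
    with open(os.path.join(tmp, "notes.txt"), "w") as handle:
        handle.write("example\n")

    names_as_text = os.listdir(tmp)                # str in, list of str out
    names_as_bytes = os.listdir(os.fsencode(tmp))  # bytes in, list of bytes out

print(names_as_text)
print(names_as_bytes)
```

A program that has to cope with names in an unknown or mixed encoding can stay on the bytes side and never decode, at the cost of not being able to treat the names as text.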
Do you see the fix as more on the user side than Python side?
Guido van Rossum: I expect, and I'm clearly also hoping, but I'm also really expecting that users will choose wisely and somehow their systems will hopefully be configured so that every user uses the same encoding. Users shouldn't have to understand encodings and they shouldn't have to deal with encodings. It's much better that the system has a good approach to dealing with that like Windows and Mac OS X do.
I think the users who want total control and actually live in an environment where not all file names are encoded using the same encoding are a very small minority. It's likely that when they download a Python application that does extensive file system inspection or manipulation written by someone in a more Unicode friendly environment that the application will occasionally not quite work right. Then it's a quality of the implementation issue. They can file a bug with the author of the code and maybe there's a way to fix the program so that it can deal with these different things. Or maybe the users will choose to start using the same encoding for all of their file names.
I expect that there's a small category of applications where it's at least for the time being important to deal with file names in different encodings. If you're writing an application whose primary purpose is file system inspection, if you're like writing a Python version of Midnight Commander or something like that, you're going to have to deal with it somehow.
It's difficult to sniff a file system to find out what kind of encoding it has.
Guido van Rossum: The premise here is that it's not the same everywhere on the file system. Of course, it means that if your friend creates files in a different encoding and you don't know what that encoding is, you're going to have a hard time actually reading their file names or typing them. If you're lucky, you have a tool that displays them and maybe there are some recognizable characters and there will be some squares or question marks representing unknown stuff, and you'll be able to pick one of those from a list without actually knowing what the text is that was in the file name, and you'll be able to play the music in that file or open it as a Word document or whatever.
Hopefully, the text in the document is encoded in a way that you can at least tell what the encoding is, but it's still going to be inconvenient. If someone creates a file with Cyrillic characters in it and they ask me to go to their directory and see that file, I'll see the Cyrillic characters, but I still don't know what it means. For me, it's not much of an advantage to whether I see the Cyrillic characters or whether I just see question marks. It actually depends on which tool I'm using. If I view that file system in Emacs, it'll look one way. If I view it in a shell, it'll look a different way. If I view it in some other GUI tool, it'll look a different way again.
I hope that eventually more people will actually understand Unicode. There are a lot of people who just don't understand the difference between characters and encoded bytes. I expect that one way or another that people will become a bit more familiar with that or the details will no longer matter because the software will do the right thing.
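The difference between characters and encoded bytes is easy to demonstrate in Python 3. This is a sketch with an invented Cyrillic word: one piece of text produces different byte sequences under UTF-8 and under the eight-bit KOI8-R encoding, and bytes decoded with the wrong assumption do not give the text back.

```python
text = "файл"  # four Cyrillic characters (the Russian word for "file")

utf8_bytes = text.encode("utf-8")   # two bytes per character here
koi8_bytes = text.encode("koi8_r")  # one byte per character

print(len(text), len(utf8_bytes), len(koi8_bytes))
print(utf8_bytes == koi8_bytes)

# Decoding with the encoding that was actually used recovers the text.
round_trip = koi8_bytes.decode("koi8_r")

# Guessing UTF-8 for KOI8-R bytes does not; invalid sequences are
# replaced with U+FFFD, the replacement character.
wrong_guess = koi8_bytes.decode("utf-8", errors="replace")
print(round_trip == text, wrong_guess == text)
```

The replacement characters are the programmatic counterpart of the squares and question marks a tool shows for a file name whose encoding it guessed wrong.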
Giving them better tools helps a lot.
Guido van Rossum: That was the basic idea behind changing the way Python deals with Unicode and text in general.
Looking at the release plan, the release structure of Python 3, I was impressed at how well that schedule worked out. How did that go from a project management perspective? Did you have much oversight of that?
Guido van Rossum: I tried to stay on top of the schedule, yeah. I think the first time I was thinking about a release date at all for 3.0 was early 2008. I was giving a talk in Beijing. There I said, "I expect that next year at the Olympics, so that would be August 2008, we'll have 3.0 released." For a long time August was actually the planned release date both for 2.6 and 3.0. As we got closer, we realized that time-based releases are great, but you still have to have some level of feature completeness. We realized that we weren't quite ready. While on the one hand, we didn't want to postpone it forever, we also didn't want to rush it before it was ready.
We started doing triage of which planned features that still hadn't been implemented could actually be implemented in the time remaining and which features were better to just give up on for now and say we'll have another chance at 3.1 or 2.7. Then as we got closer, 2.6 actually turned out to be more mature than 3.0, even though the 3.0 development lead time had been longer, which makes sense because 3.0 is just a bigger step forward.
In the end, we realized that there was no point in holding up 2.6 any longer, but it was also unwise to release 3.0. Fairly shortly before 2.6 was finally released, we let go of the idea of releasing them completely in sync as we had been doing during the beta. When 2.6 final came out, on the same day or in the same week, we did a 3.0 release candidate. Then there were some discussions. Some people didn't think that the release candidate really deserved that name and wanted to go back to beta. We decided to just keep calling them release candidates, but still accept somewhat larger changes in the successive release candidates than we would have been comfortable with for other releases. At that point, we had picked early December as the final release date for 3.0. People worked hard and did a good job of staying focused on stability and not adding more features at the last moment, so we actually made that schedule.
Are you thinking about sticking to time-based releases? One new major release a year?
Guido van Rossum: I don't want to force it completely. I like to have fairly specific dates that everyone in the developer community is aware of so people can start sort of thinking about, "Well, if I want my feature in that next release, I have to have it in some decent shape and actually committed so many months before that."
At the same time, I don't want to just cut the release on a random date. That's very much the philosophy of all of the release managers we've had throughout the years. A release has got to be a local high point of stability and robustness, completeness.
Occasionally, we'll decide to do a feature freeze and hold on to the feature freeze until we can get the release candidates in shape. So far with the group of people we have, that's always worked well. I think a lot of people who are in the periphery of the core development group are learning from this. We see people not pay attention and then right before the release candidate insist that a very thorny bug be addressed before a release is made. If the problem is bad enough -- if the problem is actually a real problem -- we're sort of forced to actually address the issue. In many cases, things linger in the bug tracker usually because they're really hard to reproduce or they only affect very odd combinations of application features or API usage or environmental factors.
We often sort of use the argument, "Well, it's been broken in three releases in a row; there's no need to hold up the next release just so that we can fix it." Often these are the things where the fix is not "change one line of code"; the fix is "refactor an API," and sometimes you run into situations where the only way to fix something properly is actually to change an API in an incompatible way so you break all existing code that's using that API, which is a typical kind of thing we might do in 3.0 but not normally.
That's not something you want to do a week before a planned release anyway.
Guido van Rossum: It's certainly not something you want to rush.
I'm a big fan of time-based releases. Once you've done them for a few times people get used to the schedule and realize that they've just had a release right now, it's open season for the next two weeks. Then "Oh, it's two weeks or a month before the release; they're not going to accept anything other than a documentation patch or this list of bugs to fix right now."
Guido van Rossum: I think it's been educational for those people who weren't already on board with that religion.
Is there anything you'd like to tell our readers that we haven't covered?
Guido van Rossum: I'd like to reiterate that at this point, it's a very personal choice to decide whether to use 3.0 or 2.6. You don't run the risk of being left behind by taking a conservative stance at this point. 2.6 will be just as well supported by the same group of core Python developers as 3.0. At the same time, we're also not sort of deemphasizing the importance and quality of 3.0. So if you are not held back by external requirements like dependencies on packages or third party software that hasn't been ported to 3.0 yet or working in an environment where everyone else is using another version. If you're learning Python for the first time, 3.0 is a great way to learn the language. A couple of things that trip up beginners have been removed.
It's easier to learn the differences between 2.6 and 3.0 after you've learned 3.0 than to go the other way. If you learned Python 2.6, you'd probably use a book that had 2.5 on the cover and it was written for 2.2 or 2.3 and sort of somewhat updated by the author. A lot of those textbooks actually still use idioms that already were deprecated in the 2.3 or 2.4 timeframe. It's quite possible that if you're using 2.6 that you're actually writing a dialect of the language that is mostly compatible with 2.3 or something that old which is I think about five years old by now.
On the other hand, if you learn 3.0, in order to be able to work with 2.6,
you only have to unlearn a few things because many 3.0 features have actually
been backported to 2.6, or were already available. There's a handful of things
that are essentially different like
Even that you can actually import the
future in 2.6. Really people should look at their needs and
decide whether to use 2.6 or 3.0 and not worry either about 3.0 being
unsupported or 2.6 being unsupported because they'll both get the support you
need. I suspect that in five years 2.6 or that the 2.X line will be much less
important. But for the next three or four years, it's really slowly migrating
from 2.X to 3.0.
Slowly migrating to the future.
Guido van Rossum: Yeah.