This was my second PyCon. Since the last two have been in Chicago and on a weekend, it has been an easy drive for me from Michigan. Last year was my first time at PyCon and I was just finding my way around. This year I was invited to all the "Python educators" parties and spent my time running around promoting my upcoming "Using Google App Engine" book - handing out postcards with discount codes and hanging out at the O'Reilly booth.
I was really excited when Guido posted a BOF about Google App Engine. Even though I wrote an introductory book, there were a few nagging questions that the documentation seemed not to answer.
The BOF was attended by Guido van Rossum and Joe Gregorio of Google as well as some experienced App Engine developers, including Thomas Bohmbach Jr, who was the lead developer for Best Buy's giftag.com and is completely famous because he is featured in a video on the App Engine main site. Thomas is also interested in teaching beginners how to use App Engine - just like me!
So I posed a few questions and got some super answers.
How should parent-child keys be used? How deep can they go? I wondered how you avoid concurrency problems, and whether we needed to make artificially deep trees to create more places to lock. I also wondered whether, in something like a user-facing file system, the connections between folders within folders should be relationships or parent-child connections. The answer to all these questions was an emphatic NO!!! - Don't add parent-child relationships willy-nilly. Effectively you add parent-child relationships as part of your design for concurrency / transactions. You make trees so that the data within a tree is all locked together. If you want truly independent data that will not cause concurrency problems, simply make plenty of root nodes.
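Here is a minimal sketch of that idea in plain Python, not the App Engine API. It assumes keys can be pictured as paths that start at a root, and that everything under one root is locked together. The tuple-based keys, the `entity_group` helper and the folder and account names are all hypothetical, made up for illustration. Treating nested folders as separate roots joined by a reference is my reading of the answer above.

```python
# Keys are modeled as tuples of (kind, name) pairs, root first.
# This is an illustrative stand-in, not how the SDK represents keys.
def entity_group(key):
    """Return the root of a key path; entities sharing a root share a lock."""
    return key[0]

# Folders as independent root entities; nesting is stored as a plain reference.
docs = (("Folder", "docs"),)
photos = (("Folder", "photos"),)
parent_ref = {photos: docs}   # "photos lives inside docs" as a relationship

# Data that must change together in one transaction shares a root instead.
account = (("Account", "alice"),)
ledger_entry = account + (("LedgerEntry", "2009-03-28"),)

print(entity_group(docs) == entity_group(photos))           # False: independent
print(entity_group(account) == entity_group(ledger_entry))  # True: locked together
```

The point of the sketch is that the parent-child link is a choice about what gets locked together, while "folder inside folder" is just data.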
What is the nature of the set of Model B's that you see from an instance of Model A when there is a relationship from Model B to Model A? Is the set maintained as data or is it constructed in a lazy manner? Answer: The set is a convenience - it is not stored - it is constructed as you need it. Looking from an A to its set of related B's may not be super-efficient, but it does not waste space.
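To picture what "constructed as you need it" means, here is a small plain-Python sketch. It is not the App Engine implementation. The `ALL_B` list stands in for the datastore, and the class and attribute names are hypothetical. The only stored link is the reference held by each B. The set seen from A is rebuilt by a query on every access.

```python
# In-memory stand-in for the datastore: every stored B entity.
ALL_B = []

class A:
    def __init__(self, name):
        self.name = name

    @property
    def b_set(self):
        # Nothing is stored on A; the set is rebuilt by a scan on each access.
        return [b for b in ALL_B if b.a is self]

class B:
    def __init__(self, label, a):
        self.label = label
        self.a = a          # the only stored link: the reference from B to A
        ALL_B.append(self)

a = A("parent")
B("first", a)
B("second", a)
print([b.label for b in a.b_set])   # ['first', 'second']
print("b_set" in vars(a))           # False: A holds no list of B's
```

Each access to `b_set` costs a query, which is why walking from an A to its B's is convenient but not free.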
When you have something like a chat message which has a related user, should you express the relationship as a relationship or just de-normalize a bit and make the key a string? Answer: No - if it is a relationship, model it as a relationship. Do not hide that fact from the data models. You might want to be careful how you traverse the relationship for performance reasons - but model it honestly. For example, in the case of the chat messages and related users, the trick is to do one query to get the most recent 20 chat messages (one I/O), then extract the user keys from the 20 chat objects, then do a multi-key GET to pull in the referenced users (one I/O), and then do the little join of (10-20) users and (20) messages in memory for display. This way you are modeling things honestly while making sure that the minimum number of I/Os is done. Thomas B also said that if you lied and modeled relationships as strings, your data import and export and moving data between servers would break horribly, because the strings would not be known as references. If you modeled honestly, import and export would just work. Of course! The lesson: model honestly and code intelligently - sometimes the convenience methods are not the best way to do something.
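The two-I/O trick can be sketched in plain Python. This is only an illustration under stated assumptions. `FakeDatastore`, `recent_messages`, `get_multi` and `render_chat` are hypothetical names, and the in-memory store just counts round trips so that the pattern is visible.

```python
class FakeDatastore:
    """In-memory stand-in that counts round trips; not the App Engine API."""
    def __init__(self, users, messages):
        self.users = users          # {user_key: display name}
        self.messages = messages    # list of (user_key, text), oldest first
        self.round_trips = 0

    def recent_messages(self, limit):
        self.round_trips += 1
        return self.messages[-limit:]

    def get_multi(self, keys):
        self.round_trips += 1
        return {k: self.users[k] for k in keys}

def render_chat(store, limit=20):
    messages = store.recent_messages(limit)              # I/O 1: the messages
    user_keys = {user_key for user_key, _ in messages}   # de-duplicate references
    users = store.get_multi(user_keys)                   # I/O 2: all users at once
    # The "little join" happens in memory, with no further I/O.
    return ["%s: %s" % (users[k], text) for k, text in messages]

store = FakeDatastore(
    users={"u1": "Ann", "u2": "Bob"},
    messages=[("u1", "hi"), ("u2", "hello"), ("u1", "bye")],
)
lines = render_chat(store)
print(lines)               # ['Ann: hi', 'Bob: hello', 'Ann: bye']
print(store.round_trips)   # 2
```

Following the reference on each message one at a time would cost one extra round trip per message. With the batched fetch the count stays at two however many messages are shown. In real App Engine code, I believe passing a list of keys to `db.get()` plays the role of `get_multi` here.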
What is the life-cycle of my Python process in App Engine? If I create a global variable and add stuff to it in Handler calls, will it stay around? Is there any reason to worry about concurrency? I had heard this referred to as "incoherent caching". The answer was to think of things as a process model. If you have no processes running and a request comes in, you get started in a process. Later requests may come to the same process and so may reuse global data set by earlier requests to the same process. If load goes up, App Engine may make more processes - these get their own global data. Requests are passed out to processes. There is no issue about concurrency within a process - this is more a Python thing - the processes are single threaded, so you can safely mess with globally scoped lists or dictionaries during a request to your heart's content. If your load goes down, your processes start going away, and when load comes back up you are started again with fresh global memory. For me this is perfect - I can do slightly costly things like scanning a folder for "plug-ins" and loading those once per process start and keep a list in global memory. If my usage is low, there is a small overhead on each request, and if usage is high, I get all the benefits of loading only once for many requests. Very very nice.
All in all it was a great meeting - and I cannot wait for Google I/O 2009 (http://code.google.com/events/io/) where there will be thousands of App Engine geeks to talk with. See you there. The book will be out by then - I hope to be carrying copies of the book around in my backpack! Perhaps there will be space to talk about teaching App Engine to beginning students at Google I/O. Maybe an impromptu BOF in the hallway of the Moscone Center.