Monday, November 28, 2011

Web2py Caching and Google App Engine

With the recent billing changes to Google App Engine, I was presented with a practical and monetary incentive to optimize my web2py app running on GAE.

At the suggestion of Jason Brower, I decided to look into implementing caching. As a result, I found that it is an amazingly simple way of reducing costs and improving the performance of my web2py app.

While there are more involved ways of going about this, implementing caching can be as simple as adding the following decorator directly above a function within your controller:

@cache(request.env.path_info, time_expire=3, cache_model=cache.ram)

You can read more about putting this into practice in Chapter 4 of the web2py book.
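
To make the idea concrete, here is a minimal standalone sketch of what a time-based cache decorator does. It is not web2py's implementation: timed_cache, the in-memory dict used as the store, and the index function are all made up for illustration, and the only assumption is that a result is reused until time_expire seconds have passed.

```python
import time

def timed_cache(key, time_expire=3, store=None):
    """Hypothetical stand-in for a cache decorator:
    reuse a stored result for up to time_expire seconds."""
    store = {} if store is None else store

    def decorator(func):
        def wrapper():
            now = time.time()
            hit = store.get(key)
            if hit is not None and now - hit[0] < time_expire:
                return hit[1]            # still fresh: skip the function body
            value = func()               # missing or expired: do the real work
            store[key] = (now, value)
            return value
        return wrapper
    return decorator

calls = []  # records how often the function body actually runs

@timed_cache("/app/default/index", time_expire=3)
def index():
    calls.append(1)                      # stands in for a costly database query
    return dict(message="hello")

first = index()
second = index()                         # within 3 seconds, so served from the store
```

The second call never reaches the function body, which is where the savings come from when the body contains expensive queries.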

Here is a visual representation of the results:

Here, the blue line represents dynamic pages being served and the orange line represents pages served out of cache.

Enabling caching resulted in faster page loads for users, reduced costs, less stress on Google App Engine and a dramatic drop in error messages (bringing them to effectively zero over a 12-hour period).

The biggest performance gains come from functions that contain costly database queries. That said, there are also some issues that you will need to look out for.

After enabling the caching decorator, you will want to test all your views using GAE's dev_appserver. To catch errors and obtain a traceback, I found it useful to start GAE's dev_appserver with the -d command line argument to enable debugging.

While testing the app within web2py's built-in web server produced no errors, testing it with GAE's dev_appserver turned up a few views that returned an error. In particular, I found myself occasionally dealing with a PicklingError. Fortunately, the great folks in the web2py group (anthony, howesc, massimo and Bruno Rocha) really came through, as they often do, with some great advice on resolving this issue.

In particular, rather than returning the dictionary (using return dict() ) like we usually do, using response.render(dict()) helped in many instances:

return response.render(dict(yourobjectname=yourdictionary)) 
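
One plausible reading of why this helps (an assumption, not something confirmed by the group): if the cache has to pickle whatever the function returns, a rendered page is plain text and always pickles, whereas a dictionary may hold objects that do not. The standalone sketch below illustrates that difference with the standard library only; can_pickle and the sample values are hypothetical, and a lambda stands in for whatever unpicklable object a real dictionary might contain.

```python
import pickle

def can_pickle(value):
    """Return True if value survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(value)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

# A dict holding something unpicklable (the lambda is only a stand-in)
page_vars = dict(pledges=[1, 2, 3], formatter=lambda x: x)

# A rendered page is just text (this string stands in for the view output)
rendered = "<ul><li>1</li><li>2</li><li>3</li></ul>"

dict_ok = can_pickle(page_vars)
text_ok = can_pickle(rendered)
```

Here dict_ok comes out False and text_ok comes out True, which matches the behaviour I saw: caching the rendered output sidesteps the pickling step that the raw dictionary trips over.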

In two other instances, I found I had to use Python to get around an orderby within a select:

pledges=db((db.pledge.segment==segment_id) & ( != True)).select(orderby=~db.pledge.created_on)

When I removed orderby=~db.pledge.created_on from the query, the error went away. However, we really don't want to lose the ability to order our query and serve the most recent results first. So, to achieve the same results, we need to enlist Python:

    # Original query:
    # pledges=db((db.pledge.segment==segment_id) & ( != True)).select(orderby=~db.pledge.created_on)
    # orderby=~db.pledge.created_on seems to cause an error in GAE when cache is turned on, therefore
    # select without orderby and sort newest first in Python instead:
    pledges=db((db.pledge.segment==segment_id) & ( != True)).select()
    sorted_pledges = []
    for row in pledges.sort(lambda row: row.created_on,reverse=True):
        sorted_pledges.append(row)
    return response.render(dict(pledges=sorted_pledges))
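
The same idea can be sketched in plain Python, outside web2py: fetch the records unordered, then sort them newest first in memory. The dictionaries below merely stand in for database rows, and the sample data is made up.

```python
from datetime import datetime

# Unordered "rows", as they might come back from a select without orderby
rows = [
    {"id": 1, "created_on": datetime(2011, 11, 1, 9, 0)},
    {"id": 2, "created_on": datetime(2011, 11, 20, 9, 0)},
    {"id": 3, "created_on": datetime(2011, 11, 10, 9, 0)},
]

# Sort in memory, most recent first (the equivalent of orderby=~created_on)
sorted_rows = sorted(rows, key=lambda row: row["created_on"], reverse=True)
newest_first_ids = [row["id"] for row in sorted_rows]
```

Sorting in memory like this is reasonable when the result set is small; for large result sets the cost of pulling every row just to sort it would be worth measuring first.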

Voilà! We have now arrived where we need to be, plus we have caching.

After testing this in GAE's dev_appserver and checking for any errors (and presumably rebuilding any indexes that need to be rebuilt), you are ready to deploy your web2py app on GAE with caching.


That said, I still have a question about exactly what the difference is between return dict() and return response.render(dict()), but I'm really glad about the results.