mod_wsgi on os x

sunday, march 1st, 2009 5:20am

[This installation was so easy it may not seem worth the notes and output below. But over the years I've found it surprisingly useful to be able to refer back to notes such as this, and helpful to read others' detailed installation reports.]



Goal & motivation

  • I'll be developing more and more web-apps that call services, and unless I call them all via ajax -- which might actually be a good idea -- I'll bump into the development server's single-threaded limitation. And even if I do call the services via ajax I'll likely run into problems if I'm calling more than one service simultaneously -- although the ability of an ajax call to fail gracefully via a timeout and try its resource again would be a cool thing to learn how to implement.

  • Thinking about this off and on for the last few months, I recently came across a posting by the god of apache-python integration, Graham Dumpleton, about using mod_wsgi for development. This, on top of a slow solaris installation using mod_python (the slowness probably has more to do with the box than with mod_python), gave me the impetus to delve into this.

  • Finally, gsf's encouragement to check out mod_wsgi at code4lib 2009 made me bite the bullet.

Sources of info

Starting setup

  • python version 2.5.1, via:

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13) 
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    
  • apache version 2.2.9, via phpinfo():

    apache2handler
    
    Apache Version  Apache/2.2.9 (Unix) mod_ssl/2.2.9 OpenSSL/0.9.7l DAV/2 PHP/5.2.6
    Apache API Version  20051115
    Server Administrator  you@example.com
    Hostname:Port ::1:0
    User/Group  www(70)/70
    Max Requests  Per Child: 0 - Keep Alive: on - Max Per Connection: 100
    Timeouts  Connection: 300 - Keep-Alive: 5
    Virtual Server  No
    Server Root /usr
    Loaded Modules  core prefork http_core mod_so mod_authn_file mod_authn_dbm mod_authn_anon mod_authn_dbd mod_authn_default mod_authz_host mod_authz_groupfile mod_authz_user mod_authz_dbm mod_authz_owner mod_authz_default mod_auth_basic mod_auth_digest mod_cache mod_disk_cache mod_mem_cache mod_dbd mod_dumpio mod_ext_filter mod_include mod_filter mod_deflate mod_log_config mod_log_forensic mod_logio mod_env mod_mime_magic mod_cern_meta mod_expires mod_headers mod_ident mod_usertrack mod_setenvif mod_version mod_proxy mod_proxy_connect mod_proxy_ftp mod_proxy_http mod_proxy_ajp mod_proxy_balancer mod_ssl mod_mime mod_dav mod_status mod_autoindex mod_asis mod_info mod_cgi mod_dav_fs mod_vhost_alias mod_negotiation mod_dir mod_imagemap mod_actions mod_speling mod_userdir mod_alias mod_rewrite mod_bonjour2 mod_php5
    
  • Requirement: Docs say "The GNU C compiler from the MacOS X Developer Toolkit bundle is required." -- I should be fine; developer stuff is installed.

Plan

Install

  • Got source code from the specified download page.

  • Selected single version listed (2.3)

  • Unstuff

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ mv /Users/birkin/Downloads/mod_wsgi-2.3.tar.gz /Developer_3rd/mod_wsgi-2.3.tar.gz
    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ cd /Developer_3rd/
    birkinbox:Developer_3rd birkin$ 
    birkinbox:Developer_3rd birkin$ /usr/bin/tar xvfz ./mod_wsgi-2.3.tar.gz
    mod_wsgi-2.3/
    mod_wsgi-2.3/configure
    mod_wsgi-2.3/configure.ac
    mod_wsgi-2.3/LICENCE
    mod_wsgi-2.3/Makefile-1.X.in
    mod_wsgi-2.3/Makefile-2.X.in
    mod_wsgi-2.3/mod_wsgi.c
    mod_wsgi-2.3/README
    birkinbox:Developer_3rd birkin$
    
  • Configure

    birkinbox:Developer_3rd birkin$ 
    birkinbox:Developer_3rd birkin$ cd ./mod_wsgi-2.3/
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ ls -alF
    total 952
    drwxr-xr-x@  9 birkin  admin     306 Aug 23  2008 ./
    drwxr-xr-x  20 birkin  admin     680 Mar  2 08:16 ../
    -rw-r--r--@  1 birkin  admin   11358 Jun 23  2007 LICENCE
    -rw-r--r--@  1 birkin  admin    1195 Dec 13  2007 Makefile-1.X.in
    -rw-r--r--@  1 birkin  admin    1247 Dec 13  2007 Makefile-2.X.in
    -rw-r--r--@  1 birkin  admin   16440 Mar 13  2008 README
    -rwxr-xr-x@  1 birkin  admin   78314 Dec 21  2007 configure*
    -rw-r--r--@  1 birkin  admin    4151 Jan 24  2008 configure.ac
    -rw-r--r--@  1 birkin  admin  352904 Aug 23  2008 mod_wsgi.c
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ ./configure 
    checking for apxs2... no
    checking for apxs... /usr/sbin/apxs
    checking Apache version... 2.2.9
    checking for python... /usr/bin/python
    configure: creating ./config.status
    config.status: creating Makefile
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ ls -alF
    total 1032
    drwxr-xr-x@ 13 birkin  admin     442 Mar  2 10:38 ./
    drwxr-xr-x  20 birkin  admin     680 Mar  2 08:16 ../
    -rw-r--r--@  1 birkin  admin   11358 Jun 23  2007 LICENCE
    -rw-r--r--   1 birkin  admin    1559 Mar  2 10:38 Makefile
    -rw-r--r--@  1 birkin  admin    1195 Dec 13  2007 Makefile-1.X.in
    -rw-r--r--@  1 birkin  admin    1247 Dec 13  2007 Makefile-2.X.in
    lrwxr-xr-x   1 birkin  admin      15 Mar  2 10:38 Makefile.in@ -> Makefile-2.X.in
    -rw-r--r--@  1 birkin  admin   16440 Mar 13  2008 README
    -rw-r--r--   1 birkin  admin    4474 Mar  2 10:38 config.log
    -rwxr-xr-x   1 birkin  admin   20621 Mar  2 10:38 config.status*
    -rwxr-xr-x@  1 birkin  admin   78314 Dec 21  2007 configure*
    -rw-r--r--@  1 birkin  admin    4151 Jan 24  2008 configure.ac
    -rw-r--r--@  1 birkin  admin  352904 Aug 23  2008 mod_wsgi.c
    birkinbox:mod_wsgi-2.3 birkin$
    

    So far so good.

  • Build

    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ make
    /usr/sbin/apxs -c -I/System/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -DNDEBUG -DMACOSX -DENABLE_DTRACE  -Wc,'-arch ppc7400' -Wc,'-arch ppc64' -Wc,'-arch i386' -Wc,'-arch x86_64' mod_wsgi.c -arch ppc7400 -arch ppc64 -arch i386 -arch x86_64 -Wl,-F/System/Library/Frameworks -framework Python -u _PyMac_Error -framework Python -ldl
    /usr/share/apr-1/build-1/libtool --silent --mode=compile gcc    -DDARWIN -DSIGPROCMASK_SETS_THREAD_MASK -no-cpp-precomp  -I/usr/include/apache2  -I/usr/include/apr-1   -I/usr/include/apr-1  -arch ppc7400 -arch ppc64 -arch i386 -arch x86_64 -I/System/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -DNDEBUG -DMACOSX -DENABLE_DTRACE  -c -o mod_wsgi.lo mod_wsgi.c && touch mod_wsgi.slo
    /usr/share/apr-1/build-1/libtool --silent --mode=link gcc -o mod_wsgi.la  -rpath /usr/libexec/apache2 -module -avoid-version    mod_wsgi.lo -arch ppc7400 -arch ppc64 -arch i386 -arch x86_64 -Wl,-F/System/Library/Frameworks -framework Python -u _PyMac_Error -framework Python -ldl
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ ls -alF
    total 2648
    drwxr-xr-x@ 18 birkin  admin     612 Mar  2 11:26 ./
    drwxr-xr-x  20 birkin  admin     680 Mar  2 08:16 ../
    drwxr-xr-x   7 birkin  admin     238 Mar  2 11:26 .libs/
    -rw-r--r--@  1 birkin  admin   11358 Jun 23  2007 LICENCE
    -rw-r--r--   1 birkin  admin    1559 Mar  2 11:26 Makefile
    -rw-r--r--@  1 birkin  admin    1195 Dec 13  2007 Makefile-1.X.in
    -rw-r--r--@  1 birkin  admin    1247 Dec 13  2007 Makefile-2.X.in
    lrwxr-xr-x   1 birkin  admin      15 Mar  2 11:26 Makefile.in@ -> Makefile-2.X.in
    -rw-r--r--@  1 birkin  admin   16440 Mar 13  2008 README
    -rw-r--r--   1 birkin  admin    4474 Mar  2 11:26 config.log
    -rwxr-xr-x   1 birkin  admin   20621 Mar  2 11:26 config.status*
    -rwxr-xr-x@  1 birkin  admin   78314 Dec 21  2007 configure*
    -rw-r--r--@  1 birkin  admin    4151 Jan 24  2008 configure.ac
    -rw-r--r--@  1 birkin  admin  352904 Aug 23  2008 mod_wsgi.c
    -rw-r--r--   1 birkin  admin     800 Mar  2 11:26 mod_wsgi.la
    -rw-r--r--   1 birkin  admin     315 Mar  2 11:26 mod_wsgi.lo
    -rw-r--r--   1 birkin  admin  817180 Mar  2 11:26 mod_wsgi.o
    -rw-r--r--   1 birkin  admin       0 Mar  2 11:26 mod_wsgi.slo
    birkinbox:mod_wsgi-2.3 birkin$
    

    Smooth! Confirm that mod_wsgi.so file exists (the point of the build):

    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ ls -alF ./.libs/
    total 4432
    drwxr-xr-x   7 birkin  admin     238 Mar  2 11:26 ./
    drwxr-xr-x@ 18 birkin  admin     612 Mar  2 11:26 ../
    -rw-r--r--   1 birkin  admin  803768 Mar  2 11:26 mod_wsgi.a
    lrwxr-xr-x   1 birkin  admin      14 Mar  2 11:26 mod_wsgi.la@ -> ../mod_wsgi.la
    -rw-r--r--   1 birkin  admin     801 Mar  2 11:26 mod_wsgi.lai
    -rw-r--r--   1 birkin  admin  817100 Mar  2 11:26 mod_wsgi.o
    -rwxr-xr-x   1 birkin  admin  633872 Mar  2 11:26 mod_wsgi.so*
    birkinbox:mod_wsgi-2.3 birkin$
    

    Looks good.

  • Install

    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ sudo make install
    Password:
    /usr/sbin/apxs -i -S LIBEXECDIR=/usr/libexec/apache2 -n 'mod_wsgi' mod_wsgi.la
    /usr/share/httpd/build/instdso.sh SH_LIBTOOL='/usr/share/apr-1/build-1/libtool' mod_wsgi.la /usr/libexec/apache2
    /usr/share/apr-1/build-1/libtool --mode=install cp mod_wsgi.la /usr/libexec/apache2/
    cp .libs/mod_wsgi.so /usr/libexec/apache2/mod_wsgi.so
    cp .libs/mod_wsgi.lai /usr/libexec/apache2/mod_wsgi.la
    cp .libs/mod_wsgi.a /usr/libexec/apache2/mod_wsgi.a
    ranlib /usr/libexec/apache2/mod_wsgi.a
    chmod 644 /usr/libexec/apache2/mod_wsgi.a
    ----------------------------------------------------------------------
    Libraries have been installed in:
       /usr/libexec/apache2
    
    If you ever happen to want to link against installed libraries
    in a given directory, LIBDIR, you must either use libtool, and
    specify the full pathname of the library, or use the `-LLIBDIR'
    flag during linking and do at least one of the following:
       - add LIBDIR to the `DYLD_LIBRARY_PATH' environment variable
         during execution
    
    See any operating system documentation about shared libraries for
    more information, such as the ld(1) and ld.so(8) manual pages.
    ----------------------------------------------------------------------
    chmod 755 /usr/libexec/apache2/mod_wsgi.so
    birkinbox:mod_wsgi-2.3 birkin$
    

    Confirm the mod_wsgi.so file has been installed in '/usr/libexec/apache2', as specified in this line of my Makefile:

    LIBEXECDIR = /usr/libexec/apache2
    
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ ls -alF /usr/libexec/apache2
    total 81488
    drwxr-xr-x  72 root  wheel      2448 Mar  2 11:35 ./
    drwxr-xr-x  93 root  wheel      3162 Feb 12 16:44 ../
    (...)
    -rwxr-xr-x   1 root  wheel    633872 Mar  2 11:35 mod_wsgi.so*
    birkinbox:mod_wsgi-2.3 birkin$
    

    Nice, there it is at the bottom.

  • Load module into apache

    I can never remember exactly where the httpd.conf file is.

    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ locate 'httpd.conf'
    (...)
    /private/etc/apache2/httpd.conf
    (...)
    birkinbox:mod_wsgi-2.3 birkin$
    

    First a backup:

    birkinbox:apache2 birkin$ sudo cp ./httpd.conf ./2009-03-02_httpd.conf
    

    Added to the LoadModule section:

    # Added 2009-03-02
    LoadModule wsgi_module libexec/apache2/mod_wsgi.so
    
  • Restart

    birkinbox:apache2 birkin$ sudo apachectl restart
    
  • Clean up

    birkinbox:apache2 birkin$ 
    birkinbox:apache2 birkin$ cd /Developer_3rd/mod_wsgi-2.3
    birkinbox:mod_wsgi-2.3 birkin$ 
    birkinbox:mod_wsgi-2.3 birkin$ make clean
    rm -rf .libs
    rm -f mod_wsgi.o mod_wsgi.la mod_wsgi.lo mod_wsgi.slo mod_wsgi.loT
    rm -f config.log config.status
    rm -rf autom4te.cache
    birkinbox:mod_wsgi-2.3 birkin$
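
One way to double-check the LoadModule edit above, rather than eyeballing httpd.conf, is to scan the config text for active (uncommented) LoadModule lines. This is only a minimal sketch and not part of the original install; the function name `loaded_modules` and the sample config text are made up for illustration.

```python
import re

# Matches active lines such as:
#   LoadModule wsgi_module libexec/apache2/mod_wsgi.so
# A leading '#' means the line is commented out, so it does not match.
LOAD_MODULE = re.compile(r"^[ \t]*LoadModule[ \t]+(\S+)[ \t]+(\S+)", re.MULTILINE)


def loaded_modules(conf_text):
    """Return {module_name: module_path} for every active LoadModule line."""
    return dict(LOAD_MODULE.findall(conf_text))


# Made-up sample text standing in for the real httpd.conf.
sample = """
#LoadModule foo_module libexec/apache2/mod_foo.so
LoadModule alias_module libexec/apache2/mod_alias.so
# Added 2009-03-02
LoadModule wsgi_module libexec/apache2/mod_wsgi.so
"""

modules = loaded_modules(sample)
print(modules.get("wsgi_module"))
```

Against the real file, the text could be read with `open('/private/etc/apache2/httpd.conf').read()` and passed to the same function.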
    

Configure & test

  • Reference: 'Configuring An Application'. Docs recommend following this to verify that mod_wsgi is actually working properly.

  • Note: phpinfo() does indicate the module is loaded.

  • Note: the docs say to follow these QuickConfiguration instructions before delving into the more thorough configuration docs.

  • Test app function & directories

    Created, per instructions, test function in a file and enclosing directory:

    birkinbox:repository birkin$ 
    birkinbox:repository birkin$ cd /Users/birkin/Documents/Brown_Library/ModWsgiTest 
    birkinbox:ModWsgiTest birkin$ 
    birkinbox:ModWsgiTest birkin$ ls -alF
    total 8
    drwxr-xr-x   3 birkin  staff   102 Mar  2 15:54 ./
    drwxr-xr-x  87 birkin  staff  2958 Mar  2 15:54 ../
    -rw-r--r--@  1 birkin  staff   277 Mar  2 15:43 mod_wsgi_test.wsgi
    birkinbox:ModWsgiTest birkin$ 
    birkinbox:ModWsgiTest birkin$ cat ./mod_wsgi_test.wsgi 
    def application(environ, start_response):
        status = '200 OK'
        output = 'Hello World!'
        response_headers = [('Content-type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)
        return [output]
    birkinbox:ModWsgiTest birkin$
    
  • httpd.conf file changes

    The docs give some sample configuration:

    <VirtualHost *:80>
    
      ServerName www.example.com
      ServerAlias example.com
      ServerAdmin webmaster@example.com
    
      DocumentRoot /usr/local/www/documents
    
      <Directory /usr/local/www/documents>
        Order allow,deny
        Allow from all
      </Directory>
    
      WSGIScriptAlias /myapp /usr/local/www/wsgi-scripts/myapp.wsgi
    
      <Directory /usr/local/www/wsgi-scripts>
        Order allow,deny
        Allow from all
      </Directory>
    
    </VirtualHost>
    

    My try will be:

    <VirtualHost *:80>
    
      ServerName 127.0.0.1
      ServerAlias 127.0.0.1
      ServerAdmin birkin_diana@brown.edu
    
      DocumentRoot /Users/birkin/Sites
    
      <Directory /Users/birkin/Sites>
        Order allow,deny
        Allow from all
      </Directory>
    
      WSGIScriptAlias /myapp /Users/birkin/Documents/Brown_Library/ModWsgiTest/mod_wsgi_test.wsgi
    
      <Directory /Users/birkin/Documents/Brown_Library/ModWsgiTest>
        Order allow,deny
        Allow from all
      </Directory>
    
    </VirtualHost>
    
  • Backup

    birkinbox:~ birkin$ sudo cp /private/etc/apache2/httpd.conf /private/etc/apache2/2009-03-02b_httpd.conf
    
  • Make change and restart

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ sudo apachectl restart
    Password:
    birkinbox:~ birkin$
    

    Well, no errors on restart.

    Plain old 127.0.0.1 still yields the usual default page, which is good.

    Trying 'myapp'...

    birkinbox:~ birkin$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13) 
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    >>> import urllib
    >>> 
    >>> urllib.urlopen( 'http://127.0.0.1/myapp/' ).read()
    '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>403 Forbidden</title>\n</head><body>\n<h1>Forbidden</h1>\n<p>You don\'t have permission to access /myapp/\non this server.</p>\n</body></html>\n'
    >>>
    

    Ok... no permission, but it seems to recognize it as a valid web-address at least -- that's a start!

    Got it; problem is permissions on an enclosing folder...

    drwx------@  81 birkin  staff     2754 Feb  4 16:03 Documents/
    
  • Test app function & directories #2

    Follow instructions this time and create a fully-accessible directory:

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ mv /Users/birkin/Documents/Brown_Library/ModWsgiTest /Users/birkin/ModWsgiTest
    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ ls -alF /Users/birkin/
    total 1472
    (...)
    drwxr-xr-x    3 birkin  staff      102 Mar  2 17:03 ModWsgiTest/
    (...)
    birkinbox:~ birkin$
    

    Backup httpd.conf:

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ ls -alF /private/etc/apache2/
    total 256
    (...)
    -rw-r--r--    1 root  wheel  17614 Mar  1  2008 2008-03-02_httpd.conf
    -rw-r--r--    1 root  wheel  17613 Mar  2 11:55 2009-03-02_httpd.conf
    -rw-r--r--    1 root  wheel  17685 Mar  2 16:59 2009-03-02b_httpd.conf
    (...)
    -rw-r--r--    1 root  wheel  18156 Mar  2 17:08 httpd.conf
    (...)
    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ sudo cp /private/etc/apache2/httpd.conf /private/etc/apache2/2009-03-02c_httpd.conf
    Password:
    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ ls -alF /private/etc/apache2/
    total 296
    (...)
    -rw-r--r--    1 root  wheel  17614 Mar  1  2008 2008-03-02_httpd.conf
    -rw-r--r--    1 root  wheel  17613 Mar  2 11:55 2009-03-02_httpd.conf
    -rw-r--r--    1 root  wheel  17685 Mar  2 16:59 2009-03-02b_httpd.conf
    -rw-r--r--    1 root  wheel  18156 Mar  2 18:40 2009-03-02c_httpd.conf
    (...)
    -rw-r--r--    1 root  wheel  18156 Mar  2 17:08 httpd.conf
    (...)
    birkinbox:~ birkin$
    

    Change httpd.conf; section now is:

    <VirtualHost *:80>
    
        ServerName 127.0.0.1
        ServerAlias 127.0.0.1
        ServerAdmin birkin_diana@brown.edu
    
        DocumentRoot /Users/birkin/Sites
    
        <Directory /Users/birkin/Sites>
            Order allow,deny
            Allow from all
        </Directory>
    
        WSGIScriptAlias /myapp /Users/birkin/ModWsgiTest/mod_wsgi_test.wsgi
    
        <Directory /Users/birkin/ModWsgiTest>
            Order allow,deny
            Allow from all
        </Directory>
    
    </VirtualHost>
    
  • Restart & test

    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ sudo apachectl restart
    birkinbox:~ birkin$ 
    birkinbox:~ birkin$ python
    Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13) 
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    >>> import urllib
    >>> 
    >>> urllib.urlopen( 'http://127.0.0.1/myapp/' ).read()
    'Hello World!'
    >>>
    

    Success!!!
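
The 403 earlier came down to one directory on the path (Documents/, mode drwx------) that the Apache user could not traverse. A minimal sketch of automating that check follows. It assumes the Apache user (www on this box) is neither the owner nor in the group of these directories, so only the 'other' permission bits matter; the function name is made up for illustration.

```python
import os
import stat


def blocking_directories(script_path):
    """List ancestor directories of script_path that 'other' users cannot enter.

    A web server running as an unrelated user needs search (execute)
    permission on every directory between / and the script.
    """
    blocked = []
    current = os.path.dirname(os.path.abspath(script_path))
    while True:
        mode = os.stat(current).st_mode
        if not mode & stat.S_IXOTH:
            blocked.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return blocked


# Example run against the current directory; the file name is hypothetical
# and does not need to exist, since only its parent directories are checked.
print(blocking_directories(os.path.join(os.getcwd(), "mod_wsgi_test.wsgi")))
```

An empty list suggests the directories are traversable; the script file itself still needs to be readable by the server.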

Nice, lightweight SOA implementation

sunday, may 18th, 2008 4:17pm

I've evangelized service-oriented architecture (SOA) before.

To review, briefly and roughly: SOA promotes decoupled services. For example, a Fahrenheit-to-Celsius converter would likely be implemented as a web-service, instead of as a function/method embedded/tied into some bigger program. The benefits of this are multiple: 1) The service can be written in any programming language, and accessed by other services written in different languages. 2) SOA makes the idealized promise of code-reuse a reality.

I have a programmer friend who works for a large corporation who is familiar with implementing SOA using industrial-scale best-practices; I'm familiar with implementing it in a lightweight, seat-of-the-pants fashion.

Over the past year+ I've created well over a dozen SOA web-services for different projects. But I recently implemented one I put some best-practice effort into that'll be a model for my future SOA work. Some links:

What I like about this one...

  • The api urls offer 'discovery' by embedding contact and documentation information in the returned data. Having just one of these pieces of info would be great; having both is particularly nice because web urls and staff change over time. Why is this useful? If someone is looking at the code that calls this service 5 years from now, and I'm not around, the documentation will reveal extra features of the service that wouldn't be apparent if, say, the web-service just returned the word 'English'.

  • The api urls are 'hackable', another way of enhancing discovery. One can intuitively try entering a code other than 'enk' to see what comes up (like 'tlh'). Also, reasonably appropriate things happen if one lops off increasing sections of the url (in this case, redirects to documentation pages).

  • The api urls are versioned. Key:value pairs can be added to this api -- but the existing key:value pairs must never be changed. The reason is that post-release, I don't know who's using it for what, thus I have to assume any changes could break someone's app. So if I want to change the label 'response' to 'language', and deliver it in xml, I can leave the existing one as is, and label the new one 'api_v2'.

  • All these urls utilize server-caching. This is an implementation rather than a design feature, but worth mentioning. Django offers a flexible and easy-to-use caching feature; I have it set so that the list and api urls only have to hit the database once a day, no matter how many times the urls are hit. Further, django's caching is intelligent: its response includes 'Cache-Control', 'ETag', and 'Expires' http-headers so that a browser or well-designed code doesn't even have to call the web-service again to redisplay the data. Nice. This would be particularly important and useful for something like RSS feeds.
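
The discovery and versioning points above can be sketched roughly as follows. This is not the service's actual code: the URL, contact address, key names other than 'response' and 'language', the 'api_v1' label, the JSON output format, and the tiny lookup table are all assumptions made for illustration (the post mentions delivering v2 as xml; JSON is used for both versions here to keep the sketch short).

```python
import json

# All names and URLs below are hypothetical placeholders.
DOCS_URL = "http://example.org/language_service/docs/"
CONTACT = "web-services@example.org"
LANGUAGES = {"enk": "English"}  # assumed mapping, based on the post's examples


def v1_payload(code):
    """Version 1: keys may be added later, but existing keys never change."""
    return {
        "request": code,
        "response": LANGUAGES.get(code, "unknown"),
        "documentation": DOCS_URL,  # discovery: where to read more
        "contact": CONTACT,         # discovery: whom to ask
    }


def v2_payload(code):
    """Version 2: free to rename 'response' to 'language'; v1 is untouched."""
    payload = v1_payload(code)
    payload["language"] = payload.pop("response")
    return payload


BUILDERS = {"api_v1": v1_payload, "api_v2": v2_payload}


def render(version, code):
    """Pick the builder named by the url's version segment and serialize it."""
    return json.dumps(BUILDERS[version](code), sort_keys=True)


print(render("api_v1", "enk"))
print(render("api_v2", "enk"))
```

Because each version has its own builder, a caller written against the first version keeps getting the same keys no matter what later versions rename.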

Good info...

  • A terrific, hands-on review-resource on http-headers: The web-services chapter of Mark Pilgrim's 'Dive Into Python' website & book.

  • Many of the features of this language_translator web-service were informed by the book 'RESTful Web Services', by Richardson & Ruby. Some parts are a bit dense, but it's chock-full of terrific detailed info and food for thought. I came across it after having written a half-dozen or so SOA web-services, each one a little different and better, and it directly addressed many issues I had begun to think about or saw referenced via web-research.

[Acknowledgements to Peter Murray's article and Richard Akerman's Access_2006 presentation that first inspired my SOA thinking.]

Appreciating django templates

sunday, april 6th, 2008 10:22pm

I'm loving Django templates.

In setting up a site for my bookgroup, I used a model for 'Meeting' that has a 'meeting_date' defined as a 'DateTimeField'. For those who haven't yet tasted the Django kool-aid, that's a field that requires both a valid date and a valid time. Made sense to me, since one of the main reasons for the site is to be able to list the location and date & time of upcoming meetings. However, when entering a few old meetings, the spreadsheet I was working from only listed the month and year (November 1997! -- we've been around for a while!).

There were a couple of ways I could have handled this. What I chose to do was to add two boolean fields: 'fake_date' (meaning 'day') and 'fake_time'. It might have been cleaner to allow us admins to have a year, a month, a date, and a time field -- but I knew going forward the single DateTimeField would work for us and I wanted to build more for the future than the past. So, when entering an old meeting with just a month and year, any date in that month is entered and any time of day, and both fake_date and fake_time are checked.

The complete DateTime object is passed to the template, and then some logic goes to work:

{% if meeting.fake_date %}
    <h2>{{ meeting.date_time|date:"F, Y"|lower }}</h2>
{% else %}
    {% if meeting.fake_time %}
        <h2>{{ meeting.date_time|date:"l, F jS, Y"|lower }}</h2>
    {% endif %}
{% endif %}

{% if not meeting.fake_date and not meeting.fake_time %}
    <h2>{{ meeting.date_time|date:"l, F jS, Y g:iA"|lower }}</h2>
{% endif %}

You can see the results of the non-fake dates here, and the result of the fake dates here (the older meetings) -- the same kind of date-object reaches the template; the logic above handles the presentation.

I could have more efficiently handled the 'happy-path' real-date case via nesting, but I find this a bit more readable.

If the first test matches, the DateTime object will show only the month and year, for example 'march, 1998'; if the second test matches, the same DateTime object will show the full date, for example 'friday, march 6th, 1998', but not the time.
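
For comparison, the same three-way choice can be sketched in plain Python. This is only an illustration of what the template conditions compute, not a suggestion to move the logic out of the template; the function names are made up, and the formatting approximates Django's "F, Y", "l, F jS, Y" and "g:iA" patterns followed by the lower filter.

```python
from datetime import datetime


def ordinal(day):
    """Return the day with an English suffix (1st, 2nd, 3rd, 4th, 11th...)."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return "%d%s" % (day, suffix)


def meeting_heading(when, fake_date, fake_time):
    """Mirror the template: month/year only, date only, or date plus time."""
    if fake_date:
        return when.strftime("%B, %Y").lower()
    date_part = "%s %s, %d" % (when.strftime("%A, %B"), ordinal(when.day), when.year)
    if fake_time:
        return date_part.lower()
    hour = when.hour % 12 or 12  # 12-hour clock without a leading zero
    meridiem = "AM" if when.hour < 12 else "PM"
    return ("%s %d:%02d%s" % (date_part, hour, when.minute, meridiem)).lower()


when = datetime(1998, 3, 6, 19, 30)
print(meeting_heading(when, fake_date=True, fake_time=True))
print(meeting_heading(when, fake_date=False, fake_time=True))
print(meeting_heading(when, fake_date=False, fake_time=False))
```

The month and weekday names come from `strftime`, so they follow the process locale; under the default C locale they are English.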

This is nice. My introduction to templates was via JSPs, using expression-language to pass in values from beans. Since pure java code can be embedded in JSPs, I had trained myself to rigidly keep logic out of templates, and in the above situation would have written that logic within a Java class. When I began working more with php, I looked around for a template system. I had heard good things about 'smarty', but it seemed too heavyweight. That, combined with my fierce aversion to template logic, scared me off. I then attended a wonderful presentation on HTML_Template_ITX, was sold whole-hog, and still use that for my php end-user web work.

What I initially loved about Django's templates is that I didn't have to use any of the logical conditions I show above; it can be used very well very simply. As I've grown more comfortable with Django and python, my philosophical aversion to template-logic has gradually evaporated -- as long as it's presentation logic. The situation above is a perfect example: It's very reasonable for the business-logic end of things to pass to the presentation-layer a date. How that date is then formatted (upper or lowercase, whether or not the day or other elements of the date are shown, etc.) is a very reasonable thing for the template to handle. And that the template can also handle the presentation based on certain conditions of the Meeting instance is very, very, nice.

Better logging

saturday, february 23rd, 2008 5:22pm

I'm entranced with a new practice: logging to a database instead of a file.

Long ago I got into a habit of logging to files as a way of monitoring the workings of my programs. For shell scripts I piped the standard output to be appended to a file, and then just sort of stuck with that model as I learned other languages.

Though java and python each have a robust logging library built into the language, I didn't use those, instead focusing my language-learning on features that more directly enabled me to tackle whatever the task at hand. The result is that over time my shell and php and python and java programs ended up with log files that grew ever larger, requiring occasional manual paring.

Given an interest in best-practices, I've begun learning about and experimenting with built-in loggers when available, but on a current project have met my logging needs via a self-rolled approach that offers real benefits.

Problem -- atomized logging

Our easyBorrow project consists of a lightweight php web interface that quickly dumps the incoming request into a database queue, where a python controller takes over, calling a series of independent web tunnelers & other web services. The whole system consists of around a dozen independent web-services of varying degrees of complexity, each with a nicely scoped focus. Most of them also write to a separate log file, which in a way makes sense, but given that the majority of these web-services serve a single goal -- to move the user's request-processing along -- the atomized nature of the logs can end up being a hassle.

If something goes wrong with a request, a 'history' table does give an indication of where to start tracking down the issue -- but then I may have to look into as many as half-a-dozen separate log files to see what exactly happened. This is one of those situations where problems don't arise often enough to tackle improving the existing architecture, but just enough to make the existing one annoying at times.

Problem -- data not exposed

Keeping this background in mind, I want to note another issue that happens maybe once every three months that had a role in this new architecture with which I'm so taken. Some two years ago I implemented an automated export of requests from our iii ILS for items held in an offsite location. Those requests get exported, then parsed, and then moved to a location where a different vendor's inventory-control software takes over and presents the workers at the offsite facility a list of items that need to be retrieved.

Occasionally, very occasionally, requests don't show up for the offsite staff and I'm asked if I can confirm that the requests actually got exported and parsed and handed off to the inventory-control software. So I look in my documentation to see where the server application log files are located -- grab them and let the folk know that yes, in fact, my part of the flow worked. When this happened last month, a co-worker noted that it would be terrific if they could view the information that I'm looking up so I wouldn't have to be bothered. Unfortunately, given the existing model, that would require folk having passwords to unix servers and isn't workable. But I've ruminated upon this, and given my current evangelism of APIs and exposing data, I've thought that if I had to do that logging over, I'd expose it via a web interface.

Solution

Now I'm working on a new project, or rather tackling one that's been on the back-burner far too long: exporting newly-cataloged item information from our closed and unfriendly iii ILS into a database where we can present users with useful new-item info and feeds. Like more and more projects these days, this one has many pieces, each of which, had I done this a year-plus ago, would have logged its inner workings to a separate file.

Now though, I'm logging the export script info, the posting script info, and the parsing script info to a single database table. And because one of the scripts lives on a server that doesn't have a library setup to interface with mysql, I'm 'writing' to the db by POSTing that script's log-entry info to a url which then saves it to the db. The log-table consists of (in addition to an unseen auto-incrementing id) a datestamp, an identifier, and the log message. The 'identifier' in this case is a simple number that allows me to group the entries from different sources together in the log. When I eventually apply this beauteous system back to easyBorrow, the identifier will be the request-number the system assigns early on in the process. The function/method in each separate script that writes to the log also takes a detail-level parameter, allowing me to specify a high-level of logging detail in development code, and a low-level in the ongoing in-production code.
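
The 'write by POSTing' idea for the script that has no mysql library could look roughly like this. It is a sketch under assumptions: the URL and the form-field names are hypothetical, and it uses Python 3's `urllib.request` (on the Python 2.5 of that era the equivalent calls lived in `urllib` and `urllib2`).

```python
from urllib.parse import urlencode
from urllib.request import Request

LOG_URL = "http://127.0.0.1/log/add/"  # hypothetical endpoint that saves to the db


def build_log_request(identifier, message, url=LOG_URL):
    """Build (but do not send) a form-encoded POST carrying one log entry."""
    body = urlencode({"identifier": identifier, "message": message})
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(url, data=body.encode("utf-8"))


request = build_log_request("42", "export started")
print(request.get_method(), request.full_url)
print(request.data)
```

Sending it would be a call to `urllib.request.urlopen(request, timeout=10)`; the receiving url then adds the datestamp and saves the row.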

This system is sweet. It means that I have only to look in one place to monitor the flow of all three scripts. So if the export cron job fires off at 2am, and the POST cron job fires off at 3am, and the parser cron job fires off at 4am, I can see the whole flow in one view.

Though all developers can write to a database in their sleep, since I'm writing to a django-managed table, it is and feels even easier. For those who haven't yet drunk framework kool-aid:

log_entry = Log()
log_entry.identifier = 'the_identifier'
log_entry.message = 'the_message'
log_entry.save()

Wrapping a function around this allows my log entries to look like:

updateLog( detail='low_detail', identifier='the_identifier', message='the_message' )
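
A minimal sketch of what such a wrapper might look like, including the detail-level filtering described earlier. The `Log` class here is a stand-in for the Django model so the example runs on its own; the 'high_detail' level name, the numeric ranking, and the `LOG_DETAIL_SETTING` name are assumptions, not taken from the original code.

```python
# Assumed ranking: a higher number means a chattier entry.
DETAIL_LEVELS = {"low_detail": 1, "high_detail": 2}
LOG_DETAIL_SETTING = "low_detail"  # hypothetical setting; 'high_detail' in development


class Log(object):
    """Stand-in for the Django model; save() appends to a list, not a table."""
    saved = []

    def __init__(self):
        self.identifier = None
        self.message = None

    def save(self):
        Log.saved.append((self.identifier, self.message))


def updateLog(detail, identifier, message, setting=None):
    """Save the entry only if its detail level is within the configured level."""
    setting = setting or LOG_DETAIL_SETTING
    if DETAIL_LEVELS[detail] > DETAIL_LEVELS[setting]:
        return False  # too detailed for the current setting, so skip it
    log_entry = Log()
    log_entry.identifier = identifier
    log_entry.message = message
    log_entry.save()
    return True


updateLog(detail="low_detail", identifier="the_identifier", message="the_message")
updateLog(detail="high_detail", identifier="the_identifier", message="verbose info")
print(Log.saved)
```

With the setting at 'low_detail', only the first call is stored; switching the setting to 'high_detail' would store both.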

But wait, in true Ronco spirit, it gets better... Since I'm writing to a django-managed table, I automatically, without writing extra code, have a complete, useful, sortable and searchable web-interface -- with built-in authentication -- which means that not only can I view the flow of processing, I can easily allow anyone else to view the flow of processing by supplying a url.

The final sprinkle of luscious magic is that django makes it very easy to override the built-in save method of its objects. So I've added a bit of logic to the Log object's save method to delete entries older than X days (a configurable number I've put in a settings file). There's a bit of a hack in this solution. The absolute simplest code to write in this save method is just to query for all log-entries older than X days and delete them, which is what I've initially done. But this is unnecessarily expensive database access for every single log-entry, though mitigated by the fact that for this project, the scripts run only once a day and, in production, log lightly. A better approach would be to have a separate job run once a day or week and perform the deletions, and I may implement that, though I've been mentally toying with an oddly enjoyable interim hack: to have the save method come up with a random number such that it would have, say, a 1 in 100 chance of running the delete code. Bottom line, though, is that auto-deletion is taken care of right up front.
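
The 'prune on roughly one save in a hundred' idea can be sketched with the standard library's sqlite3 standing in for the django-managed table. In Django this logic would sit inside the model's overridden save method; the table layout, constant names, and function names below are assumptions for illustration only.

```python
import random
import sqlite3
import time

RETENTION_DAYS = 30  # the 'X days' setting; the value is an assumption
PRUNE_ODDS = 100     # on average, one save in this many runs the delete


def make_log_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "created REAL NOT NULL, identifier TEXT, message TEXT)"
    )


def save_entry(conn, identifier, message, now=None, roll=None):
    """Insert a log row, then occasionally delete rows past the retention age.

    `now` and `roll` can be passed in to make the behavior repeatable.
    """
    now = time.time() if now is None else now
    roll = random.randrange(PRUNE_ODDS) if roll is None else roll
    conn.execute(
        "INSERT INTO log (created, identifier, message) VALUES (?, ?, ?)",
        (now, identifier, message),
    )
    if roll == 0:  # the 1-in-PRUNE_ODDS case
        cutoff = now - RETENTION_DAYS * 86400
        conn.execute("DELETE FROM log WHERE created < ?", (cutoff,))
    conn.commit()


conn = sqlite3.connect(":memory:")
make_log_table(conn)
save_entry(conn, "1", "export started")
print(conn.execute("SELECT COUNT(*) FROM log").fetchone()[0])
```

The trade-off is that old rows can outlive the retention window until a save happens to trigger the delete, which is acceptable for a log but worth knowing.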

Put all of these improvements together, and the new system offers more useful, more accessible, and better-sized logging.