Nice, lightweight SOA implementation

sunday, may 18th, 2008 4:17pm

I've evangelized service-oriented architecture (SOA) before.

To review, briefly and roughly: SOA promotes decoupled services. For example, a Fahrenheit-to-Celsius converter would likely be implemented as a web-service, instead of as a function/method embedded/tied into some bigger program. The benefits of this are multiple: 1) The service can be written in any programming language, and accessed by other services written in different languages. 2) SOA makes the idealized promise of code-reuse a reality.
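To make the converter example concrete, here is a minimal sketch of it as a tiny web-service, using only the Python standard library. The url shape (`/212/`), the JSON keys, and the function names are assumptions for illustration, not a description of any service mentioned here.

```python
import json


def fahrenheit_to_celsius(fahrenheit):
    """Convert a Fahrenheit temperature to Celsius."""
    return (fahrenheit - 32) * 5 / 9


def app(environ, start_response):
    """WSGI app: a request for /<fahrenheit>/ returns the Celsius value as JSON."""
    raw = environ.get("PATH_INFO", "").strip("/")
    try:
        fahrenheit = float(raw)
    except ValueError:
        status = "400 Bad Request"
        body = {"error": "expected a number in the url, e.g. /212/"}
    else:
        status = "200 OK"
        body = {
            "fahrenheit": fahrenheit,
            "celsius": round(fahrenheit_to_celsius(fahrenheit), 2),
        }
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

For a quick local try-out, `app` can be served with `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`. Any client that can issue an http request, in any language, can then use the conversion.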

I have a programmer friend at a large corporation who is familiar with implementing SOA using industrial-scale best practices; I'm familiar with implementing it in a lightweight, seat-of-the-pants fashion.

Over the past year or so I've created well over a dozen SOA web-services for different projects. But I recently implemented one that I put some best-practice effort into, and it'll be a model for my future SOA work. Some links:

What I like about this one...

  • The api urls offer 'discovery' by embedding contact and documentation information in the returned data. Having just one of these pieces of info would be great; having both is particularly nice because web urls and staff change over time. Why is this useful? If someone is looking at the code that calls this service 5 years from now, and I'm not around, the documentation will provide info on some extra features of the service that wouldn't otherwise be apparent if, say, the web-service just returned the word 'English'.

  • The api urls are 'hackable', another way of enhancing discovery. One can intuitively try entering a code other than 'enk' to see what comes up (like 'tlh'). Also, reasonably appropriate things happen if one lops off increasing sections of the url (in this case, redirects to documentation pages).

  • The api urls are versioned. Key:value pairs can be added to this api -- but the existing key:value pairs must never be changed. The reason is that post-release, I don't know who's using it for what, thus I have to assume any changes could break someone's app. So if I want to change the label 'response' to 'language', and deliver it in xml, I can leave the existing one as is, and label the new one 'api_v2'.

  • All these urls utilize server-caching. This is an implementation rather than a design feature, but worth mentioning. Django offers a flexible and easy-to-use caching feature; I have it set so that the list and api urls only have to hit the database once a day, no matter how many times the urls are hit. Further, Django's caching is intelligent: its response includes 'Cache-Control', 'ETag', and 'Expires' http-headers so that a browser or well-designed code doesn't even have to call the web-service again to redisplay the data. Nice. This would be particularly important and useful for something like RSS feeds.
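A rough sketch of what the returned data for the discovery and versioning points above might look like. Everything here is an assumption for illustration: the lookup table, the placeholder documentation url and contact address, and the choice to carry the newer representation under an 'api_v2' key (one reading of the versioning rule; a versioned url would be another).

```python
import json

# Hypothetical lookup table; a real service would read from its database.
LANGUAGES = {"enk": "English"}

# Placeholder values, not the real service's documentation url or contact.
DOCUMENTATION_URL = "http://example.org/language_translator/docs/"
CONTACT = "webservices@example.org"


def build_response(code):
    """Return the JSON body for a language-code lookup."""
    payload = {
        # Original keys: once released, these are never renamed or removed.
        "response": LANGUAGES.get(code, "code not found"),
        # Discovery: tell whoever reads the output where to learn more.
        "documentation": DOCUMENTATION_URL,
        "contact": CONTACT,
    }
    # A later revision is added alongside the old keys, not in place of them.
    payload["api_v2"] = {"language": LANGUAGES.get(code)}
    return json.dumps(payload, indent=2, sort_keys=True)
```

Existing callers that read only 'response' keep working, while newer callers can opt into the 'api_v2' data.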

Good info...

  • A terrific, hands-on review-resource on http-headers: The web-services chapter of Mark Pilgrim's 'Dive Into Python' website & book.

  • Many of the features of this language_translator web-service were informed by the book 'RESTful Web Services', by Richardson & Ruby. Some parts are a bit dense, but it's chock-full of terrific detailed info and food for thought. I came across it after having written a half-dozen or so SOA web-services, each one a little different and better, and it directly addressed many issues I had begun to think about or saw referenced via web-research.

[Acknowledgements to Peter Murray's article and Richard Akerman's Access_2006 presentation that first inspired my SOA thinking.]

Practical campus APIs & feeds

saturday, february 16th, 2008 10:42am

For a while now I've evangelized APIs & feeds, encouraging folk (and reminding myself) to expose 'web-page' data by presenting it in some alternate structured format. That's partly for the purpose of making code-reuse a reality, but even more so for the purpose of making possible new and interesting uses of data.

At the Library, we've truly moved into the realm of moving code onto the network. The web-services we've created have, not surprisingly, been library-related:

  • An isbn converter.
  • A 'cleaner' for data output from an ILS API.
  • Many 'tunnelers' into consortial borrowing services returning results of searches, with the order number, if applicable.
  • A reprocessor of OCLC xISBN data that returns only those OCLC xISBNs that have the same format and are in the same language as the submitted ISBN.
  • An OCLC to ISBN converter that will take an OCLC number and see if there are versions of that item available with ISBNs.
  • An OPAC status & location checker.
  • etc. etc. etc.

I've wondered recently what APIs the library could offer that would be of value campus-wide. More specifically, what APIs we might develop for our own needs that would be useful to the campus as a whole. Of course, many of our APIs do currently benefit the wider campus community in that students, staff, and faculty across campus use services of ours that are made possible via our behind-the-scenes use of APIs. I'm thinking more of APIs that developers in other departments might find directly useful.

When considering APIs that would be useful for developers across the campus, I naturally think of our Computing and Information Services department (CIS). I've had good conversations, and hope to have many more in the future, with CIS folk about having them develop and evangelize campus-wide APIs. My thinking has been that over time, developing such APIs could save them an enormous amount of time as well as enhance good will from departmental developers.

An example: for one of our Library projects, we need a listing of faculty and course information. I'm not directly involved in this project, but my understanding is that we periodically request a list of faculty and courses from CIS; they produce the list; and we update some db tables for web-apps that make use of this information. My sense is that if certain Banner APIs could be enabled -- obviously with appropriate security implemented -- we could get this information directly from a feed / API call, simplifying our workflow and lightening the workflow of the CIS folk who produce the list for us.

I'm encouraged from my conversations that there are folk in CIS who share this perspective and are working to realize it. While good discussions and planning proceed, I find myself gravitating to what we in the Library could do now along these lines. Three ideas...

Cafeteria menu

As part of an idea that deserves its own post (the idea sounds a bit silly without context, but indulge me), I've thought that it would be very useful on a particular Library web page to be able to display the next upcoming meal at the main campus cafeterias. I spent about ten minutes exploring the availability of that information, and found two web-pages and a downloadable excel spreadsheet. None of these are ideal sources of information to automate, but it could be done, and I wouldn't be at all surprised if the resulting structured feed would be of use to others, from individual students to the campus newspaper.

SafeRide arrival time

We have a campus shuttle system comprised of about seven vans. A couple of these have GPS receivers, and a vendor website displays on a map, via quite gnarly javascript, the current location of the GPS-enabled vans. That's nice, but the experience could be significantly improved.

I've thought it would be extremely useful to be able to display on a Library web-page (if the student is accessing the page from within the Library) a simple line like "The next SafeRide shuttle should arrive here in about 10 minutes." Simple and seriously, wonderfully useful information, that doesn't get in the way of the task at hand. That same Library web-page, if accessed from outside of the Library, simply wouldn't have that line displayed.

The API we could create, from parsing the javascript on the vendor web-page, could at a minimum return location information for the GPS-enabled shuttles, which could be interpreted by our own server-side logic to approximate arrival times. But even better, the logic of determining arrival times could be embedded in the API itself. The API could take a location-parameter and return expected arrival time for the submitted location. We at the library might only implement logic that focuses on the arrival times at the Library. But by opening up the arrival-assessment code, we could allow BioMed developers to add arrival-time logic for shuttle-stop-locations close to BioMed buildings, and students to add arrival-time logic for shuttle-stop-locations close to particular dorms.

Since developers can determine the IP address of an incoming request for information, and since developers and computer-knowledgeable students know the IP-address ranges of buildings in their purview, we really can do this.
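A minimal sketch of the two pieces described above: mapping a requester's IP address to a building, and estimating an arrival time from van positions. The address ranges, coordinates, and average speed are made-up placeholders, and straight-line distance is a crude stand-in for the actual shuttle route.

```python
import ipaddress
import math

# Hypothetical values for illustration only.
BUILDING_NETWORKS = {
    "library": ipaddress.ip_network("192.0.2.0/25"),
    "biomed": ipaddress.ip_network("192.0.2.128/25"),
}
STOP_COORDINATES = {  # (latitude, longitude) of the nearest shuttle stop
    "library": (41.8262, -71.4032),
    "biomed": (41.8290, -71.4000),
}
AVERAGE_SPEED_KMH = 20.0  # assumed average van speed


def building_for_ip(ip_string):
    """Return the building whose address range contains the IP, else None."""
    address = ipaddress.ip_address(ip_string)
    for building, network in BUILDING_NETWORKS.items():
        if address in network:
            return building
    return None


def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def minutes_until_arrival(building, van_positions):
    """Estimate minutes until the closest GPS-enabled van reaches the stop."""
    stop = STOP_COORDINATES[building]
    nearest = min(distance_km(stop, van) for van in van_positions)
    return round(nearest / AVERAGE_SPEED_KMH * 60)
```

A page handler could call `building_for_ip` on the incoming request's address and show the arrival line only when a building is returned. Other departments could extend the two dictionaries with their own ranges and stops.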

Public computer availability

Imagine you're a student. You need to get some good work done and know if you stay in your dorm room this evening you won't get that work done -- there are just too many distractions. So you want to go to the Library. You have a desktop computer, or maybe just don't feel like lugging your laptop in the rain, and you know the Libraries have public clusters. Problem is -- it's getting close to midterms and sometimes the clusters get pretty full. Wouldn't it be great, I mean, really, really great, to be able to access a web-page that shows public cluster availability across campus?

I've talked with some CIS folk about this and found individuals who are working hard to realize this goal. They do have software that can detect the 'in-use' status of each terminal in the clusters, and last I checked (in November, I think) had noted that the software had upgraded its web-display capability, with which they were experimenting. However, public web display of cluster availability is as of this writing only accessible... from cluster machines. But the hope is that this information will eventually be made more public. That's great, but I'd like to take the data a step further, and create an API to the data. The reason is that if the data were also exposed via an API in addition to a web-page, I could solve more specific problems in a targeted way. For instance, one of our Libraries has 15 floors, with public computers available on multiple floors. Wouldn't it be terrific if a student entering that Library could glance at a display screen and see the relevant computer availability (with floor numbers listed instead of generic cluster IDs) for just that building? An API would allow that.
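To illustrate the per-building display idea, here is a small sketch that takes terminal records such as an availability API might return and counts free machines per floor for one building. The record fields and building names are assumptions; no actual CIS data format is implied.

```python
from collections import defaultdict

# Hypothetical records, as an availability API might return them.
TERMINALS = [
    {"building": "sciences_library", "floor": 1, "in_use": True},
    {"building": "sciences_library", "floor": 1, "in_use": False},
    {"building": "sciences_library", "floor": 3, "in_use": False},
    {"building": "main_library", "floor": 2, "in_use": True},
]


def free_by_floor(terminals, building):
    """Count free terminals on each floor of one building."""
    counts = defaultdict(int)
    for terminal in terminals:
        if terminal["building"] == building:
            # Adding 0 still registers a floor whose machines are all busy.
            counts[terminal["floor"]] += 0 if terminal["in_use"] else 1
    return dict(sorted(counts.items()))
```

With the sample data, `free_by_floor(TERMINALS, "sciences_library")` gives one free machine on floor 1 and one on floor 3, which is the kind of summary a lobby display screen could show.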

I have other ideas as well, but this gives a good flavor of how in the future, as we meet Library needs, we might be able to offer very useful API data to developers across campus.

To close, an exhortation... In each of these three situations, I speak of creating an API from existing publicly available electronic data. My excitement about creating and then utilizing these APIs for user-services is evident. But really, I should not have to create the APIs; I should be able to spend my time building the useful services for the Library and our campus that the APIs allow. So to all: if you know anyone creating any web-information -- encourage that person to expose their data not only via a 'regular' web-page, but also in a predictable structured way that can make its re-use easy. And to anyone purchasing any vendor-service that offers electronic information, demand that the service offers an API to the data.

Moving code onto the network

saturday, february 9th, 2008 10:46am

In 2004, while in my masters program, deeply immersed in java object-oriented programming, I saw the potential benefits of code re-use that classes offer. I envisioned over time building up libraries of class-objects; by accessing them in future projects, I expected to be more and more productive.

Code-reuse never quite worked out that way, though. What I've tended to do for new projects has been to copy a similar class from a previous project, paste it into the new project, weed out unnecessary attributes and methods, and add new code. In a way this makes sense: though I lose out on 'pure' code-reuse, I gain by having all code for a project together. That's nice for version-control and portability, and for isolation of concerns, in that I don't have to worry that a change in a class in one project will have unintended consequences in another project.

But after reading a while back about service-oriented architecture, and shortly thereafter needing to code a couple of lines in python that I had just coded in php a day or two earlier, the benefits of moving code into RESTful web-services (that is, moving it onto the network) became apparent.

I do that all the time now. Just last week I needed to convert between 10- and 13-digit isbns, for the second time in a recent project, so rather than coding the conversion directly in the program at hand I put it into a webservice.

http://sisko.services.brown.edu/easyborrow/isbn_converter/0688052304/
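The conversion itself is short. What follows is a sketch of the standard check-digit arithmetic, not the code behind the url above; the function names are assumptions, and input validation is left out.

```python
def isbn10_to_isbn13(isbn10):
    """Convert a 10-digit ISBN to its 13-digit form."""
    # Drop the old check digit and prefix the remaining nine digits with 978.
    core = "978" + isbn10.replace("-", "")[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def isbn13_to_isbn10(isbn13):
    """Convert a 978-prefixed 13-digit ISBN to its 10-digit form."""
    digits = isbn13.replace("-", "")
    if not digits.startswith("978"):
        raise ValueError("only 978-prefixed ISBNs have a 10-digit equivalent")
    core = digits[3:12]
    # ISBN-10 check digit: weights run 10 down to 2; a value of 10 is written 'X'.
    total = sum(int(d) * (10 - i) for i, d in enumerate(core))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))
```

For the isbn in the url above, `isbn10_to_isbn13("0688052304")` returns "9780688052300", and `isbn13_to_isbn10` takes it back again.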

In this shift, I've finally realized that goal of code reuse, while still being able to maintain the version-control and isolation of concerns benefits of focusing on my specific project at hand.

The book 'RESTful Web Services' by Richardson and Ruby, while a bit dense, offers good insights on creating web-services (example: versioning). At some point, I'd like to come up with standards for Brown Library (and/or campus-wide) web-services. Examples: specifying versioning in the url, a documentation url in the returned data, and a url in that documentation of all APIs/web-services the department offers.

For now, though, the simple shift toward moving code out of individual projects and onto the network has been extremely rewarding.

Vendor API Manifesto

thursday, december 6th, 2007 11:02pm

[I wrote this early in the Fall of 2007 and circulated it to folk in the Library who were attending meetings at which vendors were advertising their wares.]

Software products are created, understandably, primarily to meet existing needs. There are varied bodies of thought as to how much a software product should be designed to meet 'future possible needs'.

At certain points in recent history, it may have been reasonable to design the sole interface to a system assuming that the user of the interface would be a person using a web-browser.

Though APIs (application programming interfaces) have been around for ages, the trend toward programmers wanting to access internal and external systems via APIs has accelerated tremendously over the last few years.[1] As a programmer for a creative web-services department in a creative Library, I'm part of this trend. Our team's need to be able to programmatically access systems has increased dramatically. Fortunately, a few vendors such as Ex Libris understand this and have built possibilities for programmatic access into their products. But many closed systems remain.

To managers and directors making purchasing decisions, I urge that a top-level purchasing consideration be whether the vendor's product offers an API to the information it provides (in addition to any built-in web interface). The simple reason is that a web presentation of information is designed for a single purpose: for a user to interact with the system via a browser. An API allows the system's data to be accessed in any way we see fit, now or in the future.

A concrete example for any reader not familiar with the notion of an API...

Our team is currently developing a system to simplify the process of obtaining a book through interlibrary-loan services. In order to do this we have been able to automate the process of searching a consortial web-catalog for an item, and requesting that item. But the only method of doing this involves creating a program which essentially mimics a browser, automatically simulating clicked buttons and links and reading the resulting HTML of the consortial catalog's web interface.

This works, but is terribly fragile: if the design of a web-page changes, our program may no longer work until it is reconfigured to understand the new design.

What we absolutely need (in addition to the existing web interface) is a catalog-service (the API) which would allow a defined http request to be sent to a URL that will allow a search to be performed, or an item to be requested, etc. (That http request would come from a program our team has written -- instead of from a user sitting at a browser.) Each request to this API would return predictable documented structured information (XML is one standard; there are others). Our team's program would then be able to automatically process this information.
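To show why predictable structured output matters, here is a sketch of how little code our program would need to process such a response. The XML shape, element names, and urls are invented for illustration; no vendor's actual format is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical API response; element names and urls are made up.
SAMPLE_RESPONSE = """
<search_results query_isbn="0688052304">
  <item>
    <title>Example title</title>
    <available>true</available>
    <request_url>http://catalog.example.org/api/v1/request/12345/</request_url>
  </item>
  <item>
    <title>Another example</title>
    <available>false</available>
    <request_url>http://catalog.example.org/api/v1/request/67890/</request_url>
  </item>
</search_results>
"""


def requestable_items(xml_text):
    """Return the title and request url of every available item."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "title": item.findtext("title"),
            "request_url": item.findtext("request_url"),
        }
        for item in root.findall("item")
        if item.findtext("available") == "true"
    ]
```

Because the code depends only on documented element names, a redesign of the catalog's web pages would not affect it, unlike a program that reads the HTML.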

It is worth emphasizing that I am not asking for a 'whole new program' from the catalog vendor. A system's existing internal program logic that produces the information for the regular web data-stream is applicable to production of the alternate API data-stream. Yes, it takes thought and work to create a good and secure API and document it -- but an API, essentially, presents the same data as a web interface, in a simpler format. The mind-shift in offering an API is often larger than the work-shift.

Finally, about interacting with vendors regarding this issue... Vendor sales people aren't the developers, and it sounds like I am asking for something that vendor developers would be more knowledgeable about. But I've seen different vendor sales representatives at workshops and conferences, and the representatives for products that provide APIs have universally very clearly understood the importance of this issue. Thus if a product representative does not seem to understand this important feature, I would have significant concerns with the product.

--

Notes

[1] Key aspects of this trend are articulated in this seminal article:
http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html