Friday, February 25, 2005

Evolutionary database applications

Brian had a comment for clarification on Quote and database evolution that I thought I'd take the time to clarify. I said:

It's intriguing to me that the application I've worked on for going on four years as well as my own perception of what has worked well over that timeframe has followed a pretty similar evolution as what Brian describes in his rant about databases. which Brian asked:

In terms of what you did with it, or level of frustration at building things around the problem instead of solving problem?

I'd have to say what we did with it. I'm feeling the need to tell a story here, bear with me...

I feel like the application has turned out well from a technical standpoint precisely because we were in a position early on to build on the database without making it the focal point of the application. This is in contrast to some of the other applications in production at our company. One of the older applications is written almost entirely in PL/SQL, so there was an obvious database focus there and the development culture that sprung up around it was such that, for a while, the DBAs were king and the programmers had to answer to them.

It seems like a fluke that our app turned out the way it is. There were a lot of factors that played into this over time, many of them cultural/non-technical. Fortunately, early on we somehow had the leeway to build away for the database. Because we were almost starting greenfield, we didn't have to initially deal too much with the DBAs, since they were occupied with maintaining the production databases.

The app wasn't exactly greenfield because we were building off a newly-acquired codebase, but it still needed plenty of work. The codebase had a horrible proprietary persistence mechansim that included an infamous "magic container" table called MHASHTABLE that held all associative relationships for every object in the system (!). Because this would obviously not scale for a consumer-facing hosted commerce engine that had to be capable of serving tens of thousands of transactions per day (think of the rate of growth of that table!), we had to rip it all out and re-do the persistence layer. Basically anything we might come up with would probably be better than this beast, so perhaps we were left to just do our thing.

Our architect at the time was following persistence technologies trying to get an idea of what direction to go in, and JDO as a standard was just coming out (0.9 in the fall of 2001). The only viable alternatives were expensive, proprietary tools like TopLink. After some brief prototyping, we decided to build our own SQL JDO implementation.

One of the loose requirements that the DBAs did give is that they needed to be able to have control over the queries. They thought that they would get some sort of set of configuration files containing query templates that they could tweak to their hearts' content. Instead, we gave them JDO, with fetch groups and dynamic SQL generation. After they saw how well it worked, they didn't bother to complain about the lesser amount of control they got. The SQL, though dynamic, was predictable, uniform, and based upon primary keys or indexable "pseudo" foreign keys. (Pseudo- because we actually have no foreign key constraints at all on our tables for better insert performance and so we don't have to worry about proper ordering of deletes. Once, when giving an overview of our JDO-based persistence to a new DBA, her mind was blown away at that fact -- it apparently went against everything she learned about building databases!)

We were able to build the JDO implementation in an evolutionary way to match the growing, changing needs of the application. As a result, we don't quite have a full-featured, to-spec implementation, but we don't need it. There are quirks in the way we do things like complex subqueries or inheritance (I had no prior background in O/R mapping and basically learned on the fly), but we've got enough features (most of JDOQL, bytecode enhancement) to give an approximation of transparent persistence. In all my experience as a developer the amount of transparency we were able to create with our JDO implementation is probably one of the slicker things that I've had a hand in building. It was transparent enough to allow the rest of the development team to not have to worry about how to persist their objects, to have to remember what the proper way is to close a JDBC connection; but opaque enough so that we could performance tune it to our needs. There were some tough times when develpers would encounter a bug in the JDO engine that they'd have no idea what was going on, but today the codebase is so stable that it's hardly an issue.

(As a brief aside I still don't think I'd advocate building your own persistence engine, especially today with all the open source options available, but for us it was the right thing to do at the time and we were able to make it work well for us.)

So, with that long yarn I don't know if I completely answered your question Brian, but the main takeaway for me is that I've seen both sides of the fence, but I much prefer the side that I'm on with the database only playing a supporting role in the functioning of our app.

There's a thread in my mind somewhere here about the crucial importance of being able to do evolutionary application development (as opposed to big-bang or in-a-vacuum development) in which environment of course the database (and the "DeeBeeAies") must play well, but I'll leave that thought for another day.

10:43:58 PM      comment []  trackback []