Many things change with the decision to work with purely object-oriented data in a specific situation. The outlook seems good: business processes and rules will be much easier to implement, completely typed data will be no problem at all, and there'll be no more structural problems trying to accommodate clumsy handling of records and rows in an otherwise OO application structure. An object/relational mapping tool will take care of all the persistence issues. One thing, though, will pose much greater problems than originally anticipated, and it's easy to overlook large parts of it when making the original decision: the wide topic of data integrity in the object world (OW). In this article I'm going to present some general questions and theories about data integrity in conjunction with OO data objects, and I'm planning to write further articles on the same topic later. Occasionally I may reference the technology I'm personally using at the moment, which is .NET 2, the C# language and XPO.
First of all, data integrity comes in two flavours: the technical side, enforced by structural means such as keys, constraints and indexes, and the logical side, enforced by business rules. Many aspects of both parts of data integrity are very different in the OW, compared to a "simple" relational database model.
People have been thinking about these issues in the relational database scenario for a long time. Concepts like referential integrity and unique indexes are very important in this domain, and normalisation provides for database configurations where automated referential integrity can be fully exploited. Databases have features that let the designer restrict values; together with modern database access layers like ADO.NET, these mechanisms cover the complete technical side of data integrity and possibly some of the logical part. In the OW, this is where the problems start.
Obviously, a good O/R mapping tool should be able to exploit the features of the database layer, but this isn't sufficient. As soon as a single object is mapped to more than one table (as it should be when inheritance is used), many of these mechanisms break. For example, it's impossible to define a multi-column index, unique or not, over values that don't reside in the same table. Depending on implementation details of the O/R mapper, even unique indexes over fields in the same table may present problems, for example if the mapper doesn't make sure that all necessary fields of an object are filled (correctly!) before the object is first saved to the database.
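To make the multi-table problem concrete, here is a sketch in the style of XPO persistent classes (the class and field names are invented for illustration, and the exact mapping details depend on the XPO version and configuration). With one table per class in the hierarchy, a constraint spanning a base-class field and a derived-class field has no single table to live in:

```csharp
using DevExpress.Xpo;

// Base class: mapped to its own table, e.g. "Person".
public class Person : XPObject
{
    public Person(Session session) : base(session) { }

    public string LastName;       // stored in the Person table
}

// Derived class: mapped to a second table, e.g. "Employee",
// joined to the Person table via the shared primary key.
public class Employee : Person
{
    public Employee(Session session) : base(session) { }

    public int EmployeeNumber;    // stored in the Employee table

    // A unique index over (LastName, EmployeeNumber) cannot be
    // expressed on the database level: the two values live in
    // different tables, and indexes can't span tables.
}
```

Any uniqueness rule of that kind has to be checked by the mapper or the application itself, with all the concurrency caveats that implies.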
When working directly with relational databases, the easy way to implement business logic is on the server side. Using triggers on the database level, consistency checks can be implemented, other processes executed just in time, and so on. Unfortunately, this approach has a lot of drawbacks; one of the worst is that there's no easy way to give useful user feedback when a check fails. In real-world applications, business logic implementations will more often than not be split, performing some kinds of actions on the database level while leaving other things to the client application. For the latter part, it's difficult to find the right "place" to implement it; in .NET, a typed dataset can provide part of a useful answer.
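As a contrast to a database trigger, which can usually only abort the operation with a fairly generic error, a client-side check can tell the user exactly what is wrong. A minimal plain C# sketch, with an invented Order class and invented rules:

```csharp
using System;

public class Order
{
    public decimal Total;
    public DateTime DeliveryDate;

    // A client-side consistency check that produces user-readable
    // feedback -- something a trigger can't easily provide.
    public bool Validate(out string message)
    {
        if (Total < 0m)
        {
            message = "The order total must not be negative.";
            return false;
        }
        if (DeliveryDate < DateTime.Today)
        {
            message = "The delivery date must not lie in the past.";
            return false;
        }
        message = null;
        return true;
    }
}
```

The difficulty the article describes is not writing such a check, but deciding where it belongs, and keeping it in sync with whatever the server still enforces on its own.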
As long as any consistency checking is implemented on the server side, there's always the problem that data which has already been loaded on the client, and changed there, may not adhere to the restrictions the server would enforce if the data were to be saved. The programmer has to keep an eye on the exact state of things and see to it that data is saved to the server in all the right places.
There are two aspects to this issue. First, an O/R mapping tool should be allowed to define its own database structure with as much freedom as possible. (I know that a lot of people think this should work the other way round, letting them define a structure and leaving the tool to deal with it. Apart from situations where one needs to work with legacy data structures, this seems like nonsense to me and contorts the purpose of such tools.) Obviously, I'd have to be very careful when writing database layer code that relies on the generated layout, and I'd risk breakage every time I update the tool.
Second, from the OW point of view, it seems intolerable to have a number of objects in memory at any given time that may not be in a consistent state. With relational data, this is often a situation that’s simply left to the developer of each distinct algorithm. But when objects are global to the application (or parts of it, at least) and there are intelligent caching and lifecycle management mechanisms in place, as implemented by a useful O/R mapper, one can’t live with the possibility of inconsistent states in in-memory data.
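One way to rule out inconsistent in-memory objects is to enforce invariants at the moment of change rather than only at save time, so that no instance can ever enter a bad state, no matter when, or whether, it is persisted. A plain C# sketch (the Account class and its invariant are invented for illustration):

```csharp
using System;

public class Account
{
    private decimal balance;

    // The invariant (balance >= 0) is checked when the value changes,
    // not when the object is saved. A cached, application-global
    // instance can therefore never hold an inconsistent state.
    public decimal Balance
    {
        get { return balance; }
        set
        {
            if (value < 0m)
                throw new ArgumentOutOfRangeException(
                    "value", "The balance must not become negative.");
            balance = value;
        }
    }
}
```

This works for invariants local to one object; cross-object rules, like the uniqueness constraints discussed above, still need cooperation from the mapper or the application.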
So, these are (some of) the specific issues we have to deal with in the OW: enforcing uniqueness and other constraints when a single object spans several tables, deciding where business logic should live when it can no longer simply go into triggers, keeping modified client-side data in line with server-enforced restrictions, and making sure that cached, application-global objects never linger in an inconsistent state. These issues and their solutions will be the subjects of future posts. Thanks for reading so far!