Tuesday, September 29, 2009

Database Design Insights

As I mentioned before I have been spending some time rethinking some of the designs I have done or have seen over my career. I was able to come up with a few insights.

First, not enough companies keep history of who changed records, when, and what the changes were. Keeping this sort of information is critical to reconstructing why a decision was made. You need to be able to tell the state of the database at a given date and time to know why a decision was made the way it was. Another bit of information that is needed for sensitive data is who viewed the data and when. None of the companies I have worked for have ever kept a log of select statements.

Another insight was that most tables fall into basic categories. I usually hate putting things in boxes but had to create these as a way to explain table design to developers. Reference tables contain information that is fairly basic such as units of measure or model years. These tables are rarely updated but often read. Cross reference tables are used to translate one piece of information into another (employee 276 in the old HR system is employee 889 in the new HR system). These tables are again rarely updated and are less commonly read. Master tables contain information about customers, parts, suppliers, etc. These tables are updated fairly often and read fairly often. Transaction tables contain information about invoices, orders, and other event driven processes. These tables are added to constantly and often read.

I spent a lot of time thinking about dates. Dates are often represented with times attached, which is fine if you know precisely when something happened. The aren't as convenient when you want to record a birth that happened in the spring of 1847 or an political movement that occured in the later half of the 1500's. I have found a few ways to represent non-exact dates but they are alien to the database systems we have and therefore not a clean solution. I need to do some more work on this.

While thinking about imprecise dates I considered imprecise data. Suppose you have multiple reports of when an event happened or multiple descriptions of a bank robber? Every system I have dealt with expects one precise record, but that isn't how the real world works. I was able to come up with a system that allows for multiple versions of an event or fact, including a reliability factor. But I need to work on this more and haven't decided how multiple versions of a fact can be accurately reflected on a report.

So, nothing earth-shattering. No cures for cancer. And perhaps I am the only one who cares about such things. Still, here are my thoughts.

No comments: