Data Integrity and System Design

Having been involved in several upgrade projects over the last few years, one thing I've often noticed is the poor quality of the data present in large, long-running systems. This causes problems for the upgrade itself and usually means that you have to spend considerable time fixing the data first.

Upgrading is difficult and causes regression tests to fail because:

  • The new system may have stricter data validation and refuse the old data.
  • The new system may be more precise, e.g. not rounding a number, taking a sign into account, etc.
  • The new system may use data that was never used before, e.g. data entry staff not bothering to enter a product's weight into a form because 'it is never used'.
  • Years of copy and paste can leave a vast amount of junk that fails consistency checks.

After you have corrected the data for the upgrade, the original system has much higher quality data and other issues and inconsistencies have been resolved. In a recent system we also saw large performance improvements once duplicate and junk data had been removed. On another system we saved the operations staff many hours of work a week, as the data improvements meant a large number of post-report corrections were no longer needed.

So why isn't this analysis done on a regular basis to help keep a system healthy? The main reason is simply that it's too hard for the operations staff to do. Therefore, when you're designing a system, you should take this into account and enable these kinds of maintenance tasks. This means providing reporting, and tools that can correct sets of problematic data.

Some things to consider:

  • How easy is it to identify and delete orphaned data? i.e. if you can't navigate to some data, is it still required? (The first sketch after this list shows the idea.)
  • Can a user identify data that has not been used for a long time? Can they then archive it?
  • Can you identify identical or similar data? A common example is user information that differs only by capitalisation, e.g. an address. (See the second sketch below.)
  • Can the user run arbitrary consistency checks that go beyond the database rules? E.g. I've recently written a tool to allow an operations manager to run XPath queries over data to check for bad bookings. (The third sketch below is in this spirit.)
  • Can the user bulk load sets of missing or corrected data? (See the final sketch below.)
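
As a minimal sketch of the first point, assuming a hypothetical schema in which an orders table references a customers table, an anti-join reports the rows that nothing can reach any more:

    import sqlite3

    # Hypothetical database and schema; substitute your own
    # parent/child tables for customers and orders.
    conn = sqlite3.connect("app.db")

    # Orders whose customer no longer exists -- nothing in the
    # application can navigate to these rows any more.
    orphans = conn.execute("""
        SELECT o.id, o.customer_id
        FROM orders o
        LEFT JOIN customers c ON c.id = o.customer_id
        WHERE c.id IS NULL
    """).fetchall()

    for order_id, customer_id in orphans:
        print(f"Orphaned order {order_id} (missing customer {customer_id})")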
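
For near-duplicates, grouping on a normalised form of the value is often enough to produce a worklist for the operations staff. The customers table and address column here are again assumptions:

    import sqlite3

    conn = sqlite3.connect("app.db")

    # Addresses that differ only by capitalisation or surrounding
    # whitespace collapse onto the same normalised key.
    duplicates = conn.execute("""
        SELECT LOWER(TRIM(address)) AS normalised, COUNT(*) AS copies
        FROM customers
        GROUP BY LOWER(TRIM(address))
        HAVING COUNT(*) > 1
        ORDER BY copies DESC
    """).fetchall()

    for normalised, copies in duplicates:
        print(f"{copies} variants of: {normalised}")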
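
And for arbitrary consistency checks, a small command-line tool in the spirit of the one mentioned above might let the operations manager supply the XPath expression directly. The element names and the example expression are invented for illustration:

    import sys
    import xml.etree.ElementTree as ET

    # Report any elements matching an operator-supplied XPath expression.
    # ElementTree only supports a subset of XPath, so a full tool might
    # use lxml instead.
    #
    # Usage: python check.py bookings.xml ".//booking[price='0.00']"
    xml_file, expression = sys.argv[1], sys.argv[2]
    tree = ET.parse(xml_file)

    matches = tree.getroot().findall(expression)
    print(f"{len(matches)} suspect element(s) matched {expression!r}")
    for element in matches:
        print(ET.tostring(element, encoding="unicode").strip())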
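
Finally, bulk loading corrections can be as simple as replaying a CSV of fixed values; the file layout and products table here are hypothetical:

    import csv
    import sqlite3

    conn = sqlite3.connect("app.db")

    # Replay a CSV of corrected product weights
    # (assumed columns: product_id, weight).
    with open("corrected_weights.csv", newline="") as f:
        rows = [(row["weight"], row["product_id"]) for row in csv.DictReader(f)]

    conn.executemany("UPDATE products SET weight = ? WHERE id = ?", rows)
    conn.commit()
    print(f"Applied {len(rows)} corrections")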

Please don't rely on database tools to do this: your operational staff probably won't know how to use them, and your DBAs won't understand the business domain well enough to analyse the data. You need tools at the appropriate level for the appropriate people, and you need to consider the complete lifecycle of your product.

