Data Integrity and System Design

Having been involved in several upgrade projects over the last few years, one thing I've often noticed is the poor quality of the data present in large, long-running systems. This causes problems for the upgrade itself and usually means that you have to spend considerable time fixing the data first.

Upgrading is difficult and causes regression tests to fail because:

  • The new system may have stricter data validation and refuse the old data.
  • The new system may be more precise, e.g. not rounding a number, taking a sign into account, etc.
  • The new system may use data that was never used before, e.g. data entry staff not bothering to enter a product's weight into a form because 'it is never used'.
  • Years of copy and paste can leave a vast amount of junk that fails consistency checks.

After you have corrected the data for the upgrade, the original system has much higher quality data and other issues and inconsistencies have been resolved. In a recent system we also saw large performance improvements once duplicate and junk data had been removed. On another system we saved the operations staff many hours of work a week, as the data improvements meant a large number of post-report corrections were no longer needed.

So why isn't this analysis done on a regular basis to help keep a system healthy? The main reason is simply that it's too hard for the operations staff to do. Therefore, when you're designing a system, you should take this into account and enable these kinds of maintenance tasks. This means providing reporting, and tools that can correct sets of problematic data.

Some things to consider:

  • How easy is it to identify and delete orphaned data? i.e. if you can't navigate to some data, is it still required? (The first sketch after this list shows the idea.)
  • Can a user identify data that has not been used for a long time? Can they then archive it?
  • Can you identify identical or similar data? A common example is user information that differs only by capitalisation, e.g. an address. (See the second sketch below.)
  • Can the user run arbitrary consistency checks that go beyond the database rules? E.g. I've recently written a tool to allow an operations manager to run XPath queries over data to check for bad bookings. (The third sketch below is in this spirit.)
  • Can the user bulk load sets of missing or corrected data? (See the final sketch below.)
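
As a minimal sketch of the first point, assuming a hypothetical schema in which an orders table references a customers table, an anti-join reports the rows that nothing can reach any more:

    import sqlite3

    # Hypothetical database and schema; substitute your own
    # parent/child tables for customers and orders.
    conn = sqlite3.connect("app.db")

    # Orders whose customer no longer exists -- nothing in the
    # application can navigate to these rows any more.
    orphans = conn.execute("""
        SELECT o.id, o.customer_id
        FROM orders o
        LEFT JOIN customers c ON c.id = o.customer_id
        WHERE c.id IS NULL
    """).fetchall()

    for order_id, customer_id in orphans:
        print(f"Orphaned order {order_id} (missing customer {customer_id})")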
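
For near-duplicates, grouping on a normalised form of the value is often enough to produce a worklist for the operations staff. The customers table and address column here are again assumptions:

    import sqlite3

    conn = sqlite3.connect("app.db")

    # Addresses that differ only by capitalisation or surrounding
    # whitespace collapse onto the same normalised key.
    duplicates = conn.execute("""
        SELECT LOWER(TRIM(address)) AS normalised, COUNT(*) AS copies
        FROM customers
        GROUP BY LOWER(TRIM(address))
        HAVING COUNT(*) > 1
        ORDER BY copies DESC
    """).fetchall()

    for normalised, copies in duplicates:
        print(f"{copies} variants of: {normalised}")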
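
And for arbitrary consistency checks, a small command-line tool in the spirit of the one mentioned above might let the operations manager supply the XPath expression directly. The element names and the example expression are invented for illustration:

    import sys
    import xml.etree.ElementTree as ET

    # Report any elements matching an operator-supplied XPath expression.
    # ElementTree only supports a subset of XPath, so a full tool might
    # use lxml instead.
    #
    # Usage: python check.py bookings.xml ".//booking[price='0.00']"
    xml_file, expression = sys.argv[1], sys.argv[2]
    tree = ET.parse(xml_file)

    matches = tree.getroot().findall(expression)
    print(f"{len(matches)} suspect element(s) matched {expression!r}")
    for element in matches:
        print(ET.tostring(element, encoding="unicode").strip())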
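
Finally, bulk loading corrections can be as simple as replaying a CSV of fixed values; the file layout and products table here are hypothetical:

    import csv
    import sqlite3

    conn = sqlite3.connect("app.db")

    # Replay a CSV of corrected product weights
    # (assumed columns: product_id, weight).
    with open("corrected_weights.csv", newline="") as f:
        rows = [(row["weight"], row["product_id"]) for row in csv.DictReader(f)]

    conn.executemany("UPDATE products SET weight = ? WHERE id = ?", rows)
    conn.commit()
    print(f"Applied {len(rows)} corrections")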

Please don't rely on database tools to do this: your operational staff probably won't know how to use them, and your DBAs won't understand the business domain well enough to analyse the data. You need tools at the appropriate level for the appropriate people, and you need to consider the complete lifecycle of your product.

