24 October 2012

Becoming Slightly More Normal

Today's idea for the blog regards inching towards normalcy...that is, the main three normal forms of database design.  I always get a little fuzzy in the head when I think about the precise official definitions of the three, and generally examples are the best way I have of comprehending normalization.

So let's start out with a sample database.  As this is my blog, and the apparently absent readership has put up no protestations, I shall choose Hindustani classical music as my subject; a database of ragas.

Raga
Name Thaat Mode Performance1 Performance2
Yaman Kalyan Lydian Ustad Imrat Khan Hariprasad Chaurasia
Darbari Asavari Aeolian Ustad Vilayat Khan Dagar Brothers
Chandni Kedar Kalyan Lydian Ustad Vilayat Khan Ustad Amir Khan

So we can see from this that we've got a record of each raga, listing what thaat each is classified into, what Western mode that thaat is equivalent to, and a couple example noteworthy performances of each raga.  The main problem here that prevents us from entering the first normal form (1NF) is the multiple values we have for example performances.  We could have issues if there were more or less than two performances to be provided for a raga, for instance.  So to get it to first normal form, we combine the performance columns into one column, since they are storing essentially the same information.

Raga
Name Thaat Mode Performance
Yaman Kalyan Lydian Ustad Imrat Khan
Yaman Kalyan Lydian Hariprasad Chaurasia
Darbari Asavari Aeolian Ustad Vilayat Khan
Darbari Asavari Aeolian Dagar Brothers
Chandni Kedar Kalyan Lydian Ustad Vilayat Khan
Chandni Kedar Kalyan Lydian Ustad Amir Khan

Alright, so now we have it in first normal form. However, we doubled our row count now, and seem to be storing a lot of duplicate information in our table. Since the relationship between a raga and a performance is one to many, we need to split out the performances onto their own table in order to achieve the second normal form. A foreign key (between Raga.Name and Performance.Raga) can be used to connect the data across tables.
Raga
Name Thaat Mode
Yaman Kalyan Lydian
Darbari Asavari Aeolian
Chandni Kedar Kalyan Lydian


Performance
Raga Performer
Yaman Ustad Imrat Khan
Yaman Hariprasad Chaurasia
Darbari Ustad Vilayat Khan
Darbari Dagar Brothers
Chandni Kedar Ustad Vilayat Khan
Chandni Kedar Ustad Amir Khan

Having achieved second normal form (2NF), we note one additional point of redundancy. Fields that do not depend on the primary key (in our case, the raga itself) must be "offshored". Note that Mode is essentially derived from Thaat; it isn't really a unique value of the raga, it is more a further description of the thaat quality. So one more time:
Raga
Name Thaat
Yaman Kalyan
Darbari Asavari
Chandni Kedar Kalyan


Performance
Raga Performer
Yaman Ustad Imrat Khan
Yaman Hariprasad Chaurasia
Darbari Ustad Vilayat Khan
Darbari Dagar Brothers
Chandni Kedar Ustad Vilayat Khan
Chandni Kedar Ustad Amir Khan

Thaat
Name Mode
Kalyan Lydian
Asavari Aeolian

Hey presto, third normal form (3NF) in all its grandeur. It's a very clean and proper way to design databases, but working in the real world, I think it's worth remembering something stated by Karen Lopez at the KC SQL User Group last August: "Normalization is a description, not an objective" (paraphrased). There are times when third normal form is ideal, times when denormalized data is the way to go. The classic IT/DBA mantra applies..."It Depends".

No comments: