Today's idea for the blog regards inching towards normalcy...that is, the main three normal forms of database design. I always get a little fuzzy in the head when I think about the precise official definitions of the three, and generally examples are the best way I have of comprehending normalization.
So let's start out with a sample database. As this is my blog, and the apparently absent readership has put up no protestations, I shall choose Hindustani classical music as my subject; a database of ragas.
Raga |
Name |
Thaat |
Mode |
Performance1 |
Performance2 |
Yaman |
Kalyan |
Lydian |
Ustad Imrat Khan |
Hariprasad Chaurasia |
Darbari |
Asavari |
Aeolian |
Ustad Vilayat Khan |
Dagar Brothers |
Chandni Kedar |
Kalyan |
Lydian |
Ustad Vilayat Khan |
Ustad Amir Khan |
So we can see from this that we've got a record of each raga, listing what thaat each is classified into, what Western mode that thaat is equivalent to, and a couple example noteworthy performances of each raga. The main problem here that prevents us from entering the first normal form (1NF) is the multiple values we have for example performances. We could have issues if there were more or less than two performances to be provided for a raga, for instance. So to get it to first normal form, we combine the performance columns into one column, since they are storing essentially the same information.
Raga |
Name |
Thaat |
Mode |
Performance |
Yaman |
Kalyan |
Lydian |
Ustad Imrat Khan |
Yaman |
Kalyan |
Lydian |
Hariprasad Chaurasia |
Darbari |
Asavari |
Aeolian |
Ustad Vilayat Khan |
Darbari |
Asavari |
Aeolian |
Dagar Brothers |
Chandni Kedar |
Kalyan |
Lydian |
Ustad Vilayat Khan |
Chandni Kedar |
Kalyan |
Lydian |
Ustad Amir Khan |
Alright, so now we have it in first normal form. However, we doubled our row count now, and seem to be storing a lot of duplicate information in our table. Since the relationship between a raga and a performance is one to many, we need to split out the performances onto their own table in order to achieve the second normal form. A foreign key (between Raga.Name and Performance.Raga) can be used to connect the data across tables.
Raga |
Name |
Thaat |
Mode |
Yaman |
Kalyan |
Lydian |
Darbari |
Asavari |
Aeolian |
Chandni Kedar |
Kalyan |
Lydian |
Performance |
Raga |
Performer |
Yaman |
Ustad Imrat Khan |
Yaman |
Hariprasad Chaurasia |
Darbari |
Ustad Vilayat Khan |
Darbari |
Dagar Brothers |
Chandni Kedar |
Ustad Vilayat Khan |
Chandni Kedar |
Ustad Amir Khan |
Having achieved second normal form (2NF), we note one additional point of redundancy. Fields that do not depend on the primary key (in our case, the raga itself) must be "offshored". Note that Mode is essentially derived from Thaat; it isn't really a unique value of the raga, it is more a further description of the thaat quality. So one more time:
Raga |
Name |
Thaat |
Yaman |
Kalyan |
Darbari |
Asavari |
Chandni Kedar |
Kalyan |
Performance |
Raga |
Performer |
Yaman |
Ustad Imrat Khan |
Yaman |
Hariprasad Chaurasia |
Darbari |
Ustad Vilayat Khan |
Darbari |
Dagar Brothers |
Chandni Kedar |
Ustad Vilayat Khan |
Chandni Kedar |
Ustad Amir Khan |
Thaat |
Name |
Mode |
Kalyan |
Lydian |
Asavari |
Aeolian |
Hey presto, third normal form (3NF) in all its grandeur. It's a very clean and proper way to design databases, but working in the real world, I think it's worth remembering something stated by
Karen Lopez at the KC SQL User Group last August: "Normalization is a description, not an objective" (paraphrased). There are times when third normal form is ideal, times when denormalized data is the way to go. The classic IT/DBA mantra applies..."It Depends".
No comments:
Post a Comment