I called this “Survival Normalization” because this is NOT
really going to be an in-depth discussion on normalization. For that, you may want to
check out the links at the end. What I am trying to show here is a an example
of taking the information on a Grocery Receipt (which is NOT normalized at all)
and create a data model that will work efficiently in an On-Line Transaction Processing
(OLTP) relational database.
There are some really good definitions out there. I like to start with an example. Look at this receipt.
Now there are few ideas you can get from this. First of all, whoever is the owner of this receipt really likes ROAST CHICKEN and Pop and is probably not the healthiest of individuals! (Or maybe they had a bunch of starving students coming over!)
Anyway, how would you STORE this information?
Let's just start with three facts; the department, item and price will get you really close to our first issue.
DEPT | ITEMS
DELI | Roast Chicken, 5.99, Roast Chicken, 5.99, ..
Hopefully, you can see this will be a problem. The ITEMS column is way to fluid! And can we really guarantee that every Roast Chicken is 5.99? It think most people will already jump to the conclusion that we would want to store this information this way -
DEPT | ITEMS |PRICE
DELI | Roast Chicken |5.99
DELI | Roast Chicken |5.99
DELI | Roast Chicken |5.99
We now have one row for the "lowest granularity" of the data on the receipt. The lowest granularity (detail) on the receipt is the item, not the DELI. So now we have one row for each item! We do not have any column with "repeating groups" (repeating information - ie. Roast chicken, Roast chicken, Roast chicken)
With this idea, we are now ready to tackle the entire receipt. If you want to follow along, I have provided the DDL needed to create a RECIEPTS table and link to Oracle Data Modeler at the end of this post.
Just translating the receipt into an entity is pretty straightforward. It would look like this.
(If you aren't sure what the "*" is in front of each item, that denotes it is mandatory. The information after the name describes what kind of data it is (VARCHAR for Variable Length Character information, for example)
Notice I have made everything mandatory. That is because all of this info is on the receipt I am looking at. It is an assumption and I would definitely want to check that out with a data expert. But, for now, in my perfectly controlled example, I am running with it.
What if my database had more than one receipt? How would I ensure each receipt is unique? First thing I would do look to see if there was a field (or combination of fields) I could use that would GUARANTEE uniqueness and always be populated. If one was present, I would assign that field (or those fields) the role of being the Unique Identifier (UID) (Or Primary Key depending on what stage you are at in your modeling!)
Guaranteeing that every row is unique is the first step in normalization and considered first normal form (1NF)
A closer look at RECEIPT shows something interesting.
Not everything is actually dependent on the UID! For example, the Store Address and Store Director are dependent on the StoreID. The Item Price is dependent on the Item ID. It like there are three different topics "smooshed" into one entity! We need to break this up so that every field/column/attribute is dependent on its own UID/PK.
Welcome to Second Normal Form (2NF)! Everything is dependent on it's own UID
We now need to make sure (if applicable) that these Entities/Tables are related. Not really a part of normalization but a practical consideration that needs to be made before continuing.
(I won't go into a Cardinality discussion here. If you are unsure what the lines mean, there is a pretty good conversation here.)
But this is not enough. Receipt still has redundant data. For example, we can calculate the Receipt Sub total and total. We can certainly calculate how many items are on the receipt. I've highlighted the calculated fields/columns/attributes in the RECEIPT.
By removing those, we get to Third Normal Form (3NF) almost.
What's going on here? What is the ItemTotal? It would be ItemPrice * ItemQuantity ON EACH RECEIPT. But these items don't really describe the receipt and they don't describe the item, either. They cause a tremendous disturbance in the force!!!! A New Entity must rise up and restore balance!
We actually got to watch the birth of a true INTERSECTION/BRIDGING/ASSOCIATIVE entity. RECEIPT_LINES associates the relationship between ITEM and RECEIPT. It Bridges the entities, serving as an Intersection between the two.
Sadly, ItemTotal is gone! It was derivable and had no place in a 3NF diagram. But Price continues to live in ITEM and ItemQuantity now has moved to RECEIPT_LINES. So ItemTotal can still be derived if needed.
I know this was a long and imperfect example of this process but I hope it helps in some small way shore up some of your data modeling skills!
A quick warning : 3NF is awesome in databases where there are a lot of updates! It is usually not prefered in the world of data warehousing, however. So don't go running into a data warehouse schema screaming that it should be in 3NF or your will reveal your "naivete!" Also, sometimes performance may dictate you must "denormalize." (For example, put the ITEM DESCRIPTION in every table that has an ITEM ID.) These are opportunities for discussion - don't go all "dogmatic" on this!
Have fun!
Further Discussion.
http://support.microsoft.com/kb/283878
Oracle SQL Modeler Data Modeler is here -
DDL script to serve as a starting point is here -
CREATE TABLE RECEIPT
(
StoreAddress VARCHAR2 (30) NOT NULL ,
StoreDirector VARCHAR2 (30) NOT NULL ,
Cashier VARCHAR2 (30) NOT NULL ,
RecieptStartDate DATE NOT NULL ,
Department VARCHAR2 (30) NOT NULL ,
ItemQuantity INTEGER NOT NULL ,
ItemPrice NUMBER (5,2) NOT NULL ,
ItemDescription VARCHAR2 (30) NOT NULL ,
ItemID INTEGER NOT NULL ,
ItemTotal NUMBER (5,2) NOT NULL ,
RecieptSubTotal NUMBER (6,2) NOT NULL ,
TaxRate NUMBER (3,2) NOT NULL ,
ReceiptTotal NUMBER (7,2) NOT NULL ,
CashTendered NUMBER (8,2) NOT NULL ,
ChangeGiven NUMBER (6,2) NOT NULL ,
NumberOfItems INTEGER NOT NULL ,
TransactionID INTEGER NOT NULL ,
Operator INTEGER NOT NULL ,
Terminal INTEGER NOT NULL ,
StoreID INTEGER NOT NULL ,
ReceiptEndDate DATE NOT NULL
) ;
(
StoreAddress VARCHAR2 (30) NOT NULL ,
StoreDirector VARCHAR2 (30) NOT NULL ,
Cashier VARCHAR2 (30) NOT NULL ,
RecieptStartDate DATE NOT NULL ,
Department VARCHAR2 (30) NOT NULL ,
ItemQuantity INTEGER NOT NULL ,
ItemPrice NUMBER (5,2) NOT NULL ,
ItemDescription VARCHAR2 (30) NOT NULL ,
ItemID INTEGER NOT NULL ,
ItemTotal NUMBER (5,2) NOT NULL ,
RecieptSubTotal NUMBER (6,2) NOT NULL ,
TaxRate NUMBER (3,2) NOT NULL ,
ReceiptTotal NUMBER (7,2) NOT NULL ,
CashTendered NUMBER (8,2) NOT NULL ,
ChangeGiven NUMBER (6,2) NOT NULL ,
NumberOfItems INTEGER NOT NULL ,
TransactionID INTEGER NOT NULL ,
Operator INTEGER NOT NULL ,
Terminal INTEGER NOT NULL ,
StoreID INTEGER NOT NULL ,
ReceiptEndDate DATE NOT NULL
) ;
No comments:
Post a Comment