| By Mike Nicewarner | Article Rating: |
|
| November 1, 2004 12:00 AM EST | Reads: |
2,456 |
In this article, I discuss and provide examples of the many rules of data normalization. Currently I count at least seven, which are:
- Eliminate repeating groups: Make a separate table for each set of related attributes and give each table a primary key.
- Eliminate redundant data: If an attribute depends on only part of a multivalued key, remove it to a separate table.
- Eliminate columns not dependent on the key: If attributes do not contribute to a description of the key, remove them to a separate table.
- Isolate independent multiple relationships: No table may contain two or more 1:n or n:m relationships that are not directly related.
- Isolate semantically related multiple relationships: There may be practical constraints on information that justify separating logically related many-to-many relationships.
- Optimal normal form: A model limited to only simple (elemental) facts, as expressed in ORM.
- Domain-key normal form: A model free from all modification anomalies.
Eliminate Repeating Groups
In the original member list, each member name is followed by any databases that the member has experience with. Some might know many, and others might not know any. To answer the question, "Who knows DB2?" we need to perform an awkward scan of the list looking for references to DB2. This is inefficient and an extremely untidy way to store information.
Moving the known databases into a separate table helps a lot. Separating the repeating groups of databases from the member information results in the first normal form. The MemberID in the database table matches the primary key in the member table, providing a foreign key for relating the two tables with a join operation. Now we can answer the question by looking in the database table for "DB2" and getting the list of members (see Figure 1).
Eliminate Redundant Data
In the database table, the primary key is made up of the MemberID and the DatabaseID. This makes sense for other attributes like "Where Learned" and "Skill Level" attributes, since they will be different for every member/database combination. But the database name depends only on the DatabaseID. The same database name will appear redundantly every time its associated ID appears in the database table.
Suppose you want to reclassify a database - give it a different DatabaseID. The change has to be made for every member that lists that database. If you miss some, you'll have several members with the same database under different IDs. This is an update anomaly.
Or suppose the last member listing a particular database leaves the group. His records will be removed from the system, and the database will not be stored anywhere. This is a delete anomaly. To avoid these problems, we need a second normal form.
To achieve this, separate the attributes that depend on both parts of the key from those depending only on the DatabaseID. This results in two tables: "Database," which gives the name for each DatabaseID, and "MemberDatabase," which lists the databases for each member.
Now we can reclassify a database in a single operation: look up the DatabaseID in the "Database" table (see Figure 2) and change its name. The result will instantly be available throughout the application.
Eliminate Columns Not Dependent on the Key
The member table satisfies the first normal form; it contains no repeating groups. It satisfies the second normal form since it doesn't have a multivalued key. But the key is MemberID, and the company name and location describe only a company, not a member. To achieve a third normal form, they must be moved into a separate table. Since they describe a company, CompanyCode becomes the key of the new "Company" table (see Figure 3).
The motivation for this is the same for the second normal form: we want to avoid update and delete anomalies. For example, suppose no members from IBM were currently stored in the database. With the previous design, there would be no record of its existence, even though 20 past members were from IBM.
Isolate Independent Multiple Relationships
This applies primarily to key-only associative tables and appears as a ternary relationship, but has incorrectly merged two distinct, independent relationships.
The way this situation starts is by a business request list similar to Figure 4. This could be any two M:M relationships from a single entity. For instance, a member could know many software tools, and a software tool may be used by many members. Also, a member could have recommended many books, and a book could be recommended by many members.
To resolve the two M:M relationships, we know that we should resolve them separately, and that would give us the fourth normal form. But, if we were to combine them into a single table, it might look right (it is in the third normal form) at first. This is shown in Figure 5 and violates the fourth normal form.
To get a picture of what is wrong, look at some sample data, shown in Figure 6. The first few records look right, where Bill knows ERWin and recommends the ERWin Bible for everyone to read. But something is wrong with Mary and Steve. Mary didn't recommend a book, and Steve doesn't know any software tools. Our solution has forced us to do strange things like create dummy records in both Book and Software to allow the record in the association, since it is a key-only table.
The correct solution, to cause the model to be in the fourth normal form, is to ensure that all M:M relationships are resolved independently if they are indeed independent, as shown in Figure 7.
Note: This is not to say that all ternary associations are invalid. The above situation made it obvious that Books and Software were independently linked to Members. If, however, there were distinct links between all three, such that we would be stating that "Bill recommends the ERWin Bible as a reference for ERWin," then separating the relationship into two separate associations would be incorrect. In that case, we would lose the distinct information about the three-way relationship.
Isolate Semantically Related Multiple Relationships
Now let's modify the original business diagram and add a link between the books and the software tools, indicating which books deal with which software tools, as shown in Figure 8.
This makes sense after the discussion on Rule 4, and again we may be tempted to resolve the multiple M:M relationships into a single association, which would now violate the fifth normal form. The ternary association looks identical to the one shown in the fourth normal form example, and is also going to have trouble displaying the information correctly. This time we would have even more trouble because we can't show the relationships between books and software unless we have a member to link to, or we have to add our favorite dummy member record to allow the record in the association table (see Figure 9).
The solution, as before, is to ensure that all M:M relationships that are independent are resolved independently, resulting in the model shown in Figure 10. Now information about members and books, members and software, and books and software are all stored independently, even though they are all very much semantically related. It's very tempting in many situations to combine the multiple M:M relationships because they are so similar. Within complex business discussions, the lines can become blurred and the correct solution not so obvious.
Optimal Normal Form
At this point, we have done all we can with entity-relationship diagrams (ERD). Most people will stop here because this is usually pretty good. However, another modeling style called object role modeling (ORM) can display relationships that cannot be expressed in ERD. Therefore there are more normal forms beyond the fifth form.
With the optimal normal form (OMF) it is defined as a model limited to only simple (elemental) facts, as expressed in ORM.
Domain-Key Normal Form
This level of normalization is simply a model taken to the point where there are no opportunities for modification anomalies.
If every constraint on the relation is a logical consequence of the definition of keys and domains:
- A relation in DK/NF has no modification anomalies, and vice-versa.
- DK/NF is the ultimate normal form; there is no higher normal form related to modification anomalies.
- A relation is in DK/NF if every constraint on the relation is a logical consequence of the definition of keys and domains.
- Constraint is any rule governing static values of attributes that is precise enough to be ascertained whether or not it is true, e.g., edit rules, intrarelation and interrelation constraints, functional, and multivalued dependencies, not including constraints on changes in data values or time-dependent constraints.
- Key is the unique identifier of a tuple.
- Domain is a physical and a logical description of an attribute's allowed values.
- Physical description is the format of an attribute.
- Logical description is a further restriction of the values the domain is allowed.
- Logical consequence is to find a constraint on keys and/or domains that, if it is enforced, means the desired constraint is also enforced.
Practical consequence - since keys are enforced by the DBMS and domains are enforced by edit checks on data input, all modification anomalies can be avoided by just these two simple measures.
This article has been reprinted with permission from www.datamodel.org.
Published November 1, 2004 Reads 2,456
Copyright © 2004 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Mike Nicewarner
Mike Nicewarner, a database design manager working in Nebraska and president of the DataModel.Org organization, has been designing databases and modeling enterprise data with PowerDesigner since it was SDP and S-Designor.
- Why SOA Needs Cloud Computing - Part 1
- Cloud Expo and The End of Tech Recession
- The Transition to Cloud Computing: What Does It Mean For You?
- A Rules Engine Built in PowerBuilder
- Sybase Named “Silver Sponsor” of iPhone Developer Summit
- How PowerBuilder Got Its Groove Back
- The Cloud Has Cross-Border Ambitions
- Ulitzer Named "New Media" Partner of Greatly Anticipated iStrategy Event in Berlin
- Risks and Enterprise Mobility?
- Steps for Success in Enterprise Mobility?
- Are Mobile Luddites Resisting Mobility?
- Hot Event in Santa Clara Becomes Cool with the iPhone
- The Difference Between Web Hosting and Cloud Computing
- Sybase CTO to Speak at 4th International Cloud Computing Expo
- Why SOA Needs Cloud Computing - Part 1
- Cloud Expo and The End of Tech Recession
- The Transition to Cloud Computing: What Does It Mean For You?
- Five Reasons to Choose a Private Cloud
- Seeding The Cloud: The Future of Data Management
- The Threat Behind the Firewall
- Economy Drives Adoption of Virtual Lab Technology
- Tips for Efficient PaaS Application Design
- A Rules Engine Built in PowerBuilder
- Sybase Named “Silver Sponsor” of iPhone Developer Summit
- Where Are RIA Technologies Headed in 2008?
- PowerBuilder History - How Did It Evolve?
- The Top 250 Players in the Cloud Computing Ecosystem
- Custom Common Dialogs Using SetWindowsHookEx
- DDDW Tips and Tricks
- OLE - Extending the Capabilities of PowerBuilder
- DataWindow.NET How To: Data Entry Form
- Book Excerpt: Sybase Adaptive Server Anywhere
- Sybase ASE 12.5 Performance and Tuning
- Working with SOA & Web Services in PowerBuilder
- Office 2003 Toolbar: A New Look For Your Old PowerBuilder App
- Dynamically Creating DataWindow Objects
































