|By Mike Nicewarner||
|February 1, 2004 12:00 AM EST||
Many modelers are confused by overlaps in the data modeling terminology from the various vendors, and are not sure how to compare modeling tools when trying to buy one. This article will attempt to shed some light on this topic, and hopefully answer some of the more frustrating questions.
With data modeling, there are three models to deal with. First is the conceptual model, geared toward the business as a way to describe data requirements at a high level. Next is the physical model used to describe the precise schema used by the database platform. Finally, there is the logical model, used to tie the conceptual and physical models together and communicate with other IT groups. Some experts call for subdividing the conceptual into subject area, high-level, and fully attributed to communicate with the various business areas.
Most modeling vendors support logical and physical modeling, and a few add support for conceptual modeling.
The foundation of conceptual modeling is the entity-relationship model proposed by Peter Chen in 1976. The main entity-relationship diagram (ERD) of this model displays entities (nouns, or things the business is interested in) as boxes and relationships (verbs linking entities together, such as the verb "places" in the statement "Customer places order") as diamonds (see Figure 1).
Very few modeling tools use a pure Chen-style diagram notation. The typical conceptual diagram uses boxes for entities and lines to show the relationships. Cardinality (the numeric and mandatory nature of the relationships) is displayed by modifying the ends of the line on the respective entity boxes. A common notation is the "crow's feet" notation shown in Figure 2.
This figure shows that a customer may place no orders or many orders, and that an order must be placed by a single customer. In addition to entities and relationships, conceptual models include data items (a single element of data) that can be associated with an entity as an attribute and used to further describe that entity. For instance, "An Order has a date when the order was first placed, known as an Order Date". This indicates that there is a data item called Order Date that has been assigned to the Order entity to track when the order was placed. Some data items can be assigned to many entities, such as an Audit Date, which might be an attribute for each major entity in the model.
One key idea of conceptual modeling is to maintain communication with the business in their terminology. If the business expert doesn't give you a system-generated primary identifier for the entity, but you know they will need one, do not add it to the conceptual model. It "muddies the waters" and may force design decisions prematurely. All entities, relationships, and attributes should be named exactly as stated by the business. If the business expert has trouble clearly stating what something is, then you can help, but do not force them to use arcane naming convention structures.
As much as we love to use highly regimented naming rules, it just makes the job of documenting the data requirements and getting business approval that much harder. Even trying to "teach" them how to read an ERD is sometimes a waste of time. Instead, use English sentences to describe the relationships ("Customers can place many Orders" rather than "The cardinality of the relationship between Customer and Order is Zero-to-Many"). Rather than asking, "Is the Customer Address mandatory or optional?", use something like, "Do we always know the Customer's Address up-front?" It may be useful to create a report that lists the relationships and important attributes for the business expert to look over and then check off each item he or she approves of. This is easier than trying to go through a complex diagram with boxes and lines all over the place.
Another conceptual modeling notation is the Object Role Model (ORM) developed by Terry Halpin and published in 1996. ORM closely matches the role description ideas in the previous paragraph. Although similar to Chen's diagrams, ORM has significant differences and a more verbose syntax (see Figure 3).
Because of the complexity of most companies, a high-level conceptual model may be needed, and that model should be at the highest level of abstraction and contain only the most important entities and relationships in the enterprise. This is sometimes called a Subject Area Model, and is used to organize the lower-level models into the larger picture of the business information "landscape." If a single model is created of the entire company, it's called the Enterprise Data Model.
The people who are interested in the conceptual model include business experts, business analysts, requirements analysts, and data analysts. After all the data requirements have been approved at a fairly detailed level, a logical model should be created.
The logical data model is where IT ideas are first introduced into the business data requirements. The conceptual model used business names for things that will need to be adjusted to fit into the established naming conventions and other standards. The attributes were created without any regard for organization, but the logical model columns need to be grouped and arranged. Indexes are not part of the business requirements, but may be created in the logical model if needed. Also, surrogate keys can be introduced to minimize the foreign key complexity for deeply nested tables. In the end, the logical model has a structure that is probably pretty close to what the physical tables will look like (see Figure 4).
It has primary, foreign, and alternate keys, along with indexes, but nothing specific to a target database platform, which might not even have been selected yet.
A logical model uses terms from the physical model, such as table, column, and index, but might still display inheritance or loosely defined data types from the conceptual. The people who are interested in the logical model include business analysts, system architects, developers, project managers, and data analysts. After the structure and naming conventions are agreed upon, the physical model should be created.
The physical model is in the domain of the database administrator (DBA). This is where database platform constraints are applied to the data requirements, such as table and column name length limits or referential integrity limitations. Stored procedures, triggers, tablespaces, and table partitions are created depending on the platform. Depending on the tool, an iterative cycle can be entered into where test databases are created to discover which way of organizing things works best, then the model is updated from the database. Some tools have facilities to estimate database sizes, generate test data, or interface with other more sophisticated DBA tools (see Figure 5).
The people who are interested in the physical model include developers, testers, DBAs, and data analysts. After the DBA is satisfied with the structure of the model, the actual database can be created.
The sections in Figure 5 described the initial pass through the model layers. At any point during development the business needs may change, either through rapid environmental changes or because the analysis was incomplete. At any rate, the "mutation" process should begin at the highest layer possible to verify where the change begins to impact the data requirements. Even after a project is completed, there may be changes to the understanding of certain data elements that need to be documented. The models are living documents that need attention and care to remain vital to the company.
When a new project to modify an existing system is started, the change must not be made only in the database, as some would recommend doing. The changes are most likely due to business environment, legal, or government pressures on the company, and this will almost have to involve changes to the data requirements, which were described in the conceptual model. Changes made there have to flow through the logical and into the physical model before being applied to the database.
The tools matrix in Figure 6 is an example of some of the vendors that claim to offer data modeling tools. Although they all claim to address your data modeling needs, many fall short by tightly combining the logical and physical models, limiting the conceptual model, omitting conceptual modeling entirely, or by not providing the user with necessary control over the database creation and modification process.
The various modeling layers are valuable when put together. First, the conceptual model is used to describe the data requirements and communicate with the business at a high or detailed level. Next, the logical model allows IT to apply standards and conventions without getting bogged down with database limitations. Finally, the physical model is used by the DBA to create a valid database schema without having to use a text editor. A good modeling tool helps enforce the standards when appropriate, reduces communication confusion, and creates consistent documentation as part of the development effort. A poor tool will frustrate the best modeling intentions, forcing lengthy workarounds and manual effort. Choose wisely.
- Where Are RIA Technologies Headed in 2008?
- PowerBuilder History - How Did It Evolve?
- Creation and Consumption of Web Services with PowerBuilder
- Cloud People: A Who's Who of Cloud Computing
- DDDW Tips and Tricks
- Cloud Expo 2011 East To Attract 10,000 Delegates and 200 Exhibitors
- Working with SOA & Web Services in PowerBuilder
- Dynamically Creating DataWindow Objects
- Cloud Expo, Inc. Announces Cloud Expo 2011 New York Venue
- OLE - Extending the Capabilities of PowerBuilder