Welcome!

PowerBuilder Authors: Bruce Armstrong, Fuat Kircaali, Ian Thain, Chris Pollach, Arthur Hefti

Related Topics: PowerBuilder

PowerBuilder: Article

No More Headaches with the Cartesian Product

No More Headaches with the Cartesian Product

In developing various report systems, I found it necessary to recall the ad hoc query tool (based on a concept by Sean Rhody discussed in SYS-CON Publications' Secrets of the PowerBuilder Masters 5.0). For those of you who haven't read about it I'll take a minute to describe it briefly.

Cartesian product n (1958) : a set that is constructed from two given sets and comprises all pairs of elements such that the first element of the pair is from the first set and the second is from the second set (Merriam Webster's Collegiate Dictionary, Tenth Edition)

Overview of the Existing System
The ad hoc query tool can be described as a tab control that has five tab folders. These folders allow a user to create a report sequentially (including its definition and serialization for certain RDBMSs).

  • Folder 1: Allows users to select columns they would like to include in the report definition
  • Folder 2: Provides a way to construct an expression for the report
  • Folder 3: The WHERE clause's tab folder
  • Folder 4: Enables a user to sort columns that have already been selected for the report
  • Folder 5: Presents the current query in a rich text edit control
At this point the user can submit a query by clicking Execute in the menu.

Everything worked until I asked myself what would happen if my user wanted to get data from more than one table. Submitting this report to the database wouldn't do any good. Simply put, we'd get a Cartesian product.

Cartesian Product Dilemma
What is the Cartesian product all about? To make things easy, I'll give you a pretty straightforward example based on the MS SQL Server and Sybase's SQL Anywhere RDBMSs. Let's say you have two tables - employee and department tables from the Powersoft demo database, which are related through the Primary-Foreign keys - and you want to retrieve the data from those tables. Understanding how joins are processed would help you understand them and figure out why, when you incorrectly state a join, you get unexpected results.

Internally, MS SQL Server and other RDBMSs have a much more sophisticated procedure regarding joins than you might think. The initial stage of the join is the Cartesian product (it's done automatically by the RDBMS). The number of rows in a Cartesian product of two tables is equal to the number of rows in the first table times the number of rows in the second table. The Cartesian product of the employee and department tables is 375 (75 * 5).

Realizing all that, I decided to enhance the existing tool by adding a wizard that would automate the process of joining the table participants and free users from any worries regarding join operations.

Application Overview
I'll concentrate here on the WHERE clause tab folder. I actually considered two different general scenarios. In the first one the user wants to create an ad hoc report that would include, for simplicity's sake, four tables. These tables - Employee, Department, Customer and Sales_Order - don't have a Primary-Foreign key relationship. Based on the conception of the Cartesian product, MS SQL Server will create the relationship once we include the tables in our query. If you leave the WHERE clause off the join, the Cartesian product would contain 30,618,000 rows. Well, this result really makes no sense because it's just half the job. The correct result, however (achieved by including a WHERE clause stipulating which columns were to be matched and the basis on which to match them), would be just 47 rows. When a user has dragged a couple of columns from tables presented in the tree view in the "Columns" tab folder and then wants to provide some WHERE conditions in selecting the WHERE tab folder, creation of a Cartesian product is automatically prevented. If the user wants to run the report with no limitations, checking will be done as well.

The following section introduces a solution based on employing MS SQL Server and Sybase SQL Anywhere sp_fkeys, that is, stored procedures.

Implementation
I guess it's time to describe exactly what I did regarding the ad hoc query tool to make it more robust and effective. After selecting the third tab page, I had to implement processing that would be posted from the selectionchanged event of that page. To help you understand this better, I'll construct my explanation using the ue_no_cartesian_product() method as the shell. This method, which is called in the ue_onselectionchanging event, is given in Listing 1 (all listings can be found on the PBDJ Web site). The first thing we do here is to instantiate the DataStore, n_cst_dstore; we then move one step forward by calling the next method, of_get_fkeys().

Now that we have n_cst_dstore, a DataStore associated with a DataWindow object (instance of this class) that has a Primary Key holder table as the argument, we instantiate one more DataStore class, referenced by the lds_X variable.

These two DataStores have been used to cover various scenarios. The first one is the process to eliminate a Cartesian product, assuming that we're dealing with the Primary Key table holder; the second is if the table we're dealing with is the Foreign Key table holder. We now have two DataStores that cover these two possibilities.

Both DataStores' DataWindow objects are based on sp_fkeys (MS SQL Server's stored procedure), but as you can see in Figure 1, the first DataStore's DataWindow object uses the Primary Key table's name as the argument while the second one uses the Foreign Key table's name.

Assuming that we're dealing with a Primary Key table, we'll retrieve data using the first DataStore. If we have tables associated with the Primary table (the currently processed table) through the Foreign Key relationship, the retrieve should produce a value greater than zero. From here the approach I've taken is pretty straightforward: I simply populate the structure step by step with the following values:

  • Primary Key table name
  • Primary Key column(s) name(s)
  • Foreign Key table name
  • Foreign key column(s) name(s)
Our structure now contains all the tables from the current database that have a Primary Key-Foreign Key relationship with the currently processed table. Obviously, what would make sense now is to filter out this structure so that only the tables included in the report definition remain.

Once filtering is done, we have the number of joins that have been generated based on the assumption that the current table is used as the Primary Key holder.

Let's take one more step to populate the structure that we'll use as a source to build join statements for table participants. It's important to make sure that we don't insert a join statement that's already been inserted in the structure by the previous processing. The implementation code for the of_get_fkeys() method would look like Listing 2.

So far, we've gone through all the steps necessary to build a string, which will be used eventually as a source to insert join statements for the Primary Key table (see Figure 2).

Since there's no way to know whether we've dealt with the Primary Key owner table or the Foreign Key owner table, we have to cover the other scenario, which is the processing based on the Foreign Key table. What we're doing is retrieving the data from lds_X DataStore (remember, this DataStore is associated with the DataWindow object that has as an argument the Foreign Key table's name). By calling the of_get_fkeys() method, we're verifying that the current table is the Foreign Key holder.

If this table is neither the Primary nor the Foreign Key holder, we'll send a message to the user saying that it has no relationship with any of the tables included in the report definition. After that, the user is supposed to remove this table's column(s) from the report. The remainder of the processing is otherwise basically the same. The implementation code for the of_get_pkeys() method would look like Listing 3.

After all this is done, and all tables have been processed, the only method we need to mention here is the ue_no_cartesian_product, shown in Listing 4. In this event we trigger the ue_insert_join_stmt() event that actually presents the code to insert the necessary join statements in the WHERE clause. This is given in Listing 5. Figure 3 displays the result of clicking the tab folder, "WHERE clause." Conclusion
The surest sign that you comprehend the solution I've decribed here will be that you try and take my approach further. My solution describes the event when the columns being joined are compared for equality, that is, an equijoin. An educated user, however, armed with an awareness of the various bases on which columns can be matched, can be challenged by generating more sophisticated reports.

More Stories By Boris J. Lipsman

Boris J. Lipsman is a senior software engineer for Arris Systems in Berwyn, Pennsylvania. A certified PowerBuilder developer associate, Boris has worked with PowerBuilder since version 3.0. He has an MA in building construction from Kalinin Construction University in Russia. His new interests include the latest Web technologies using Perl, CGI scripting and Java for Linux.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.