Achieving Higher Throughput with SQL Anywhere

Focus on the performance requirement

Six years ago I was asked to implement a data repository to hold network management and performance data. Monitoring databases tend to be large and must maintain a delicate balance between inserting (logging) new data and deleting (aging) old data out of the system. The SQL Anywhere server database from iAnywhere was what the customer wanted to use, so I first had to evaluate whether it could handle the expected load.

It was initially estimated that the solution would need to support 20 to 30 gigabyte databases, with ongoing data collection and older data deleted on a rolling 30-day window. No problem - one gigabyte of data a day in, one gigabyte of data a day out. I easily validated that SQL Anywhere could handle the load. Over the next year, performance requirements doubled, then redoubled. Today I support many 100-plus gigabyte databases, and have even exceeded 170 gigabytes in a single database server using SQL Anywhere 7.0.4 and 8.0.2. For the purposes of this article, I've run all tests on version 9.0.2 of SQL Anywhere.

In high-throughput systems, the database schema will consist of many types of tables, ranging from configuration data and user data to the tables storing the high-throughput data itself. These tables usually track rapidly changing information over time, such as statistics, usage measurements, and changing conditions. Temperature readings from a few thousand sensors, fluid-flow information on a gas pipeline, or network traffic statistics come to mind. This article concentrates on data throughput for inserts and deletes, and how to get the most out of your database server.

While there are many different variables that impact throughput performance, this paper focuses on database schema design and tuning the database server with respect to these high throughput tables. This article does not address performance issues related to application architecture or hardware selection.

Database Design - Focus on Performance
When high throughput is required, every decision needs to be weighed against how it will affect performance. Databases are generally very fast at inserting and deleting data; however, as soon as you move away from this core functionality and start using all the great "features" of relational databases, you can easily hamper performance.

In other words, keep these high-performance tables simple. Some good guidelines include:

  • Keep table width to a minimum: If you want to be fast, you can't have everything. This will allow you to fit more records onto a database page and use fewer pages overall. Fewer pages means faster throughput.
  • Use surrogate primary keys with autoincrement: This allows for faster indexing, and in cases where there is a rolling window of time-stamped data, your deletes can be key-based rather than time-based or, worse yet, multi-field-based.
  • Triggers should be avoided at all costs: Calling a stored procedure when inserting one record every minute or so is not very expensive, but we're shooting for inserting hundreds of rows or more per second, where trigger overhead multiplies quickly.
  • Avoid constraints that can be moved into middleware: These include not null constraints, default values, and boundary constraints.
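The guidelines above can be sketched in SQL Anywhere DDL. This is only an illustration - the table and column names are hypothetical - but it shows the narrow row, the autoincrement surrogate key, no triggers, and constraints left to the middleware:

```sql
-- Hypothetical high-throughput measurement table: narrow rows, a
-- surrogate autoincrement key, no triggers, no extra constraints.
CREATE TABLE net_stat (
    stat_id    BIGINT NOT NULL DEFAULT AUTOINCREMENT,
    device_id  INTEGER,
    sampled_at TIMESTAMP,
    bytes_in   BIGINT,
    bytes_out  BIGINT,
    PRIMARY KEY (stat_id)
);

-- Aging out old data becomes a key-based delete rather than a
-- time-based one: find the boundary key once, then delete by key.
SELECT MIN(stat_id) FROM net_stat
    WHERE sampled_at >= DATEADD(day, -30, CURRENT TIMESTAMP);
-- DELETE FROM net_stat WHERE stat_id < <boundary key from above>;
```

The delete-by-key pattern means the rolling-window purge touches only the primary key index instead of scanning or indexing the timestamp column.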
In systems where you are constantly collecting data, the more you can limit how the information is accessed, the higher the throughput. The savings come from maintaining a minimal set of indexes and foreign keys, and from educating developers on what data is available and how to access it. If you need only two ways of accessing the dataset but are supporting seven indexes on a table just in case someone needs them, you are unnecessarily hurting throughput. Removing foreign key constraints is dangerous, but if you are including a constraint only for "completeness" and never actually relying on it, it may be a candidate for deletion.

Application developers and the database designer responsible for maintaining high performance are usually at odds when it comes to indexing. The number and size of indexes directly affect the insertion rate of a table. When the need for massive throughput exists, the bare minimum of indexes should be used. I have gone as far as to restrict areas of application design in order to maintain a minimum set of indexes. This is a tricky trade-off: a feature that represents 1% of the application's functionality could degrade 50% of its performance. In the worst case it can make the entire application unusable.

A Throughput Example
The example schema below is valid, but it contains several of the constructs that can hinder throughput: no autoincrement key, column constraints, extra indexes, and extra fields. I will use it to run performance tests of 100,000 records, making corrections on each test run. Each test changes only one aspect of the given design, and the final test applies all the changes. Tests were run on an IBM desktop running Windows 2000 with a single Pentium 4 processor and 1.5 gigabytes of RAM. The disk is a Seagate Barracuda 7,200 RPM IDE drive.

Results are given in elapsed time for each test. What matters is the difference between each test and the baseline, not the absolute elapsed time. As with anything, actual elapsed time can be affected by a wide range of variables - hardware, software, application design and, of course, what we're most interested in here, schema design. See Listing 1.

SQL Anywhere Options
There are a few options that can be set in the server to further improve throughput; however, you need to consider the importance of your data. Since high-throughput systems tend to have fairly benign data, you can be a little more cavalier with commit/rollback settings.

Setting the option Delayed_Commits to "On" and Cooperative_Commits to "Off" allows you to streamline throughput to the database. With these settings, the application continues processing instead of waiting to confirm that the data absolutely got committed. The trade-off is that you could lose data in the event of a system or hardware failure. Since high-throughput data is often expendable, and any catastrophic event like a system or hardware failure typically means I'm going to lose data anyway, I routinely change these options for the database connections responsible for high-performance tables.
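Using SQL Anywhere's option syntax, the change can be scoped to just the data-collection connection with temporary options, so other connections keep the safer defaults. A sketch:

```sql
-- Apply only to the current connection; temporary options revert
-- when the connection closes, so other sessions keep the defaults.
SET TEMPORARY OPTION Delayed_Commits = 'On';
SET TEMPORARY OPTION Cooperative_Commits = 'Off';
```

Setting these with SET OPTION (without TEMPORARY) for a specific user ID would instead make the change permanent for that login, which may be preferable for a dedicated collector account.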

Rerunning Test 7 with Delayed_Commits set to "On" and Cooperative_Commits set to "Off," the elapsed time drops from 95.5 seconds to 60 seconds - an additional 59% gain in throughput.

Some Physical Considerations
Most performance problems lie in application design, SQL, and database design. However, the physical choices you make can have a dramatic effect on performance. This area deserves a white paper of its own, but a few basic tips include:

  • A fancy $100,000 server with slow disk throughput can be beaten in performance tests by a $500 system with a $100 IDE disk drive. A big Sun workstation ships with internal disks that typically run at 40MB per second; for a few hundred dollars I can put together a SCSI setup running at 160MB or 320MB per second on a cheap personal computer.
  • Most disks are physically formatted with 4,096-byte blocks, so use a 4096-byte database page size. Running Test 7 on a database with a 1024-byte page size, the elapsed time increased from 95.5 seconds to 100 seconds - roughly a 5% decrease in performance.
  • The faster the disk's RPM rating, the faster you can write data.
  • Only use hardware RAID controllers; software RAID performs horribly.
  • In my experience, SQL Anywhere on Windows runs faster than on Linux, which in turn runs faster than on Solaris.
  • Memory is cheap, so buy a lot. It will not directly help insert and delete rates, but when you start querying, especially with "order by" or "group by" clauses, you can easily force the database server to use temp space. Temp space is a file on disk, and we want all disk activity devoted to inserting and deleting, not to assisting selects.
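On the page-size tip above: in SQL Anywhere the page size is fixed when the database file is created, so it has to be chosen up front. A sketch, assuming the standard dbinit utility and a hypothetical file name:

```sql
-- Page size is set at creation time, from the command line:
--   dbinit -p 4096 netperf.db
-- Verify the page size of a running database:
SELECT DB_PROPERTY('PageSize');
```

Changing the page size of an existing database means unloading and reloading it into a freshly created file, so it is worth getting right the first time.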
Backup, Transaction Logs, and Recovery
Recovery can become a major issue with large databases when not handled properly. A power outage that restarts the server, for example, will automatically send a SQL Anywhere server into recovery. In certain instances I've seen the recovery process run for several days under 7.0.4 and 8.0.2, during which the database is inaccessible. Some users have gone as far as finding the dbsrv process and killing it manually. Let's think about this: the database was killed by accident, which caused it to start repairing itself, and the solution was to kill it again - ensuring it stands even less chance of automatically recovering? That doesn't make a whole lot of sense.

More Stories By Todd Loomis

Todd Loomis is an independent consultant who specializes in database design and development frameworks. He has designed systems in many areas, including the aerospace, legal, financial, energy, and computer networking industries. Todd has been developing and supporting products using SQL Anywhere as an embedded database for six years. A performance-testing tool is available for free at his Website.
