This section describes techniques you can use to reduce build times when working with cubes that have more than 500,000 categories and parent:child ratios that deviate significantly from the ideal.
Included are examples to illustrate the efficiency gains you can expect from optimization strategies such as hardware upgrades, system tuning, memory re-allocation, or model redesign.
The main test model used to confirm these strategies had the characteristics described in the following table.
| Test Model Characteristic | Description or Value |
| --- | --- |
| Total number of categories | 492,152 |
| Number of non-measure dimensions | 5 |
| Number of measures | 5: 1 regular, 2 calculated, and 2 after-rollup calculated |
| Source data format | ASCII (tilde-character delimited) |
| Number of source files | 9: 6 structural and 3 transactional; multiprocessing enabled for the 4 largest |
| Number of transaction input records | 50 million |
| Size of all source files | 2.28 GB |
| Enabled options | crosstab-caching and auto-partitioning (5 passes with a 500,000-record size limit) |
The following grid shows the model design, as seen on the Transformer (Windows) interface:
| Transportation (7) (2) | Date (197) (2) | HR Code (491,376) (2) | Country of Origin (286) (2) | Country of Export (286) (2) |
| --- | --- | --- | --- | --- |
| Transportation (5 categories) | Year (15 categories) | HR Code 1 (1,471 categories) | Region (10 categories) | Region (10 categories) |
| | Month (180 categories) | HR Code 2 (19,036 categories) | Country (274 categories) | Country (274 categories) |
| | | HR Code 3 (75,916 categories) | | |
| | | HR Code 4 (123,856 categories) | | |
| | | HR Code 5 (271,095 categories) | | |
By upgrading to a multi-CPU server and running multiple instances of Transformer in parallel, you can significantly decrease the build time for your largest cubes. In a model that generates more than one cube, the overall creation time is reduced to the time taken for the slowest cube to build.
We recommend the following strategies:
Give each Transformer process its own CPU. Because each instance uses system resources independently, ensure that each one has sufficient memory, I/O bandwidth, and disk space.
Provide each Transformer instance with its own configuration files.
Do not share the locations of DataWorkDirectory and ModelWorkDirectory among multiple instances.
Add an ampersand (&) to the end of the UNIX/Linux command line to begin your first process in background mode. For example, use
cogtr -mmodel.mdl &
When control returns to the command prompt, initiate a second cogtr command in parallel.
To continue processing on UNIX or Linux after you log off from the session, type the command
nohup cogtr -mmodel.mdl
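As a minimal sketch of two parallel builds, each instance below uses its own preferences file (pointing DataWorkDirectory and ModelWorkDirectory at separate locations) and its own model; the model and preferences file names are illustrative:
# Start both builds in the background; nohup keeps them running after log off.
nohup cogtr -fprefs_cube1.rc -mcube1.mdl &
nohup cogtr -fprefs_cube2.rc -mcube2.mdl &
# Wait until both background builds finish before continuing.
wait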
It is important to provide sufficient physical memory, on a server dedicated to building cubes, to avoid excessive paging. In addition to having sufficient memory to handle all running application requests, make sure the operating system disk cache on the server can grow as large as required during the cube build process.
Once you have optimized the memory settings for your server, we recommend that you track virtual and working memory use to find any bottlenecks created during the processing of categories in your largest, slowest cubes.
Typically, total addressable (virtual) memory usage climbs rapidly during category generation, and remains relatively constant during all cube build phases (read data, update metadata, and update cube). Working memory, or that portion of the physical memory that is used by Transformer, also rises quickly and remains high until the cube update phase, when it is freed for use by the operating system disk cache.
In the first phase, the more categories there are, the more memory is required. Although models differ, working memory usage is typically 500-1,500 bytes per category. Paging (swap file use) occurs if the space allocated to working memory is insufficient and cannot grow to the amount required to process the model.
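One way to track this behavior during a build on UNIX or Linux is to sample vmstat in the background while the cube builds; the sampling interval, log file name, and model name below are illustrative, and the exact column names for free memory and swap activity vary by platform:
# Sample memory and paging statistics every 60 seconds during the build.
vmstat 60 > memory_during_build.log &
VMSTAT_PID=$!
cogtr -mmodel.mdl
# Stop sampling when the build completes, then review the free-memory and swap columns.
kill $VMSTAT_PID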
You can achieve additional gains by reconfiguring the hard drive of your build server to optimize I/O processing.
We recommend that you allocate at least three physical disk drives to your Transformer system, subdividing the processing as follows:
Drive 1: operating system and the cogtr program
Drive 2: Transformer DataWorkDirectory
Drive 3: Transformer ModelWorkDirectory and CubeSaveDirectory
The log file for this configuration reads as follows, where 1, 2, and 3 represent drives c, d, and e:
LogFileDirectory=c:\transformer\logs\
ModelSaveDirectory=c:\transformer\models\
DataSourceDirectory=c:\transformer\data\
CubeSaveDirectory=e:\transformer\cubes\
DataWorkDirectory=d:\temp\
ModelWorkDirectory=e:\temp\
If you do not specify processing directories, Transformer version 8.x uses the IBM Cognos 8 default directories.
You may also want to configure the build server to use striping (RAID level 0) rather than mirroring, assuming that you have a backup for use if the system crashes during a production build. The build time is reduced because data is distributed among nonredundant disk drives.
You need different amounts of space for the temporary files during each cube build phase:
During the Data Read phase, the source data is read into a temporary work file based on the model structure. Insufficient disk space and database connectivity problems can cause failures at this stage.
During the Metadata Update phase, the contents of the temporary work file are compared to the categories in the model, to see which ones go into the cube, and a copy of the file is created. The original work file is only deleted after processing is complete and eligible categories are inserted into the cube. Insufficient disk space and lack of memory can cause problems at this stage.
During the Data Update phase, before the data points in the temporary work file can be inserted into the partitioned cube, the data must be sorted and consolidated. This requires several passes through the temporary file. The most common issue during this phase is low system memory.
You can use the following formula to estimate the amount of space, in MB, to allocate for the temporary files: {([(A*4)+(B*4)+(C*16)+8]*D)/1024/1024}*2. If the result is greater than 1431, add E to obtain the required space; if the result is less than or equal to 1431, the required space is the result itself (do not add E).
Taking the test model as an example, the calculated space requirement is 7047 MB, given the following input values:
A = the total number of dimensions
B = the number of dimension views associated with the PowerCube
C = the number of regular measures (calculated measures are not counted)
D = the number of input records for all transactional data sources
E = 1431; the WorkFileMaxSize setting divided by 1024 * 1024, to express it in MB
To this number, you must add sufficient space for the PowerCube and the model checkpoint files: 7 GB + 20% for the former, and a number equal to the model working space for the latter.
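As a rough worked sketch of the formula, the following awk command computes the estimate in MB. The A, B, and C values shown are placeholders rather than the actual test-model inputs, so substitute the numbers for your own model:
awk -v A=6 -v B=5 -v C=1 -v D=50000000 -v E=1431 'BEGIN {
  mb = ((A*4 + B*4 + C*16 + 8) * D) / 1024 / 1024 * 2   # base estimate in MB
  if (mb > 1431) mb += E                                 # add E only when the result exceeds 1431
  printf "Estimated temporary file space: %.0f MB\n", mb
}'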
We recommend that you point the Sort directory to the same location as the Data Temporary File, with all other Transformer directory locations pointing to another drive. You should then use a system performance monitor, during the cube build, to check the amount of available disk space in the three cube build phases.
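For example, on UNIX or Linux you can record free space on the work-file drive at intervals throughout the build and compare the low point against your estimate; the path, interval, and model name below are illustrative:
# Log available space on the work-file drive every five minutes during the build.
while true; do df -k /transformer/datawork >> diskspace_during_build.log; sleep 300; done &
DF_PID=$!
cogtr -mmodel.mdl
kill $DF_PID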
You can adjust various settings, whether operating in UNIX/Linux or Windows environments, to shorten your cube build times.
We recommend that you optimize your system by changing various default settings, as follows:
WriteCacheSize
On UNIX or Linux servers, the default value for this setting is 32768 (32 MB). Doubling the value to 65536 (64 MB) or tripling it to 98304 (96 MB) is recommended to optimize larger UNIX or Linux systems.
To modify the WriteCacheSize setting, open the ppds_cfg.xml.sample file located in the installation_location/configuration directory. After you change the WriteCacheSize setting, save the file in the same directory as ppds_cfg.xml.
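For example, assuming a default installation layout, you can copy the sample file first and then change the value in the copy with a text editor:
cd installation_location/configuration
cp ppds_cfg.xml.sample ppds_cfg.xml
# Edit ppds_cfg.xml and set WriteCacheSize to, for example, 65536 (64 MB).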
Sort buffer size
The sort buffer size used for local processing is specified as a preference setting on Windows and UNIX/Linux:
On Windows, set the Work file sort buffer size on the File tab of the Preferences dialog box.
On UNIX/Linux, set the sort buffer size using the WorkFileSortSize command.
The default setting is 8000000; however, you can raise this value to increase the amount of physical memory available for sorting data during the consolidation and auto-partitioning process.
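For example, to double the sort buffer on UNIX or Linux, you could add an entry such as the following to your preferences file; the value shown is illustrative:
WorkFileSortSize=16000000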
TEMPFILEDIRS (Windows only)
You can change the location of the temporary files that are created whenever sorting is done. We recommend that you specify multiple directories, separating each with a semicolon.
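An entry such as the following spreads the sort files across two drives; the directory names are illustrative:
TEMPFILEDIRS=d:\temp1;e:\temp2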
MaxTransactionNum (Windows only)
This setting limits the number of records that can be processed in temporary files before a checkpoint is inserted and records are committed to the PowerCube. To reduce cube build time, try raising the setting from its default (500000) to 800000. Or, if you get error message TR0112 during a cube build, lower the setting so records are committed more frequently, thereby freeing up space.
You change this setting on the General tab of the Preferences property sheet.
Ulimit (UNIX/Linux only)
Typically, you specify a value for this setting, such as 67 MB for a 2 GB-capacity server, so that system resources get shared effectively among competing processes. However, when operating Transformer on HP-UX, we recommend that you set this environment variable to unlimited, to provide the cube build process with as much physical memory as possible. (For other UNIX platforms or Linux, consult your operating system documentation to learn how you can best tune your kernel settings to allocate sufficient memory for Transformer.)
Tip: Type the command ulimit -a to determine the currently assigned values for Transformer. The time, file, and memory settings should show a value of unlimited.
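For example, in a Bourne-compatible shell you can remove the limits most relevant to cube builds before starting cogtr; the exact options vary by shell and platform, so check the ulimit documentation for your system:
ulimit -t unlimited     # CPU time
ulimit -f unlimited     # file size
ulimit -d unlimited     # data segment (memory)
ulimit -a               # confirm the new values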
You can globally change various file location settings by means of a preference file, to place your files on servers with sufficient space to handle your production requirements.
You can also specify preference settings on the command line. These settings override or take precedence over all other settings, including environment settings defined in the cogtr.sh file, or the environment variables TMPDIR, TEMP, and TMP as defined by the operating system. If multiple variables are defined, Transformer uses the first one in the list.
The UNIX/Linux command line syntax for specifying global preferences is
cogtr -fpreferences.rc -mmodel.mdl
The Windows equivalent is
cogtr.exe -n -fc:\preferences.prf model.mdl
To further shorten the data read phase for IBM Cognos Series 7 IQDs, you can change the database-specific settings found in the gateway .ini files included in the Transformer installation directory.
Search for a file name such as cogdm*.ini, where the asterisk represents a specific database version. The entries in each gateway .ini file are different, depending on the database type.
Note: For IBM Cognos 8 data sources, see the IBM Cognos 8 Architecture and Deployment Guide.
Oracle uses the cogdmor.ini gateway file for database-specific settings. We recommend you consider adjusting the following settings:
Increasing the number of rows retrieved in each fetch operation can improve performance on some systems. The default is 100 and the current limit is 32767; however, very large values may degrade performance on some systems.
Increasing the size of the buffer used during fetch operations from the default (2048 bytes) can also improve performance on some systems.
Where both entries have been changed, the row setting takes precedence over the buffer size setting.
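As an illustration only, these settings typically appear in cogdmor.ini as key=value entries. The section and entry names shown below are assumptions, so verify the exact keys and defaults in your own copy of the gateway file before changing anything:
; Section and entry names below are assumptions; confirm them in your cogdmor.ini.
[Database]
Fetch Number of Rows=500
Fetch Buffer Size=8192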
There are practical limitations on the file size of production models and cubes, based on typical memory capacities and run-time performance requirements, in the production environment.
For example, we recommend that you limit the size of your models to 2 million categories each. Pay particular attention to bloating due to excessive use of labels, descriptions, short names, category codes, and custom views. Metadata associated with your structural data source dimensions, levels, and categories can contribute significantly to overall storage requirements.
In virtual memory terms, models held in allocated memory are limited by the available address space. Also, performance is severely impacted if a model is larger than the available physical memory.
The Transformer log file provides useful information to help you diagnose the cause of processing bottlenecks during cube builds.
We recommend that you complete the following procedure:
Import these delimited-field data log files into a spreadsheet, such as Microsoft Excel:
LogFileDirectory=f:\logfiles\
In the imported log file, select the header of column E. If you are using Microsoft Excel, from the Data menu, click Filter, and then select the AutoFilter option.
In the list, select either (NonBlanks) or one of the three phases in the process. The spreadsheet now shows only the timing information for your selection.
Select several cells in column F and use the Sum command to add the timing values for the selected cells; the result is displayed in the status bar at the bottom of the window.
Repeat the above steps to examine each subphase. These appear in distinct blocks as follows, with each completing before the next subphase begins:
INITIALIZING CATEGORIES
OPEN DATA SOURCE
READ DATA SOURCE
MARKING CATEGORIES USED
SORTING
UPDATE CATEGORY AND PROCESS WORK FILE
METADATA
CUBE UPDATE
CUBE COMMIT
Note: If the timing shown for TOTAL TIME (CREATE CUBE) is different from the timing for the separate phases, you can adjust for the time difference in the Cube Update phase.
Assess how much time is spent on each phase. If you notice that, over time, cube builds are taking longer to complete even though the data set is relatively constant, compare each phase in the log files, to see where increases in build times are occurring.
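If you prefer to total the timings outside a spreadsheet, a minimal sketch such as the following sums the time recorded against each phase directly from a delimited log file. It assumes a comma-delimited log with the phase name in the fifth field (column E) and the elapsed time in the sixth field (column F), and the log file name is illustrative; adjust the field separator and positions to match your log format:
# Total the elapsed time recorded against each phase or subphase.
awk -F',' '$5 != "" { total[$5] += $6 } END { for (p in total) printf "%-45s %12.2f\n", p, total[p] }' transformer_build.log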