Reducing Build Times for Large PowerCubes

This section describes techniques you can use to reduce build times when working with cubes that have more than 500,000 categories and parent:child ratios that deviate significantly from the ideal.

Included are examples to illustrate the efficiency gains you can expect from optimization strategies such as hardware upgrades, system tuning, memory re-allocation, or model redesign.

The main test model used to confirm these strategies had the characteristics described in the following table.

Test Model Characteristic              Description or Value
Total number of categories             492,152
Number of non-measure dimensions       5
Number of measures                     5: 1 regular, 2 calculated, and 2 after-rollup calculated
Source data format                     ASCII (tilde-character delimited)
Number of source files                 9: 6 structural and 3 transactional; multiprocessing enabled for the 4 largest
Number of transaction input records    50 million
Size of all source files               2.28 GB
Enabled options                        crosstab-caching and auto-partitioning (5 passes with 500,000-record size limit)

The following outline shows the model design, as seen on the Transformer (Windows) interface, with each dimension followed by its levels and category counts:

Transportation (7) (2)
  Transportation (5 categories)

Date (197) (2)
  Year (15 categories)
  Month (180 categories)

HR Code (491,376) (2)
  HR Code 1 (1471 categories)
  HR Code 2 (19036 categories)
  HR Code 3 (75916 categories)
  HR Code 4 (123856 categories)
  HR Code 5 (271095 categories)

Country of Origin (286) (2)
  Region (10 categories)
  Country (274 categories)

Country of Export (286) (2)
  Region (10 categories)
  Country (274 categories)

Recommendation - Multiprocessing with Transformer Server

By upgrading to a multi-CPU server and running multiple instances of Transformer in parallel, you can significantly decrease the build time for your largest cubes. In a model that generates more than one cube, the overall creation time is reduced to the time taken for the slowest cube to build.
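For illustration, the parallel approach can be scripted. The following minimal Python sketch launches one Transformer instance per model and waits for all builds to finish; the model file names are hypothetical, and the cogtr invocation follows the command-line syntax shown later in this section:

# Run several Transformer instances in parallel; total wall time is
# roughly the time of the slowest build. Model names are hypothetical.
import subprocess
from concurrent.futures import ThreadPoolExecutor

models = ["sales.mdl", "inventory.mdl", "shipping.mdl"]

def build(model):
    # -m names the model to build, as in the cogtr syntax shown below.
    subprocess.run(["cogtr", f"-m{model}"], check=True)

with ThreadPoolExecutor(max_workers=len(models)) as pool:
    list(pool.map(build, models))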

We recommend the strategies described in the sections that follow.

Recommendation - Adjust the Memory Allocation

On a server dedicated to building cubes, it is important to provide sufficient physical memory to avoid excessive paging. In addition to having sufficient memory to handle all running application requests, make sure the operating system disk cache on the server can grow as large as required during the cube build process.

Once you have optimized the memory settings for your server, we recommend that you track virtual and working memory use to find any bottlenecks created during the processing of categories in your largest, slowest cubes.

Typically, total addressable (virtual) memory usage climbs rapidly during category generation, and remains relatively constant during all cube build phases (read data, update metadata, and update cube). Working memory, or that portion of the physical memory that is used by Transformer, also rises quickly and remains high until the cube update phase, when it is freed for use by the operating system disk cache.

During category generation, the more categories there are, the more memory is required. Although models differ, working memory usage is typically 500-1,500 bytes per category. Paging (swap file use) occurs if the space allocated to working memory is insufficient and cannot grow to the amount required to process the model.
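As a rough illustration, the rule of thumb above translates into the following quick estimate for the test model; the byte range comes from the guideline, and the rest is arithmetic:

# Working-memory estimate from the 500-1,500 bytes-per-category guideline.
categories = 492_152            # test model category count
low, high = 500, 1_500          # typical bytes per category
mb = 1024 * 1024
print(f"{categories * low / mb:.0f} MB to {categories * high / mb:.0f} MB")
# Roughly 235 MB to 704 MB for the test model.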

Recommendation - Reconfigure the Hard Drive

You can achieve additional gains by reconfiguring the hard drive of your build server to optimize I/O processing.

We recommend that you allocate at least three physical disk drives to your Transformer system, subdividing the processing as follows:

The directory settings for this configuration, as recorded in the log file, read as follows, where drives 1, 2, and 3 correspond to c, d, and e:

LogFileDirectory=c:\transformer\logs\
ModelSaveDirectory=c:\transformer\models\
DataSourceDirectory=c:\transformer\data\
CubeSaveDirectory=e:\transformer\cubes\
DataWorkDirectory=d:\temp\
ModelWorkDirectory=e:\temp\

If you do not specify processing directories, Transformer version 8.x uses the IBM Cognos 8 default directories.

You may also want to configure the build server to use striping (RAID level 0) rather than mirroring, assuming that you have a backup for use if the system crashes during a production build. The build time is reduced because data is distributed among nonredundant disk drives.

Recommendation - Allocate Sufficient Space for the Temporary Files

You need different amounts of space for the temporary files during each cube build phase:

  1. During the Data Read phase, the source data is read into a temporary work file based on the model structure. Insufficient disk space or database connectivity problems can cause failures at this stage.

  2. During the Metadata Update phase, the contents of the temporary work file are compared to the categories in the model, to see which ones go into the cube, and a copy of the file is created. The original work file is only deleted after processing is complete and eligible categories are inserted into the cube. Insufficient disk space and lack of memory can cause problems at this stage.

  3. During the Data Update phase, before the data points in the temporary work file can be inserted into the partitioned cube, the data must be sorted and consolidated. This requires several passes through the temporary file. The most common issue during this phase is low system memory.

You can use the following formula to estimate the amount of space, in MB, to allocate for the temporary files:

{([(A*4)+(B*4)+(C*16)+8]*D)/1024/1024}*2

If the result is greater than 1431 MB, the required space is that number plus E. If the result is less than or equal to 1431 MB, the required space is that number alone (do not add E).
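A minimal sketch of this calculation follows; the function name is illustrative, and the arguments A through E follow the definitions that accompany the formula:

def estimate_temp_space_mb(a, b, c, d, e):
    # Encodes the formula above; a through e follow the documented definitions.
    base = ((a * 4) + (b * 4) + (c * 16) + 8) * d / 1024 / 1024 * 2
    # Add e only when the base estimate exceeds the 1431 MB threshold.
    return base + e if base > 1431 else base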

Taking the test model as an example, the calculated space requirement is 7047 MB, given the following input values:

To this number, you must add sufficient space for the PowerCube and the model checkpoint files: 7 GB + 20% for the former, and a number equal to the model working space for the latter.

We recommend that you point the Sort directory to the same location as the Data Temporary File, with all other Transformer directory locations pointing to another drive. You should then use a system performance monitor, during the cube build, to check the amount of available disk space in the three cube build phases.
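If a dedicated performance monitor is not available, even a simple polling script can show where free space dips across the build phases; this sketch assumes a hypothetical Windows drive letter and sampling interval:

# Poll free disk space during a cube build (drive and interval are examples).
import shutil
import time

while True:
    free_gb = shutil.disk_usage("d:\\").free / 1024 ** 3
    print(f"{time.strftime('%H:%M:%S')}  free on d: {free_gb:.1f} GB")
    time.sleep(30)  # sample every 30 seconds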

Recommendation - Optimize the Operating Environment

You can adjust various settings, whether operating in UNIX/Linux or Windows environments, to shorten your cube build times.

We recommend that you optimize your system by changing various default settings appropriate to your environment.

Recommendation - Redistribute Files

You can globally change various file location settings by means of a preference file, to place your files on servers with sufficient space to handle your production requirements.

You can also specify preference settings on the command line. These settings take precedence over all other settings, including environment settings defined in the cogtr.sh file and the environment variables TMPDIR, TEMP, and TMP as defined by the operating system. If multiple variables are defined, Transformer uses the first one in the list.

The UNIX/Linux command line syntax for specifying global preferences is

cogtr -fpreferences.rc -mmodel.mdl

The Windows equivalent is

cogtr.exe -n -fc:\preferences.prf model.mdl
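For illustration, such a preferences file can contain the same directory settings shown earlier, for example:

LogFileDirectory=c:\transformer\logs\
DataSourceDirectory=c:\transformer\data\
DataWorkDirectory=d:\temp\
ModelWorkDirectory=e:\temp\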

Recommendation - Optimize Gateway Settings for IBM Cognos Series 7 IQDs

To further shorten the data read phase for IBM Cognos Series 7 IQDs, you can change the database-specific settings found in the gateway .ini files included in the Transformer installation directory.

Search for a file name such as cogdm*.ini, where the asterisk represents a specific database version. The entries in each gateway .ini file are different, depending on the database type.

Note: For IBM Cognos 8 data sources, see the IBM Cognos 8 Architecture and Deployment Guide.

Example - Change Oracle Database Settings

Oracle uses the cogdmor.ini gateway file for database-specific settings. We recommend you consider adjusting the following settings:
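As an illustration only, the fetch-related entries typically resemble the following; the values here are hypothetical, and entry names can vary by database version, so verify them against the cogdmor.ini shipped with your installation:

; hypothetical values - confirm the entry names in your cogdmor.ini
Fetch Number of Rows=100
Fetch Buffer Size=32767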

Where both entries have been changed, the row setting takes precedence over the buffer size setting.

Recommendation - Keep Model and Cube Sizes Within Practical Limits

There are practical limitations on the file size of production models and cubes, based on typical memory capacities and run-time performance requirements, in the production environment.

For example, we recommend that you limit the size of your models to 2 million categories each. Pay particular attention to bloating due to excessive use of labels, descriptions, short names, category codes, and custom views. Metadata associated with your structural data source dimensions, levels, and categories can contribute significantly to overall storage requirements.

In virtual memory terms, models held in allocated memory are limited by the available address space. Also, performance is severely impacted if a model is larger than the available physical memory.

Recommendation - Analyze Processing Bottlenecks Using the Log File

The Transformer log file provides useful information to help you diagnose the cause of processing bottlenecks during cube builds.

We recommend that you complete the following procedure:

  1. Import the delimited-field log files into a spreadsheet, such as Microsoft Excel:

    LogFileDirectory=f:\logfiles\

  2. In the resulting log file import, select the header of the E column and, if you are using Microsoft Excel, from the Data menu, click Filter and select the AutoFilter option.

  3. In the list, select either (NonBlanks) or one of the three phases in the process. The spreadsheet now shows only the timing information for your selection.

  4. Select several cells in the F column and use the Sum command to add the timing values for the selected cells, as displayed in the lower toolbar.

  5. Repeat the above steps to examine each subphase. The subphases appear in distinct blocks in the log, with each completing before the next one begins.

    Note: If the timing shown for TOTAL TIME (CREATE CUBE) is different from the timing for the separate phases, you can adjust for the time difference in the Cube Update phase.

  6. Assess how much time is spent on each phase. If you notice that, over time, cube builds are taking longer to complete even though the data set is relatively constant, compare each phase in the log files, to see where increases in build times are occurring.
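As an alternative to the spreadsheet steps above, the same totals can be computed with a short script; this sketch assumes a comma-delimited log in which the fifth field (column E) names the phase and the sixth field (column F) holds the timing value:

# Sum the timing values (column F) for each phase name (column E).
import csv
from collections import defaultdict

totals = defaultdict(float)
with open(r"f:\logfiles\build.log", newline="") as f:  # hypothetical file name
    for row in csv.reader(f):
        if len(row) >= 6 and row[4].strip():  # skip rows with a blank phase
            try:
                totals[row[4].strip()] += float(row[5])
            except ValueError:
                pass  # ignore rows where column F is not numeric

for phase, seconds in totals.items():
    print(f"{phase}: {seconds:.1f}")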