Understanding Dimensional Queries

Dimensional queries are designed to enable multiple-fact querying. The basic goals of multiple-fact querying are:

Preserve data when fact data does not perfectly align across common dimensions, such as when there are more rows in the facts than in the dimensions.
Prevent double-counting when fact data exists at different levels of granularity by ensuring that each fact is represented in a single query with appropriate grouping. Determinants may need to be created for the underlying query subjects in some cases.

Single Fact Query

A query on a star schema group results in a single fact query.

In this example, Sales is the focus of any query written. The dimensions provide attributes and descriptions to make the data in Sales more meaningful. All relationships between dimensions and the fact are 1-n.

When you filter on the month and product, the result is as follows.

Multiple-fact, Multiple-grain Query on Conformed Dimensions

A query on multiple facts and conformed dimensions respects the cardinality between each fact table and its dimensions and writes SQL to return all the rows from each fact table.

For example, Sales and Product Forecast are both facts.

Note that this is a simplified representation and not an example of how this would appear in a model built using IBM Cognos modeling recommendations.

The Result

Individual queries on Sales and Product Forecast by Month and Product yield the following results. The data in Sales is actually stored at the day level.

A query on Sales and Product Forecast respects the cardinality between each fact table and its dimensions and writes SQL to return all the rows from each fact table. The fact tables are matched on their common keys, month and product, and, where possible, are aggregated to the lowest common level of granularity. In this case, days are rolled up to months. Nulls are often returned for this type of query because a combination of dimensional elements in one fact table may not exist in the other.

Note that in February 2004, Course Pro Umbrellas were in the forecast but there were no actual sales. The data in Sales and Product Forecast exist at different levels of granularity. The data in Sales is at the day level, and Product Forecast is at the month level.

The SQL

The SQL generated by IBM Cognos 8, known as a stitched query, is often misunderstood. A stitched query uses multiple subqueries, one for each star, brought together by a full outer join on the common keys. The goal is to preserve all dimensional members occurring on either side of the query.

The following example was edited for length and is used as an example to capture the main features of stitched queries.

select
	coalesce(D2.MONTH_NAME,D3.MONTH_NAME) as MONTH_NAME,coalesce(D2.PRODUCT_NAME,D3.PRODUCT_NAME) as PRODUCT_NAME, 
	D2.EXPECTED_VOLUME as EXPECTED_VOLUME,
	D3.QUANTITY as QUANTITY
from (select TIME.MONTH_NAME as MONTH_NAME,
	PRODUCT_LOOKUP.PRODUCT_NAME as PRODUCT_NAME,
	XSUM(PRODUCT_FORECAST_FACT.EXPECTED_VOLUME for  
	TIME.CURRENT_YEAR,TIME.QUARTER_KEY,TIME.MONTH_KEY,
	PRODUCT.PRODUCT_LINE_CODE, PRODUCT.PRODUCT_TYPE_CODE,
	PRODUCT.PRODUCT_KEY) as EXPECTED_VOLUME
from 
	(select TIME.CURRENT_YEAR as CURRENT_YEAR,
	TIME.QUARTER_KEY as QUARTER_KEY,
	TIME.MONTH_KEY as MONTH_KEY,
	XMIN(TIME.MONTH_NAME for TIME.CURRENT_YEAR,
	TIME.QUARTER_KEY,TIME.MONTH_KEY) as MONTH_NAME
	from TIME_DIMENSION TIME
	group by TIME.MONTH_KEY) TIME
	join PRODUCT_FORECAST_FACT PRODUCT_FORECAST_FACT
	on (TIME.MONTH_KEY = PRODUCT_FORECAST_FACT.MONTH_KEY)
	join PRODUCT PRODUCT on (PRODUCT.PRODUCT_KEY =
	PRODUCT_FORECAST_FACT.PRODUCT_KEY)
where
	(PRODUCT.PRODUCT_NAME in ('Aloe Relief','Course Pro
Umbrella')) and(TIME.MONTH_NAME in ('April 2004','February 2004','February
2006')) 
group by 
	TIME.MONTH_NAME,
	PRODUCT_LOOKUP.PRODUCT_NAME
) D2
 full outer join 
(select TIME.MONTH_NAME as MONTH_NAME,
	PRODUCT_LOOKUP.PRODUCT_NAME as PRODUCT_NAME,
	XSUM(SALES_FACT.QUANTITY for TIME.CURRENT_YEAR,
	TIME.QUARTER_KEY, TIME.MONTH_KEY, 
	PRODUCT.PRODUCT_LINE_CODE, PRODUCT.PRODUCT_TYPE_CODE, 
	PRODUCT.PRODUCT_KEY ) as QUANTITY
from 
select TIME.DAY_KEY,TIME.MONTH_KEY,TIME.QUARTER_KEY,
	TIME.CURRENT_YEAR,TIME.MONTH_EN as MONTH_NAME 
	from TIME_DIMENSION TIME) TIME
	join SALES_FACT SALES_FACT
	on (TIME.DAY_KEY = SALES_FACT.ORDER_DAY_KEY)
	join PRODUCT PRODUCT on (PRODUCT.PRODUCT_KEY = SALES_FACT.PRODUCT_KEY)
where 
	PRODUCT.PRODUCT_NAME in ('Aloe Relief','Course Pro Umbrella'))
	and (TIME.MONTH_NAME in ('April 2004','February 2004','February
2006'))
 group byTIME.MONTH_NAME,PRODUCT.PRODUCT_NAME 
) D3
 on ((D2.MONTH_NAME = D3.MONTH_NAME) and(D2.PRODUCT_NAME = D3.PRODUCT_NAME))

When you combine these queries into a single query, the results are as follows.

The SQL

If you look at the SQL, you can see that, because IBM Cognos 8 detected that a circular join path exists in the model, it did not include one of the relationships that was not necessary to complete the join path. In this example, the relationship between Time and Product Forecast was dropped.

A circular join path rarely results in a query that produces useful results.

select 
	TIME_.MONTH_NAME as MONTH_NAME,
	PRODUCT_LOOKUP.PRODUCT_NAME as PRODUCT_NAME,
	XSUM(SALES_FACT.QUANTITY for
	TIME_.CURRENT_YEAR, TIME_.QUARTER_KEY, TIME_.MONTH_KEY,
	PRODUCT.PRODUCT_LINE_CODE, PRODUCT.PRODUCT_TYPE_CODE,
	PRODUCT.PRODUCT_KEY ) as QUANTITY,
	XSUM(PRODUCT_FORECAST_FACT.EXPECTED_VOLUME for TIME_.CURRENT_YEAR, 
	TIME_.QUARTER_KEY, TIME_.MONTH_KEY, PRODUCT.PRODUCT_LINE_CODE,
	PRODUCT.PRODUCT_TYPE_CODE, PRODUCT.PRODUCT_KEY ) as EXPECTED_VOLUME
from 
	(select TIME.DAY_KEY,TIME.MONTH_KEY, TIME.QUARTER_KEY,
	TIME.CURRENT_YEAR,TIME.MONTH_EN as MONTH_NAME 
	from TIME_DIMENSION TIME) TIME 
	joinSALES_FACT on (TIME_.DAY_KEY = SALES_FACT.ORDER_DAY_KEY)joinPRODUCT_FORECAST_FACT on (TIME_.MONTH_KEY =PRODUCT_FORECAST_FACT.MONTH_KEY)joinPRODUCT (PRODUCT.PRODUCT_KEY = PRODUCT_FORECAST_FACT.PRODUCT_KEY) 
where 
	(PRODUCT.PRODUCT_NAME in ('Aloe Relief','Course Pro Umbrella'))
and
	(TIME_.MONTH_NAME in ('April 2004','February 2004','February 2006'))
group by 
	TIME_.MONTH_NAME, PRODUCT.PRODUCT_NAME

Multiple-fact, Multiple-grain Query on Non-Conformed Dimensions

If a non-conformed dimension is added to the query, the nature of the result returned by the stitched query is changed. It is no longer possible to aggregate records to a lowest common level of granularity because one side of the query has dimensionality that is not common to the other side of the query. The result returned is really two correlated lists.

The Result

The results of individual queries on the respective star schemas look like this.

Querying the same items from both star schemas yields the following result.

In this result, the lower level of granularity for records from Sales results in more records being returned for each month and product combination. There is now a 1-n relationship between the rows returned from Product Forecast and those returned from Sales.

When you compare this to the result returned in the example of the multiple-fact, multiple grain query on conformed dimensions, you can see that more records are returned and that Expected Volume results are repeated across multiple Order Methods. Adding Order Method to the query effectively changes the relationship between Quantity data and Expected Volume data to a 1-n relationship. It is no longer possible to relate a single value from Expected Volume to one value from Quantity.

Grouping on the Month key demonstrates that the result in this example is based on the same data set as the result in the multiple-fact, multiple-grain query but with a greater degree of granularity.

The SQL

The stitched SQL generated for this example is very similar to the SQL generated in the multiple-fact, multiple-grain query . The main difference is the addition of Order Method. Order Method is not a conformed dimension and affects only the query against the Sales Fact table.

select 
	D2.QUANTITY as QUANTITY,
	D3.EXPECTED_VOLUME as EXPECTED_VOLUME, 
	coalesce(D2.PRODUCT_NAME,D3.PRODUCT_NAME) as PRODUCT_NAME,coalesce(D2.MONTH_NAME,D3.MONTH_NAME) as MONTH_NAME, 
	D2.ORDER_METHOD as ORDER_METHOD
from 
	(select 
	PRODUCT.PRODUCT_NAME as PRODUCT_NAME,
	TIME.MONTH_NAME as MONTH_NAME,
	ORDER_METHOD.ORDER_METHOD as ORDER_METHOD,
	XSUM(SALES_FACT.QUANTITY for TIME.CURRENT_YEAR,TIME.QUARTER_KEY,
	TIME.MONTH_KEY,PRODUCT.PRODUCT_LINE_CODE,PRODUCT.PRODUCT_TYPE_CODE,
	PRODUCT.PRODUCT_KEY,ORDER_METHOD_DIMENSION.ORDER_METHOD_KEY) as
QUANTITY
from 
	PRODUCT_DIMENSION PRODUCT
	join
	SALES_FACT SALES_FACT
	on (PRODUCT.PRODUCT_KEY = SALES_FACT.PRODUCT_KEY)
	join 
	ORDER_METHOD_DIMENSION ORDER_METHOD
	on (ORDER_METHOD.ORDER_METHOD_KEY = SALES_FACT.ORDER_METHOD_KEY)
	join TIME_DIMENSION TIME
	on ( TIME.DAY_KEY = SALES_FACT.ORDER_DAY_KEY)
where 
	(PRODUCT.PRODUCT_NAME in ('Aloe Relief','Course Pro Umbrella'))
and 
	( TIME.MONTH_NAME in ('April 2004','February 2004','February 2006'))
 group byPRODUCT.PRODUCT_NAME,TIME.MONTH_NAME,ORDER_METHOD.ORDER_METHOD) D2full outer join 
(select 
	PRODUCT.PRODUCT_NAME as PRODUCT_NAME,
	TIME.MONTH_NAME as MONTH_NAME,
	XSUM(PRODUCT_FORECAST_FACT.EXPECTED_VOLUME for TIME.CURRENT_YEAR,
	TIME.QUARTER_KEY,TIME.MONTH_KEY,PRODUCT.PRODUCT_LINE_CODE,
	PRODUCT.PRODUCT_TYPE_CODE,PRODUCT.PRODUCT_KEY) as EXPECTED_VOLUME
from 
	PRODUCT_DIMENSION PRODUCT
	join 
	PRODUCT_FORECAST_FACT PRODUCT_FORECAST_FACT
	on (PRODUCT.PRODUCT_KEY = PRODUCT_FORECAST_FACT.PRODUCT_KEY)
	join 
(select 
	TIME.CURRENT_YEAR as CURRENT_YEAR,
	TIME.QUARTER_KEY as QUARTER_KEY,
	TIME.MONTH_KEY as MONTH_KEY,
	XMIN(TIME.MONTH_NAME for TIME.CURRENT_YEAR, TIME.QUARTER_KEY,
	TIME.MONTH_KEY) as MONTH_NAME
from 
	TIME_DIMENSION TIME
group by 
	TIME.CURRENT_YEAR,
	TIME.QUARTER_KEY,
	TIME.MONTH_KEY
	) TIME
	on (TIME.MONTH_KEY = PRODUCT_FORECAST_FACT.MONTH_KEY)
where 
	(PRODUCT.PRODUCT_NAME in ('Aloe Relief','Course Pro Umbrella'))
and 
	(TIME.MONTH_NAME in ('April 2004','February 2004','February 2006')) 
 group by PRODUCT.PRODUCT_NAME,TIME.MONTH_NAME) D3on ((D2.PRODUCT_NAME = D3.PRODUCT_NAME) and(D2.MONTH_NAME = D3.MONTH_NAME))