Recent years have seen the emergence of self-service and data visualization vendors such as Tableau, Power BI and Qlik Sense that has forced enterprise BI players such as MicroStrategy to improvise their tools to compete with these vendors. I have been a solutions architect in the field of BI been using MicroStrategy for more than 10 years. From my perspective as an architect, I appreciate the power of the underlying SQL engine in MicroStrategy and do not foresee self-service vendors replacing MicroStrategy as an enterprise BI tool in the coming years. What thrills me most as a Solutions Architect in Microstrategy is implementing workaround solutions to meet complex reporting requirements. In other words, I am building additional logic to change the default behavior of the MicroStrategy SQL engine. One example that comes to mind is the effective use of Cartesian joins of dimension tables in certain advanced reporting scenarios. As per conventional best practices, it is not recommended to use Cartesian joins in MicroStrategy. However, there are certain business cases where the execution of Cartesian joins is required. In this blog, I have illustrated the use of Cartesian joins of dimension tables using two scenarios.
Scenario 1- Accurate analysis across date dimension without the need of fictitious/dummy data
In this scenario, I used a public data set of construction spending data across educational and religious sectors for the year 2016. I massaged this data set further by deleting construction spending data of religious sector for the month May 2016. Shown below is a simple bar chart visualization displaying this scenario.
As seen in this graph, since there is no data in religious sector for May 2016, the bars are more spread apart in comparison with the educational sector data. Moreover, reference line calculations such mean, median ignore the fact that there was no construction spending data in the religious sector for May 2016, thereby misleading the audience. This issue becomes evident if the bar chart is converted to a line graph.
Using this graph, we may wrongly conclude that construction spending was consistently high during the middle of the year and slows down during the start and end of year.
This issue can be resolved by entering dummy data in the fact table to ensure all months appear across any other dimension during time series analysis. Nevertheless, not all organizations provide flexibility to manually enter fictitious data in the data warehouse to satisfy certain reporting scenarios. In such cases we may have to implement workaround solutions within the reporting layer to handle these complex issues.
In order to resolve this issue using MicroStrategy, I enabled the VLDB property- ‘Cartesian Join Warning’ to ‘Execute’. I also created an additional dummy metric that serves the same purpose as introduction of dummy data within the data warehouse. The sql statement for this metric is shown below.
CREATE TEMPORARY TABLE ZZMD00 as select a12.month_key month_key, a11.sector_key sector_key, sum(a11.spend_amt) WJXBFS1 from f_construction_spending a11 join d_date a12 on (a11.calendar_date = a12.calendar_date) group by a12.month_key, a11.sector_key CREATE TEMPORARY TABLE ZZMD01 as select a11.month_key month_key, a12.sector_key sector_key, max(a11.month_key) WJXBFS1 from d_month a11 cross join d_sector a12 where a11.month_key between 201601 and 201612 group by a11.month_key, a12.sector_key select coalesce(pa11.month_key, pa12.month_key) month_key, max(a14.month_short_desc) month_short_desc, coalesce(pa11.sector_key, pa12.sector_key) sector_key, max(a13.category) category, max(pa11.WJXBFS1) WJXBFS1, max(pa12.WJXBFS1) WJXBFS2 from ZZMD00 pa11 full outer join ZZMD01 pa12 on (pa11.sector_key = pa12.sector_key and pa11.month_key = pa12.month_key) join d_sector a13 on (coalesce(pa11.sector_key, pa12.sector_key) = a13.sector_key) join d_month a14 on (coalesce(pa11.month_key, pa12.month_key) = a14.month_key) group by coalesce(pa11.month_key, pa12.month_key), coalesce(pa11.sector_key, pa12.sector_key) drop table ZZMD00 drop table ZZMD01
This highlighted sql pass is basically a Cartesian join between the dimensions- Month and Sector with an additional filter restricting data to 2016. This sql pass thereby ensures there is a row for all months of 2016 for all sectors irrespective of the actual construction spending data.
As seen in the graphs, the revised average line considers the month of May 2016, dropping the average from 311 to 285.
Scenario 2- Month End Reporting
For this scenario, let us consider a hypothetical example of ‘Sales Commissions data’. A sales man ‘TOM’ belongs to the ‘CENTRAL’ region. On Jan 15, 2017, he moved from ‘CENTRAL’ region to ‘SOUTH EAST’ region. The dimension data corresponding to ‘Tom’ is shown below.
|Sales Rep Key||Sales Rep ID||Sales Rep||Region||Effective Start Date||Effective End Date|
As seen above, Type 2 strategy has been used to maintain history for the ‘Sales Rep’ dimension.
The Commissions fact data for this sales rep is shown below
|Sales Rep Key||Month||Commission Amt|
A simple report to show distribution of commissions across regions has been shown below.
Let us make this scenario a bit more complicated. What if the requirement for the month-end report is to allocate all commissions for each month to the region where the sales rep was assigned at the end of the month? To satisfy this requirement, I had to implement a workaround solution in which execution of Cartesian Joins was enabled. In this solution, the month-end region had to be created as a metric. An alias table for the Type 2 dimension was also created. The sql is shown below.
CREATE TEMPORARY TABLE ZZMD00 as select a11.sls_rep_id sls_rep_id, a12.month_key month_key, sum(a11.comm_amt) WJXBFS1 from f_commissions a11 join d_date a12 on (a11.earnings_date = a12.calendar_date) where a12.month_key = 201701 group by a11.sls_rep_id, a12.month_key CREATE TEMPORARY TABLE ZZMD01 as select a11.sls_rep_id sls_rep_id, a12.month_key month_key, max(a11.sls_rep_nm) WJXBFS1, max(a11.sls_rep_region) WJXBFS2 from d_salesrep a11 cross join d_month a12 where (((a11.sls_rep_key) in (select r12.sls_rep_key from d_month r11 cross join d_salesrep r12 where r11.month_end_date between r12.effective_start_dttm and r12.effective_end_dttm group by r12.sls_rep_key)) and a12.month_key = 201701) group by a11.sls_rep_id, a12.month_key select coalesce(pa11.sls_rep_id, pa12.sls_rep_id) sls_rep_id, coalesce(pa11.month_key, pa12.month_key) month_key, a14.month_short_desc month_short_desc, pa11.WJXBFS1 WJXBFS1, pa12.WJXBFS1 WJXBFS2, pa12.WJXBFS2 WJXBFS3 from ZZMD00 pa11 full outer join ZZMD01 pa12 on (pa11.sls_rep_id = pa12.sls_rep_id and pa11.month_key = pa12.month_key) join d_month a14 on (coalesce(pa11.month_key, pa12.month_key) = a14.month_key) drop table ZZMD00 drop table ZZMD01
The highlighted sql pass is basically a Cartesian join between the dimensions- Month and Sales Rep identifying all sales rep versions that are active for the date- ‘201701’.
The revised report results is shown below.
As seen in both the scenarios, execution of Cartesian joins can prove to be very useful to meet complex reporting scenarios. What was used in both these scenarios was essentially a non-equi join with additional where clauses. It is recommended though that enabling execution of Cartesian joins be enabled only for certain complex reports. Enabling this setting at an overall project level can result in incorrect results and performance degradation especially when the Cartesian joins are implemented without any filters.