Sunday, May 18, 2014

Performance Optimization Techniques for Many to Many Relationships in SSAS

The following is a summary for a white paper titled "Analysis Services Many-to-Many Dimensions: Query Performance Optimization Techniques" by Dan Hardan, Erik Veerman, and Carl Rabeler.
  • In SSAS, when you query the data measure group by the M2M dimension, SSAS resolves the query by first retrieving the requested records from the appropriate measure groups and dimension, and then performing an in-memory run-time join between these retrieved records by using the granularity attributes of each dimension that the measure groups have in common. As the data size of the records that are being joined increases, the performance of this run-time join in SSAS suffers. 
  • This best practices white paper explores the use of the following three optimization techniques for improving the query performance of M2M dimension queries as data sizes increase:
    • Defining aggregations in the data measure group and in the intermediate measure group.
    • Partitioning the intermediate measure group.
    • Reducing the size of the intermediate measure group by "matrix relationship optimization technique".
  • To design an aggregation in the data measure group to support a M2M query, you must include in the aggregation the granularity attribute of all dimensions that join with the intermediate measure group. However, do not include the attribute from the M2M dimension; the join on this attribute in the data measure group and in the intermediate measure group occurs at query time as part of the run-time join. When a fact dimension is used as the intermediate dimension to join a M2M dimension to the data measure group (which is a very common scenario), you include the granularity attribute from the fact dimension in the aggregation—which generally causes the aggregation to be a significant fraction of the size of the facts themselves. The benefit of an aggregation in the data measure group for M2M queries is directly related to the size of this aggregation compared to the size of the data measure group.
  • To design an aggregation in the intermediate measure group, you must include in the aggregation the granularity attribute of the dimensions in the intermediate measure group that relates the measure group to the data measure group along with the attribute in the dimension that you wish to aggregate. The benefit of this aggregation is directly related to the resulting size of the aggregation compared to the size of intermediate measure group.
  • Partitioning enables SSAS to retrieve data from only a portion of data in a measure group when the partitioning scheme matches the query pattern and to parallelize data retrieval when data must be retrieved from multiple partitions.
  • The dimension usage grid must have an association defined from the intermediate measure group to each dimension that is used for partitioning. This is important because in order for the partitions to be selected in the query, the dimension must have a relationship to the intermediate measure group.  Otherwise SSAS will not know that a query can be optimized by filtering out partitions defined in the intermediate measure group.
  • The previous implementation requisites assume that the intermediate fact table contains the dimension surrogate key that is used for the measure group partitioning.
  • Another way to improve M2M query performance is to use a technique called the matrix relationship optimization technique. This optimization technique increases M2M query performance by reducing the size of the run-time join. The run-time join is reduced by using a process of compression to eliminate unnecessary repetitiveness in the intermediate fact table. With this technique, common M2M dimension member combinations in the intermediate fact table are identified and replaced with a surrogate key (called a matrix key). This technique collapses the size of the intermediate fact table, which results in a linear increase in M2M query performance. The degree to which the size of the intermediate fact is reduced is known as the reduction percentage—the higher the percentage the better. For example, if an original M2M relationship was like the following example:

    The process of compression is as follows (a couple of queries in page 12 & 13 to accomplish this, or better check the Many-to-Many Matrix Compression in BIDS Helper):

    And the result should be like:

    In the cube, set the Visible property for the matrix dimension to False; this dimension won’t be used for analytics, it is used only to resolve the relationship.
    Add a new SSAS measure group to act as the intermediate measure group for resolving the M2M relationships (and remove the existing Sales Reasons measure group). This measure group is based on the new intermediate fact table. Since each measure group needs at least one measure, use the default Count measure that is created and set its Visible property to False.
  • Test result between the default M2M relation and the optimization techniques mentioned in this article tells that the Matrix Compression technique is the fastest, even in some cases it is 70% faster than normal M2M.

No comments:

Post a Comment