Author: Dejan Pajk


Ten Techniques for Optimising Memory Usage in Microsoft Power BI

With Power BI you can connect to a multitude of data sources. You can either import data, which is the most common way, or you can connect directly to the original data source via DirectQuery or Live Connection. When you import data to Power BI, you take advantage of the high-performance in-memory query engine known as VertiPaq.

In this post, I will share a few simple techniques that you can apply to optimise memory usage when importing your data into Power BI. A smaller model not only saves computer memory, it also delivers better performance thanks to faster scans.


If you would like to speed up data refresh time or improve the responsiveness of your Power BI reports and dashboards, consider applying the following techniques:

  1. Filter source data – Import only the rows you need for your analysis, e.g. set a date filter to import only the transactions from the last two years rather than the entire sales history.
  2. Remove columns – Remove all columns that are not relevant to your analysis, such as primary keys not used in relationships, columns that can be calculated from other columns without performance penalty, or description columns stored in a fact table.
  3. Lower granularity – The higher the granularity of a fact table, the more rows it will have. Our data sources usually store fine-grained transactional data, and transaction-level grain is too detailed for most reporting scenarios. Lowering the granularity by grouping and pre-aggregating data is a great way to reduce the number of rows in your fact table. For example, consider grouping your data by month if you only need to slice by month and year.
  4. Decrease cardinality – i.e. the number of distinct values in a column. VertiPaq is a columnar database: it stores each column in a separate data structure and applies sophisticated compression and encoding to reduce the memory footprint. With dictionary encoding, which is commonly used, only the distinct values of a column are stored in the database. Cardinality is therefore the main factor determining a column's size. Reducing cardinality is recommended for all columns stored in large tables, such as fact tables. Here are a few actions you can take to decrease cardinality: round numbers to remove unnecessary decimals; round times to remove milliseconds or seconds; split high-cardinality text columns into two or more parts; split DateTime columns into separate date and time columns.
  5. Avoid calculated columns – Calculated columns come with hidden costs: they consume memory, they are not compressed as efficiently as native columns, and they increase data refresh time. Rather than creating a calculated column after the data has been loaded into the model, add a native column in your source database or in Query Editor. Another efficient option is to replace the calculated column with a measure, which consumes no memory. Always weigh the performance benefit of a calculated column against its memory cost when deciding whether to use it.
  6. Disable load – When you import data from a source, you apply transformations, such as merging and appending queries. As a result, you may end up with queries that serve only as intermediate transformation steps. By default, all queries from Query Editor are loaded into the Power BI model's memory. It is crucial to disable load for all queries that are not required in the final model.
  7. Disable Auto Date/Time – Power BI automatically creates a built-in date table for each date field in the model to support time-intelligence DAX functions. These tables are hidden, they consume memory, and there is no flexibility to add custom columns, e.g. to support the fiscal year in Australia. Many analysts prefer to create their own date table, over which they have full control. Such a table should be marked as a date table (a feature added in the Power BI Desktop February update). To remove all hidden date tables from your model, in Power BI Desktop select File / Options and Settings / Options / Data Load and untick Auto Date/Time.
  8. Choose appropriate data types – Make sure all columns have the correct data type. Note: the Fixed Decimal Number data type is internally stored as an integer. It is recommended for all decimal numbers with a precision of four or fewer decimal places.
  9. Transform data at the right place – Most data transformations generally take place in Query Editor in Power BI Desktop. Query Editor is a powerful and user-friendly tool that keeps track of all applied transformation steps, which is useful for traceability and future maintenance. However, you may obtain better performance by applying transformations directly in the source database. For example, grouping your sales data by month in the transactional database increases the execution time of the source query, but only the grouped data is then sent over the network to Power BI. The alternative, importing fine-grained source data and grouping it later in Query Editor, has the opposite trade-offs: faster reads at the source, but a larger dataset to send over the network and more intensive processing in Power BI. Many factors influence this decision, e.g. network bandwidth and source database speed. Note that shaping your data in the source database can introduce performance issues for other database users and systems. It is essential to measure query execution times and data refresh times before making a decision.
  10. Consider using DirectQuery – You should import data to Power BI wherever possible; however, if your goals cannot be met by importing data, consider using DirectQuery. DirectQuery is generally only feasible when the underlying data source can provide interactive queries and handle the query load. Follow the official guidance to implement DirectQuery successfully.
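Techniques 3 and 4 above can be illustrated with a small sketch. The snippet below (plain Python, entirely hypothetical data) shows why splitting a DateTime column and rounding numbers shrinks the dictionaries VertiPaq would have to store:

```python
# Illustrative sketch of cardinality reduction, using hypothetical data.
# Each len(set(...)) stands in for the size of a dictionary-encoded column.
from datetime import datetime, timedelta
import random

random.seed(0)

# Simulate 10,000 transaction timestamps and amounts over one year.
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(seconds=random.randrange(365 * 86400))
              for _ in range(10_000)]
amounts = [round(random.uniform(0, 1000), 6) for _ in range(10_000)]

# Cardinality of the raw columns: nearly every value is distinct.
print(len(set(timestamps)))   # ~10,000 distinct datetimes
print(len(set(amounts)))      # ~10,000 distinct amounts

# Split DateTime into separate date and time columns.
dates = [t.date() for t in timestamps]
times = [t.time().replace(microsecond=0) for t in timestamps]
print(len(set(dates)))        # at most 365 distinct dates
print(len(set(times)))        # at most 86,400 distinct times

# Round amounts to two decimals to drop noise digits.
rounded = [round(a, 2) for a in amounts]
print(len(set(rounded)))      # noticeably fewer distinct values
```

The same idea applies in Query Editor: two low-cardinality columns compress far better than one high-cardinality DateTime column.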


Remember, memory is the biggest asset in Power BI. Techniques represented in this post will reduce memory footprint which has a direct impact on the performance of your reports and dashboards.

What are your thoughts on Power BI memory optimisation? What other practices have you come across in your work? Please share your comments below.



Using Star Schemas for Power BI Reporting

Trying to produce an analytical report directly from an operational system is a daunting experience. Operational data models are highly normalised and optimised for data entry, not for data retrieval. Analytical queries against an operational system can therefore significantly degrade performance and put a burden on your IT infrastructure.

In my previous post, I examined the importance of data modelling when building Power BI reports. This week, I’m introducing star schemas, widely accepted as a standard data model for BI systems.

A star schema comprises a central fact table (e.g. Sales) and multiple dimension tables (e.g. Date, Customer, Product Group). The fact table contains data from business transactions or a snapshot summary of that data. It is connected through a many-to-one relationship to each dimension table. The main advantages of star schemas are:

  • Simplicity – Star schemas are easy to read, understand and use. Dimensions are used to slice and dice the data, and facts to aggregate numbers. (Note: it is very important to use standard business names when naming tables and columns in your model).
  • Performance – Because star schemas have a small number of tables and clear join paths, queries run faster than they do against an operational system.
  • Scalability – Star schemas are extensible to accommodate changes, such as adding new dimensions, attributes or measures.
  • Support – Star schemas are the most commonly used in BI implementations and therefore supported by a large number of BI tools.

Star schema design is predicated on the assumption that transactions (facts) are voluminous, while dimensions are comparatively small and simple. The fundamentals of star schemas are therefore quite simple. Firstly, we keep the fact table, which is the largest one, narrow and highly normalised. Secondly, we make dimension tables wide and denormalised, removing as many joins as possible.
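That shape, a narrow fact table holding only keys and measures, joined to wide denormalised dimensions, can be sketched with plain Python dictionaries (all table contents are hypothetical):

```python
# Minimal star-schema sketch: dimensions are keyed lookups, the fact table
# holds surrogate keys plus measures, and queries join many-to-one.
dim_product = {
    1: {"Name": "Bike", "Group": "Sports", "Colour": "Red"},
    2: {"Name": "Tent", "Group": "Outdoor", "Colour": "Green"},
}
dim_date = {
    20240105: {"Month": "2024-01", "Year": 2024},
    20240212: {"Month": "2024-02", "Year": 2024},
}
fact_sales = [  # narrow: keys and measures only
    {"DateKey": 20240105, "ProductKey": 1, "Amount": 500.0},
    {"DateKey": 20240212, "ProductKey": 1, "Amount": 300.0},
    {"DateKey": 20240212, "ProductKey": 2, "Amount": 150.0},
]

# "Slice and dice": total Amount per product Group via the many-to-one join.
totals = {}
for row in fact_sales:
    group = dim_product[row["ProductKey"]]["Group"]
    totals[group] = totals.get(group, 0.0) + row["Amount"]

print(totals)  # {'Sports': 800.0, 'Outdoor': 150.0}
```

Every aggregation follows the same clear join path from fact to dimension, which is exactly why star-schema queries are simple and fast.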


Fact tables should not be related to other fact tables, nor dimension tables to other dimension tables; facts should only relate to dimensions. A snowflake schema is a variation of a star schema in which some of the dimensions are normalised. Snowflake models are generally easier to maintain, and apart from a small performance degradation caused by the additional joins, there is nothing wrong with them. We can always convert a snowflake schema back to a star schema by denormalising its dimensions. Another variation is a star schema with multiple fact tables that are not related to each other but share the same dimensions. Common examples are Budget vs. Actuals or Sales vs. Purchases reports.
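Denormalising a snowflake back into a star just means folding the normalised lookup table into its dimension. A small sketch with hypothetical tables:

```python
# A snowflaked Product dimension references a separate ProductGroup table;
# folding the group attributes into Product removes the extra join.
# All table contents are hypothetical.
product_group = {
    10: {"GroupName": "Sports"},
    20: {"GroupName": "Outdoor"},
}
product_snowflaked = {
    1: {"Name": "Bike", "GroupKey": 10},
    2: {"Name": "Tent", "GroupKey": 20},
}

# Denormalise: merge each product with its group's attributes.
product_star = {
    key: {"Name": row["Name"], **product_group[row["GroupKey"]]}
    for key, row in product_snowflaked.items()
}

print(product_star[1])  # {'Name': 'Bike', 'GroupName': 'Sports'}
```

In Power BI this is the same operation as a Merge Queries step in Query Editor followed by removing the intermediate lookup table from the model.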

BI professionals share the same opinion that star schemas are nearly always the best way to represent analytical data. When building data models with Power BI, it is worth spending additional time in Query Editor to transform your data sources into facts and dimensions.

Remember, if you are facing modelling problems, consider rebuilding your model to move toward a star schema. By taking this approach you will find yourself heading in the right direction.



Data Modelling for Power BI

Power BI was released in 2015. Since then, it has spread like a bush fire. It has become a critical tool for many businesses, taking data from anywhere and combining it to power real-time decisions. Since its release, customers have created over 11.5 million data models worldwide, with 30,000 more added daily.

Building data models with Power BI is easy and fast. You're not required to flatten your data into one table: once you've pulled in data from multiple sources, you can create a model by simply connecting tables using relationships. However, with great flexibility comes great responsibility. It is very unlikely that the source data is already perfectly structured for the kind of analysis you want to perform. If you only connect source tables with relationships, without following data modelling principles, this too often results in complex data models that are expensive to maintain. If the numbers you're calculating do not match, or if the DAX formulas are too complex to maintain, you most likely have an issue in the data model. A data model is like the foundation of a house: get it right, and the subsequent construction, and the house itself, will be better.


Data modelling is a basic skill that anybody interested in building BI reports should master. Being a good data modeller means being able to match your specific model with one of the many patterns that have already been studied and solved. A good introduction to the topic is the book Analyzing Data with Power BI and Power Pivot for Excel by M. Russo and A. Ferrari. It will get you started; however, to become a great data modeller you will need experience, which means you may have to suffer some failures as you learn.

The first important data modelling decision is granularity, i.e. the level of detail in your tables. You must have the right level of granularity to meet your reporting needs. If your granularity is too coarse, you will not be able to extract the information you want. On the other hand, fine granularity means larger and slower datasets, and potentially complex DAX formulas. For example, if your Sales table contains individual transactions but you only need to report by month, quarter and year, the correct level of granularity would be the month.

Another critical data modelling decision is the level of normalisation. The most common reporting data sources are operational systems, such as Dynamics NAV. An operational system directly supports the execution of a business process by capturing details about significant business events or transactions. Operational databases are therefore highly normalised and optimised for insert and update operations. While the focus of the operational system is the execution of a business process, a BI system such as Power BI supports the evaluation of the process. Interaction with a BI system takes place exclusively through queries that only retrieve data about business processes. When you only read data, normalisation is rarely a good technique: database performance is better when fewer joins are required.

Dimensional modelling has been introduced as a direct response to the unique requirements of BI systems. A star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions.
In my next blog I will introduce star schemas and their benefits.



Automated Testing of Dynamics NAV Enhancements

In last week’s post I discussed the Dynamics NAV test automation suite. Running Microsoft standard automated tests ensures that your modifications and add-on solutions do not interfere with vanilla NAV. However, all non-standard/custom business logic needs to be tested as well.

Traditionally, testing takes place in a test environment using sample customer data. During manual testing, application consultants and key users follow prebuilt test scripts to test data validation and new functionality. The goal of automation is to reduce the number of test cases that must be run manually, not to eliminate manual testing altogether.

Creating an automated test is more time consuming and expensive than running a test once manually. What to automate, when to automate, or even whether automation is really needed are decisions the testing or development team must make. A good candidate for test automation is a test case that is executed repeatedly, is business-critical or high-risk, is difficult to perform manually, or is time consuming. Automating unstable features, or features that are undergoing change, should be avoided. The choice between automated and manual testing can be made with a basic cost-benefit analysis: if it takes five times as long to build, verify and document an automated test, then once the number of automated test runs exceeds five, testing from that point on is essentially free.
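The break-even rule above is simple arithmetic; here it is as a short sketch (the cost figures are hypothetical placeholders, not measurements):

```python
# Break-even sketch for the cost-benefit rule: an automated test that costs
# about 5x a single manual run pays off once it has run more than 5 times.
manual_cost_per_run = 1.0    # e.g. hours of tester time per manual run
automation_build_cost = 5.0  # ~5x a single manual run, built once

def total_cost(runs: int, automated: bool) -> float:
    """Cumulative cost after `runs` executions of the test case."""
    if automated:
        return automation_build_cost  # re-runs are essentially free
    return manual_cost_per_run * runs

# First run count at which automation is strictly cheaper than manual.
break_even = next(n for n in range(1, 100)
                  if total_cost(n, automated=True) < total_cost(n, automated=False))
print(break_even)  # 6: from the sixth run onwards, automation is cheaper
```

Plugging in your own build and run costs gives the break-even point for any candidate test case.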

Test automation should be seen as a long-term investment in the quality of your systems.

Test automation should be seen as a long-term investment in the quality of your systems. The good news is that we can build custom automated tests on top of the Microsoft test toolkit. There are over 60 standard libraries containing numerous generic and application-specific functions that can be reused when building custom tests.

The following best practices should be followed when designing and developing custom automated tests:

  • Tests should follow the same Given-When-Then style. This approach was developed by Dan North and Chris Matts as part of Behaviour-Driven Development (BDD).
  • Tests should not depend on the data in the test database. Data should be created on the fly using random generator functions; hardcoded values in tests should be avoided.
  • Tests should leave the system in the same state as when the test started. You can use the TransactionModel property in test functions and the TestIsolation property in test runner codeunits to control the transactional behaviour. A common scenario is that each test function starts a separate write transaction and all the changes are rolled back after each test codeunit.
  • Test code should test that code works under successful and failing conditions (positive and negative tests). To test failing conditions, you can use the ASSERTERROR keyword.
  • Test code should be kept separate from the code being tested. That way, you can release the tested code to a production environment without releasing the test code.
  • Tests should not require user intervention. Special handler functions should be created to handle all UI interactions.
  • Code Coverage should be monitored to track the extent to which the application code is covered by tests.
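The Given-When-Then style and the positive/negative pattern from the list above can be sketched as follows. The example is in Python rather than C/AL (which only runs inside NAV), and all names and values are hypothetical:

```python
# Given-When-Then test sketch mirroring the practices above: data created
# on the fly, a positive test and a negative test (cf. ASSERTERROR in C/AL).
import random

def create_random_customer():
    """GIVEN helper: create test data on the fly, no hardcoded records."""
    return {"No": f"C{random.randrange(10_000):05d}", "Balance": 0.0}

def post_payment(customer, amount):
    """Code under test: apply a payment to a customer balance."""
    if amount <= 0:
        raise ValueError("Amount must be positive")
    customer["Balance"] -= amount

def test_post_payment_reduces_balance():
    # GIVEN a customer with a known starting balance
    customer = create_random_customer()
    customer["Balance"] = 100.0
    # WHEN a payment is posted
    post_payment(customer, 40.0)
    # THEN the balance is reduced accordingly
    assert customer["Balance"] == 60.0

def test_post_payment_rejects_non_positive_amount():
    # Negative test: the failing path must raise a clear error.
    customer = create_random_customer()
    try:
        post_payment(customer, 0)
        assert False, "expected an error"
    except ValueError:
        pass

test_post_payment_reduces_balance()
test_post_payment_rejects_non_positive_amount()
print("all tests passed")
```

Note how each test builds its own data and leaves nothing behind, so it can run in any order against any database state.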

There are third-party GUI capture/replay tools available that track user interactions with the product and build a script from them. It would be great if we had such a tool integrated within standard NAV as this would enable users to record new test scripts without any development required.

Testing is an iterative process. Automated tests should be executed every time an enhancement is made to the application (regression testing). During typical development it is unacceptable to wait hours for the results of running all tests. That is why the Dynamics NAV Test Tool gives us the option to select automated tests only for modified or selected objects (based on test coverage maps). Running automated tests frequently ensures that new development doesn't break existing functionality. With a combination of Microsoft's default testing suite and tests developed specifically for your organisation, new implementations and enhancements can go live with minimal bugs.



Dynamics NAV Test Automation Suite

The increased complexity of systems and short product release cycles makes the task of testing challenging. One of the key problems is that testing typically comes late in the project, and traditional testing is performed manually. Testing represents a major cost in every Dynamics NAV implementation project. Test automation has been proposed as one solution to reduce these costs. Test automation suites promise to increase the number of tests we run and the frequency at which we run them.

In this post I will introduce the Microsoft test toolset then next week we will consider automated testing for non-standard/custom business logic, i.e. the code that has been added to provide enhancements and modifications to standard NAV.

With the Dynamics NAV 2016 release, Microsoft has made their test toolset a standard part of the product. It’s an exciting addition to Test Codeunits that have been around since NAV 2009. Microsoft is putting significant effort into an automated testing framework. With every version release and monthly cumulative update, new tests are introduced. In the chart below you can see that the number of automated tests in the last five NAV cumulative updates increased by more than 400.

There are unquestionable benefits of utilising automated testing. In my 10-year history with NAV implementations I’ve noticed that testing is too often seen as a “nice-to-have” instead of a “must have” activity. Both sides, the application consultants and the key users, share the same responsibility to carefully test their solutions. Tight deadlines, budget restrictions and busy daily schedules too often limit testing efforts to only critical scenarios and the rest is left to be discovered when the system is already in production.

Introducing automated testing can change this significantly. Automated tests can be run quickly and repeatedly. For example, it takes less than 3 hours to run over 18,000 tests on my laptop. If one manual test only took 1 minute, it would take over 40 man-days to complete all the tests manually.
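The man-day figure above is easy to verify (assuming a 7.5-hour working day, which is my assumption, not stated in the original comparison):

```python
# Quick check of the arithmetic: 18,000 manual tests at 1 minute each,
# converted to man-days assuming a 7.5-hour working day (hypothetical).
tests = 18_000
minutes_per_test = 1
hours = tests * minutes_per_test / 60  # 300 hours of manual effort
man_days = hours / 7.5                 # 40 man-days
print(hours, man_days)                 # 300.0 40.0
```

Against the roughly 3 hours the automated suite needs, that is a two-orders-of-magnitude difference.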

“Nothing is holding you back anymore from incorporating this tool in your product development, and even custom projects. It’s only up to you to decide to what extent you are going to do this.”  Luc van Vugt, Microsoft MVP and former Microsoft tester.

The Dynamics NAV test suite brings repeatability, reduces man-hours and improves efficiency. It is a critical enabler of the continuous delivery of quality software.

Running Microsoft's standard automated tests only confirms that custom modifications and add-on solutions do not interfere with standard NAV; in other words, that existing functionality still works as it should. All non-standard/custom business logic needs to be tested as well. Next week I'll discuss the benefits and options for automated testing of these enhancements and modifications.
