Microsoft Power BI

by Dejan Pajk

Ten Techniques for Optimising Memory Usage in Microsoft Power BI

With Power BI you can connect to a multitude of data sources. You can either import data, which is the most common way, or you can connect directly to the original data source via DirectQuery or Live Connection. When you import data to Power BI, you take advantage of the high-performance in-memory query engine known as VertiPaq.

In this post, I will share with you a few simple techniques that you can apply to optimise the memory usage when importing your data to Power BI. A smaller model not only saves computer memory, it also delivers better performance due to faster scans.

If you would like to speed up data refresh time or improve the responsiveness of your Power BI reports and dashboards, consider applying the following techniques:

  1. Filter source data – Import only the rows you need for your analysis, e.g. set a date filter to import only transactions for the last two years rather than the entire sales history.
  2. Remove columns – Remove all columns that are not relevant to your analysis, such as primary keys not used in relationships, columns that can be calculated from other columns without performance penalty, or description columns stored in a fact table.
  3. Lower granularity – The higher the granularity of a fact table, the more rows it will have. Our data sources usually store fine granular transactional data. Transaction-level grain is too detailed for most reporting scenarios. Lowering the granularity by grouping and pre-aggregating data is a great way to reduce the number of rows in your fact table. For example, consider grouping your data by month if you only need to slice by month and year.
  4. Decrease cardinality – i.e. the number of distinct values in a column. VertiPaq is a columnar database: it stores each column in a separate data structure and applies sophisticated compression and encoding to reduce the memory footprint. With dictionary encoding, which is commonly used, only the distinct values of a column are stored in the database. Cardinality is therefore the main factor that determines a column's size. Reducing cardinality is recommended for all columns stored in large tables, such as fact tables. Here are a few actions you can take to decrease cardinality: round numbers to remove unnecessary decimals; round time to remove milliseconds/seconds; split text columns into two or more parts; split DateTime columns into separate date and time columns.
  5. Avoid calculated columns – Calculated columns come with hidden costs: they consume memory, they are not compressed as efficiently as native columns, and they increase data refresh time. Rather than creating a calculated column after data has been loaded into the model, add a native column in your source database or in Query Editor. Another efficient option is to replace the calculated column with a measure, which does not consume any memory. Always weigh the benefits of a calculated column against its memory cost when deciding whether or not to use it.
  6. Disable load – When you import data from a source, you apply transformations, such as merging and appending queries. As a result, you may end up with queries that are only used as intermediate transformation steps. By default, all queries from Query Editor are loaded into the memory of the Power BI model. It is crucial to disable load for all queries that are not required in the final model.
  7. Disable Auto Date/Time – Power BI automatically creates a built-in date table for each date field in the model to support time intelligence DAX functions. Those tables are hidden, they consume memory, and there is no flexibility to add custom columns, e.g. to support the fiscal year in Australia. Many analysts prefer to create their own date table, over which they have full control. Such tables should be marked as a Date Table (feature added in Power BI Desktop February update). To remove all hidden date tables from your model, in Power BI Desktop select File / Options and Settings / Options / Data Load and untick the Auto Date/Time.
  8. Choose appropriate data types – Make sure all columns have the correct data type. Note that the Fixed Decimal Number data type is internally stored as an integer; it is recommended for all decimal numbers with a precision of four or fewer decimal places.
  9. Transform data at the right place – Most data transformations generally take place in Query Editor in Power BI Desktop. Query Editor is a powerful and user-friendly tool, and it keeps track of all applied transformation steps, which is useful for traceability and future maintenance. However, you may obtain better performance by applying transformations directly at the source database. For example, grouping your sales data by month in your transactional database will increase the source query execution time, but only the grouped data will be sent over the network to Power BI. Another option is to import fine granular source data and perform the grouping later in Query Editor. This has the opposite effect: faster reads at the source, but a larger dataset to send over the network and more intensive processing in Power BI. Many factors will influence this decision, e.g. network bandwidth and source database speed. Note that shaping your data in the source database can introduce performance issues for other database users and systems. It is essential to measure query execution times and data refresh times before making a decision.
  10. Consider using DirectQuery – You should import data to Power BI wherever possible, however, if your goals cannot be met by importing data, then consider using DirectQuery. DirectQuery is generally only feasible when the underlying data source can provide interactive queries and is able to handle the query load. Follow the official guidance to implement DirectQuery successfully.
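Several of the techniques above can be combined in a single Power Query (M) query. The sketch below is illustrative only: the server, database, table and column names (MyServer, MyDb, Sales, OrderDate, ProductKey, Amount, Notes, RowVersion) are assumptions, so substitute your own. It filters to the last two years (technique 1), removes unneeded columns (2), derives low-cardinality Year and Month keys (4), pre-aggregates to month grain (3) and stores the result as Fixed Decimal Number, which M calls Currency.Type (8).

```
let
    // (1) Filter source data: keep only the last two years of transactions
    Source      = Sql.Database("MyServer", "MyDb"),
    Sales       = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    TwoYearsAgo = Date.AddYears(Date.From(DateTime.LocalNow()), -2),
    Filtered    = Table.SelectRows(Sales, each Date.From([OrderDate]) >= TwoYearsAgo),

    // (2) Remove columns that are not relevant to the analysis
    Trimmed     = Table.RemoveColumns(Filtered, {"Notes", "RowVersion"}),

    // (4) Decrease cardinality: replace the DateTime column with low-cardinality keys
    WithYear    = Table.AddColumn(Trimmed,  "Year",  each Date.Year(Date.From([OrderDate])),  Int64.Type),
    WithMonth   = Table.AddColumn(WithYear, "Month", each Date.Month(Date.From([OrderDate])), Int64.Type),

    // (3) Lower granularity: one row per product per month instead of per transaction
    Monthly     = Table.Group(WithMonth, {"Year", "Month", "ProductKey"},
                      {{"SalesAmount", each List.Sum([Amount]), Currency.Type}})
in
    Monthly
```

It is worth checking that the row filter folds back to the source, so the database, not Power BI, does the filtering; technique 9 above discusses where such transformations should run.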

Remember, memory is the biggest asset in Power BI. The techniques presented in this post will reduce your model's memory footprint, which has a direct impact on the performance of your reports and dashboards.

What are your thoughts on Power BI memory optimisation? What other practices have you come across in your work? Please share your comments below.

by Dejan Pajk

Using Star Schemas for Power BI Reporting

Trying to produce an analytical report directly from an operational system is a daunting experience. Operational data models are highly normalised and optimised for data entry, not for data retrieval. Therefore, analytical queries against an operational system can severely affect performance and put a burden on your IT infrastructure.

In my previous post, I examined the importance of data modelling when building Power BI reports. This week, I’m introducing star schemas, widely accepted as a standard data model for BI systems.

A star schema comprises a central fact table (e.g. Sales) and multiple dimension tables (e.g. Date, Customer, Product Group). The fact table contains data from business transactions or a snapshot summary of that data, and is connected through a many-to-one relationship to each dimension table. The main advantages of star schemas are:

  • Simplicity – Star schemas are easy to read, understand and use. Dimensions are used to slice and dice the data, and facts to aggregate numbers. (Note: it is very important to use standard business names when naming tables and columns in your model).
  • Performance – Because star schemas have a small number of tables and clear join paths, queries run faster than they do against an operational system.
  • Scalability – Star schemas are extensible to accommodate changes, such as adding new dimensions, attributes or measures.
  • Support – Star schemas are the most commonly used model in BI implementations and are therefore supported by a large number of BI tools.

Star schema design is predicated on the assumption that transactions (facts) are highly voluminous, while dimensions are smaller and relatively simple compared to the facts. The fundamentals of star schemas are therefore quite simple. Firstly, we make the fact table, which is the largest one, narrow and highly normalised. Secondly, we make dimension tables wide and denormalised, removing as many joins as possible.

Fact tables should not be related to other fact tables, nor dimension tables to other dimension tables; facts can only be related to dimensions. A snowflake is a variation of a star schema in which some of the dimensions are normalised. Snowflake models are generally easier to maintain, and apart from a small performance degradation due to the additional joins, there is nothing wrong with them. We can always convert a snowflake schema back to a star schema by denormalising its dimensions. Another variation is a star schema with multiple fact tables that are not related to each other but share the same dimensions. Common examples are Budget vs. Actuals or Sales vs. Purchases reports.

BI professionals share the same opinion that star schemas are nearly always the best way to represent analytical data. When building data models with Power BI, it is worth spending additional time in Query Editor to transform your data sources into facts and dimensions.
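As a sketch of that transformation, the two Power Query (M) queries below derive a denormalised Product dimension from a flat staging query and keep the fact table narrow. SalesStaging and all the column names are assumptions; in your model the two results would be related on ProductCode, with load disabled on SalesStaging.

```
// Query: DimProduct – one row per product, with denormalised attributes
let
    Source     = SalesStaging,
    DimProduct = Table.Distinct(
        Table.SelectColumns(Source, {"ProductCode", "ProductName", "ProductGroup"}))
in
    DimProduct

// Query: FactSales – narrow fact table holding only keys and numeric measures
let
    Source    = SalesStaging,
    FactSales = Table.SelectColumns(Source,
        {"OrderDate", "CustomerNo", "ProductCode", "Quantity", "Amount"})
in
    FactSales
```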

Remember, if you are facing modelling problems, consider rebuilding your model to move toward a star schema. By taking this approach you will find yourself heading in the right direction.

by Dejan Pajk

Data Modelling for Power BI

Power BI was released in 2015. Since then, it has spread like a bush fire. It has become a critical tool for many businesses, taking data from anywhere and combining it to power real-time decisions. Since its release, customers have created over 11.5 million data models worldwide, with 30,000 more added daily.

Building data models with Power BI is easy and fast. You’re not required to flatten your data into one table. Once you’ve pulled in data from multiple sources, you can create a model by simply connecting tables using relationships. However, with great flexibility comes great responsibility. It is very unlikely that the source data is already perfectly structured for the kind of analysis you want to perform. If you only connect source tables with relationships, without following data modelling principles, too often this will result in complex data models that are expensive to maintain. If the numbers you’re calculating do not match, or if the DAX formulas are too complex to maintain, you will most likely have an issue in a data model. A data model is like the foundation for your house, get it right and the subsequent construction, and the house itself, will be better.

A data model is like the foundation for your house, get it right and the subsequent construction, and the house itself, will be better.

Data modelling is a basic skill that anybody interested in building BI reports should master. Being a good data modeller means being able to match your specific model with one of the many patterns that have already been studied and solved. A good introduction to this topic is the book Analyzing Data with Power BI and Power Pivot for Excel by M. Russo and A. Ferrari. This book will get you started; however, to become a great data modeller you will need experience, which means you might have to suffer some failures as you learn.

The first important data modelling decision is granularity, i.e. the level of detail in your tables. You must have the right level of granularity to meet your reporting needs. If your granularity is too coarse, you will not be able to extract the information you want. On the other hand, fine granularity means larger and slower datasets, and potentially more complex DAX formulas. For example, if your Sales table contains data for individual transactions but you only need to report by month, quarter and year, the correct granularity would be the month level.
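As an illustration, the Power Query (M) sketch below reduces a hypothetical transaction-level Sales query to month grain; PostingDate, CustomerNo and Amount are assumed column names.

```
let
    Source = Sales,  // hypothetical transaction-level query
    // Replace each posting date with the first day of its month
    MonthKey = Table.TransformColumns(Source,
        {{"PostingDate", each Date.StartOfMonth(Date.From(_)), type date}}),
    // One row per customer per month instead of one row per transaction
    Monthly = Table.Group(MonthKey, {"PostingDate", "CustomerNo"},
        {{"Amount", each List.Sum([Amount]), Currency.Type}})
in
    Monthly
```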

Another critical data modelling decision is the level of normalisation. The most common reporting data sources are operational systems, such as Dynamics NAV. An operational system directly supports the execution of a business process by capturing details about significant business events or transactions. Therefore, operational databases are highly normalised and optimised for insert and update operations. While the focus of the operational system is the execution of a business process, the BI system, such as Power BI, supports the evaluation of the process. Interaction with a BI system takes place exclusively through queries that only retrieve data about business processes. When you only read data, normalisation is rarely a good technique. The database performance is better when there are fewer joins required.

Dimensional modelling has been introduced as a direct response to the unique requirements of BI systems. A star schema is the simplest form of a dimensional model, in which data is organised into facts and dimensions.
In my next blog I will introduce star schemas and their benefits.

by Dejan Pajk

Introducing Power BI

Power BI is a Business Intelligence solution provided by Microsoft. It consists of a Windows desktop application called Power BI Desktop; online Software as a Service (SaaS) called the Power BI service; and mobile Power BI apps available on Windows, iOS and Android devices.

Power BI is an evolution of the add-ins previously available in Excel: Power Pivot, Power Query and Power View. You can use Power BI with or without Excel which means that you are no longer dependent on the version of Microsoft Office installed at your company.

Power BI provides connectors that make it possible to easily access a whole range of data sources, including on-premises databases, Excel workbooks, and over 59 cloud services. In addition, Power BI comes with Content Packs—pre-built reports and dashboards for the popular cloud services such as Google Analytics, Salesforce, MailChimp or Zendesk.

If you are a Dynamics NAV user, you can connect to the Power BI Content Pack for Dynamics NAV and start exploring your own data from an out-of-the-box dashboard showcasing important metrics on sales, monthly goals, revenues vs. expenses and more.

Power BI provides interactive visualizations with self-service business intelligence capabilities, where end users can create reports and dashboards themselves, instead of depending on technical staff.

With Power BI, you have the ability to carry out analysis of your data by asking questions, in plain English—no special code or syntax is required. This feature is called natural-language queries, and with it you can ask Power BI to perform tasks in much the same way you would ask one of your colleagues, for example: “Show sales 2015 by brand”.

Another feature worth mentioning is Quick Insights. Power BI can search a dataset for interesting or useful patterns and provide you with a list of charts to help you better understand your data. It uses sophisticated algorithms, and the time this analysis takes depends on the size and complexity of the dataset.

You can create a Power BI Group Workspace to collaborate with your colleagues on dashboards, reports, and datasets. When the reports and dashboards are ready to be distributed, you can:

  • share them via Power BI Service or Power BI mobile
  • create an Organisational Content Pack (a collection of reports) and publish the reports
  • publish a dashboard to the web (if intended for public consumption)

You can sign up for Power BI through the Power BI website. You can use Power BI free, or you can buy a Power BI Pro license which includes additional data refresh and collaboration capabilities.
