Introduction
Before starting to develop Power BI reports, an important step is choosing the right architecture from the available models: one that best suits your organization and lets you deliver reports that perform well and need little maintenance over time. An architecture design mainly describes how Power BI accesses the data sources and how the data is prepared and combined for use in the reports.
There are many factors involved in selecting the architecture design, such as:
- Size of datasets or tables.
- Type of data sources, e.g. relational databases, text files, web pages.
- How up to date the data needs to be in the reports.
- How the data needs to be shared between multiple developers.
- And, of course, the cost of the Microsoft services.
In this blog series I am going to cover each of the available Power BI development models and discuss their pros and cons.
Import Mode
Import mode is the most common way of creating Power BI reports: data is pulled from the data sources and loaded into a semantic model in a Power BI file or a Power BI dataset in the cloud.
Initially, during report development in Power BI Desktop, the imported data is stored in the pbix file; it is then copied to the Power BI Service in the cloud when the file is published.
Every time the data is refreshed in the Power BI Service, the whole dataset is transferred to the cloud via a direct connection or a Power BI Gateway.
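For illustration, a minimal Import-mode query written in Power Query M might look like the sketch below. The server, database, table and column names are hypothetical examples, not from any specific environment:

```powerquery
// Power Query (M) sketch of an Import-mode table.
// Server, database, table and column names are hypothetical.
let
    // Connect to the source database
    Source = Sql.Database("myserver.domain.local", "SalesDW"),
    // Pick the table to import; in Import mode the result set is
    // loaded into the model and copied to the Service on publish
    Sales = Source{[Schema = "dbo", Item = "FactInternetSales"]}[Data],
    // Keep only the columns the report needs, to reduce model size
    Trimmed = Table.SelectColumns(Sales, {"OrderDate", "ProductID", "SalesAmount"})
in
    Trimmed
```

Trimming unused columns before load is worthwhile in Import mode, since everything left in the query ends up stored in the model and transferred on every refresh.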
Advantages
- Very good report performance, as the data is already loaded into Power BI.
- Multiple data sources can be combined into one model.
- All the available data sources are supported.
- There is no limitation on using DAX functionality.
- Supports Q&A and Quick Insights.
- Supports all Power Query transformations.
Limitations
- The maximum dataset size is 1 GB (for Pro accounts).
- The maximum number of scheduled refreshes is 8 per day for Power BI Pro accounts and 48 for Premium.
- You cannot define different refresh schedules for different data sources. For example, if the model uses some Excel files that change monthly and some SQL Server tables that change daily, every refresh imports both into the model.
- The refresh is a full load, not incremental: the whole dataset is imported even when nothing has changed, which takes time for large datasets.
Direct Query Mode
In this mode no data is stored in Power BI Desktop or the Service. As you build report visuals, Power BI generates queries, sends them to the data sources, and shows the returned results in the visuals.
The end user always sees the most recent data; however, the time required to refresh a visual depends on the performance of the underlying data source and its query response time. Direct Query is used with relational databases or big data engines that can respond to SQL queries. SQL Server, Azure SQL DW, Oracle, Azure HDInsight and Google BigQuery are some of the supported data sources; the complete list can be seen via the link below:
https://docs.microsoft.com/en-us/power-bi/desktop-directquery-data-sources
After adding the tables to the Power BI model, by default they are automatically linked based on their relationships in the database; however, you can manually add new relationships or modify existing ones in the model diagram:
We can turn on SQL Profiler and see the queries Power BI sends to the database when filtering a visual:
Using Power Query
Except for very complex transformation functions, almost all Power Query functionality can be used to transform, merge and aggregate Direct Query tables. Keep in mind, though, that since no data is stored in the model, the outcome of the Power Query transformations is later converted into a query that is sent to the data source.
For a better understanding, in the example below I have merged the Sales data with Product, ProductSubCategory and ProductCategory, then aggregated the sales values by Category:
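A merge-and-aggregate chain of this kind could be sketched in Power Query M as follows. The table names mirror the ones mentioned above, but the key and column names (ProductID, SalesAmount, and so on) are assumptions; adjust them to your own schema:

```powerquery
// Power Query (M) sketch: merge Sales with the product hierarchy,
// then aggregate sales by category. Column and key names are assumed.
let
    Source = Sql.Database("myserver.domain.local", "SalesDW"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    Product = Source{[Schema = "dbo", Item = "Product"]}[Data],
    SubCategory = Source{[Schema = "dbo", Item = "ProductSubCategory"]}[Data],
    Category = Source{[Schema = "dbo", Item = "ProductCategory"]}[Data],
    // Merge Sales with Product on ProductID
    SalesProduct = Table.NestedJoin(Sales, {"ProductID"}, Product, {"ProductID"}, "Product", JoinKind.Inner),
    Expanded1 = Table.ExpandTableColumn(SalesProduct, "Product", {"ProductSubCategoryID"}),
    // Merge with ProductSubCategory to reach the category key
    WithSubCat = Table.NestedJoin(Expanded1, {"ProductSubCategoryID"}, SubCategory, {"ProductSubCategoryID"}, "SubCategory", JoinKind.Inner),
    Expanded2 = Table.ExpandTableColumn(WithSubCat, "SubCategory", {"ProductCategoryID"}),
    // Merge with ProductCategory to get the category name
    WithCat = Table.NestedJoin(Expanded2, {"ProductCategoryID"}, Category, {"ProductCategoryID"}, "Category", JoinKind.Inner),
    Expanded3 = Table.ExpandTableColumn(WithCat, "Category", {"CategoryName"}),
    // Aggregate sales by category; in Direct Query mode this whole chain
    // is folded into a single query sent to the data source
    Aggregated = Table.Group(Expanded3, {"CategoryName"},
        {{"TotalSales", each List.Sum([SalesAmount]), type number}})
in
    Aggregated
```

Because the steps fold back to the source, the joins and the grouping are performed by the database engine rather than by Power BI.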
Then, by using this table in a Power BI visual and profiling the source SQL Server, we can see the query sent by Power BI:
Advantages
- It is a good option for large datasets where importing the data is not possible or would take a long time.
- Reports show the most recent data.
- No scheduled refresh is required.
- There is no limitation on the size of the dataset.
- Any dashboard tile based on Direct Query is refreshed automatically every 15 minutes.
Limitations
- The result of every query is limited to 1 million rows.
- Performance of reports and visuals depends on the underlying queries.
- Some of Power Query functionalities are not supported in Direct Query mode.
- Q&A and Quick Insights are not supported in Direct Query mode.
- Auto date/time intelligence is not supported for date columns.
Other blogs of this series
Part 2: Composite mode & Aggregation