In this article we will compare the two main ways to manipulate data in Power BI - Power Query and DAX (Data Analysis Expressions). Along the way we will look at the strengths and weaknesses of each approach and give suggestions on how to make the most of each method.
Power Query is a tool available in both Microsoft Excel and Microsoft Power BI that allows end users to perform Extract/Transform/Load (ETL) operations on data. The overarching purpose of Power Query is to allow a simple way to "massage" data into data structures that are more amenable to reporting and analytics.
Power Query "feels" like an ETL tool - it handles extraction, conversion of data types, filtering, and other functions that traditional ETL tools excel at.
DAX stands for Data Analysis Expressions and is a way to perform dynamic, IN MEMORY operations on data in Power BI. DAX does not filter or perform manipulations on data before it enters memory.
DAX feels less like a query language and more like working with functions in Excel or setting up "group by" columns in SQL. Getting into the DAX mentality is often difficult for people from a more linear programming background as it requires you to think about a calculation and how it might be performed at different aggregation levels of the data.
To understand how Power Query and DAX relate to one another one must first understand the two different "storage modes" that Power BI uses to deal with data - DirectQuery, and Import. Direct Query goes directly (hence the name) against the data source and all manipulations against the data are performed in memory via DAX expressions. Therefore Power Query is not available for tables using DirectQuery mode.
In contrast, Import mode brings data from the data source, processes it, and saves it in its own datastore. In Power BI desktop this datastore is contained in the .pbix file, and in Power BI online it is stored as a separate data set from the report.
Power Query allows you to take data that is imported in Import Mode and perform further manipulations upon it to get the data in the format that you would like so that it is easier to build reports using the Power BI visual designer.
As stated above, Power Query can only be used on imported data, whereas DAX can be used on both imported and DirectQuery data. If you are using DirectQuery you are stuck with DAX. However, if you are importing data you can use either one (or both in some circumstances). Here are some considerations that might make you choose one over the other:
DAX queries run on top of existing data and perform calculations in memory and they do not store extra data, so DAX wins hands down in terms of keeping the size of your dataset/pbix file small. On the flip side, all of that calculation can greatly slow down reports.
Power Query only executes when a data refresh is performed, and it stores the results of its data manipulation in static tables. Therefore the "performance hit" of Power Query only happens when the report is refreshed. This makes Power Query ideal for reports that contain a lot of data but are not refreshed frequently.
Power Query is great when you know from the beginning what kinds of aggregations you will have to perform on data. For example a lot of business reporting is simply rolling up numbers (sales, customer acquisition, etc) to weekly/monthly/quarterly/annual aggregations. Power Query shines in this regard because aggregation tables can be built from the beginning with this well-defined end reporting in mind. Pre-calculating these numbers makes the front-end reports significantly faster.
On the flip side Power Query really does not work when the end user needs to be more flexible in how they aggregate data, or when there are other dynamic filters that will affect the data aggregation. In these cases it makes more sense to use DAX calculated measures which take into consideration any filters that have been put onto the data in the report front-end.
Power Query has a very nice visual interface for editing queries and the Power Query "mentality" is very similar to working with SQL, thus it is much easier to pick up than DAX. DAX, by contrast, requires a complete shift in mindset. Therefore it is much easier for a data analyst to pick up Power Query skills than DAX.
As an organization's business intelligence capacity and complexity grows it becomes important also to be able to track changes to code. Neither Power Query nor DAX are particularly easy to track changes and back up code. However, of the two, Power Query is somewhat better as the steps Power Query takes can be expressed as a text-based script in the Power Query M language. These scripts can be copied and pasted into some kind of master file and then added to version control, allowing you to track changes.
One area in which Power Query shines is pre-filtering data for a particular report. Often the concern of a particular Power BI report is narrow - show the performance of one particular product, or department, over the course of one year. Using DAX one must query the entire table and then apply the filters in memory every time there is a change in the report parameters/slicers/filters. Power Query allows you to only download the data you want when the report is refreshed so that you do not have to think about applying filters later in the visual aspect of Power BI. Here are some common filters that are implemented in Power Query:
Power Query allows for intuitive grouping of data into separate aggregation tables. This aggregation is particularly powerful when dealing with large sets of data as this allows you to avoid calculating aggregations on the fly in reports.
Power BI will create a data model for you when you import data, but sometimes it is easier to work from one denormalized data table rather than dealing with the cognitive complexity of many joined tables. Power Query allows you to denormalize the data up front into a few mega-tables that have all the detail you need. Then additional filters can be applied to this denormalized table in DAX to get the results you need.
DAX is best when you are going to let users explore data freely and aggregate it at different levels in the front end. In this case you can define a custom measure in the data that is calculated at different aggregation levels - the measure calculated itself based on whatever grouping is being used on the data at the time.
DAX works well when you are going to look at data based on different date ranges or where you want to filter based on certain data dimensions (sales regions, customer segments, etc).
DAX allows you to create data tables "on the fly" in memory based on certain conditions. This can simplify certain reports and visuals such as master/detail views.
1. Use Power Query to create a local data set that is focused on your set of dashboards/reports. Pre-filtering data and storing it locally using Power Query leads to some great performance gains as the DAX expressions that you might use later will be going against locally stored data rather than a network data store.
2. Use Power Query to aggregate data if you know what kind of aggregations you need ahead of time. If you are doing very standard end of month/quarter/year reports then you can perform those aggregations in Power Query and store the results in a table that can simply be dropped into a front-end report with little fuss.
3. Use Power Query for very common business logic that is repeated across reports. Because it is easier to "version control" and back up Power Query data operations it makes sense to perform as much business logic as possible in Power Query so that you can copy this logic from report to report.
4. Use Power Query to create calendar tables and other lookup tables that can be created beforehand. While it would be nice to have all the important tables already existing in datastores, sometimes it makes sense to create some "local" data tables that can be used to code data. Power Query is great for these local tables such as calendar tables, aggregation categories that vary from report to report, etc.
5. Use Power Query to denormalize data for consumption in the Power BI frontend. Every organization has a few people whose skill set is better than the average data analyst but not quite up to the level of developer. These power users often do well with a dashboard that has many slicers sitting on top of a denormalized data table. You can use Power Query to "dump" all the data from various relational tables into one denormalized "mega table" and then allow these power users to explore the data from the denormalized table.
6. Use DAX to do dynamic calculations and aggregations. DAX is great when you are creating a dashboard for a power user who is doing exploratory data analysis (EDA). In this case you can create all the aggregation calculations as measures built upon data tables and allow the user to set their own filters and aggregation levels.
322 North Shore Drive
Building 1B, Suite 200
Pittsburgh, PA 15212
I am interested in information on: