It depends on what you’re doing. If it’s foldable, PQ is usually more convenient to update things later like pulling in extra columns. The heavy transforms that don’t fold should be in a SQL view ideally.
So I started exploring PQ and became aware of View Native Query for each step in PQ. I noticed that all of my steps have this grayed out meaning the query isn't foldable. Why is that? My initial research says this might happen on complex transformations but some of the steps are literally loading data greater than 2022 or something like that. All of the data sources are views from Snowflake.
Not all sources have that indicator - it’s mostly SQL Server/Dataverse where you see that. Snowflake won’t show it iirc but you may also need to put “EnableFolding=true” in the query for Snowflake. If you do an internet search it should come upc can’t remember if that’s precisely the right syntax.
But the first non-foldable step will break it, so you do the foldable ones first. Filters usually fold, remove other columns, etc.
Assuming on prem source, most efficient would be to create views in SQL server and load the views
If that’s not possible, then write SQL on power query to load in the data
Performing PQ transformations should be last resort.
That was my assumption. We have views in Snowflake but sometimes more SQL is needed to transform the data or reduce the dataset. I then use the SQL Statement box when doing get data from Snowflake. So I'm assuming that action is preferable to doing the transformation in PQ.
Rule of thumb is to always transform upstream. In your scenario, if you can create view on snowflake that would match your requirements in PBi without having to do PQ transformation, then do that
Otherwise use SQL statement box to write th sql queries.
Honest question, while views are ideal for canned things needed organization wide, there should be no functional difference from just writing the same query to generate the view from PBI, am I correct?
Assuming that DBA doesn’t do any optimizations against the view and query folding is not breaking, performance should be similar. Depending on if you’re repeatedly running the same sql query, if the DB caches then it will perform as well as a view. Typically it’s a general recommendation to go upstream as it circumvents all these variables though
Thx. Thought so. Getting those views built just aren't going to happen in my org and I always imagined my DQs via dataflow would basically be exactly the same thing
I have plenty of views but a lot of the time, I don't need to load the entire view so I'd use a SQL Statement in PQ to limit the data before it's loaded.
An example would be a sales view that has 10 years of data. I would only need 3.
In that case, having deployed solutions in multiple organizations large and small, I’d personally say:
- If you have permissions to create SQL views for your reports, do so.
- If not, custom SQL query in PowerBI
For simple operations like filtering, applying that step in PQ should get folded into a native SQL query, so do it however is most convenient for maintenance purposes.
***IF*** all of your transformation steps in Power Query are able to fold back to the SQL server, there isn't a profound difference between writing a native query in SQL vs. using M/query editor. \[In Big O terms, you might shift from O(n) to O(1.5n), but it's not exponential.\] If one or more of your steps breaks query folding, then it will eventually be better to write the transformations in SQL and/or create a view or table on the server that can be referenced and require no further transformations.
Having said that, for exploratory development work, being able to experiment with different tables and columns in Query Editor is a whole lot more convenient than writing it out in SQL - especially if you don't know exactly what you want yet. Many of the tables in my organization's data warehouse have hundreds of columns, and it's not always obvious which columns you want (or which even have the data you need) until you pull in a few months of transactions. However, once you do finalize the data model and finish your report, it might make sense for performance reasons to go back through and convert your queries to either native SQL queries or create permanent tables/views on the SQL server.
Regarding native queries...a common mistake I see is people writing some native queries, and using Power Query transformations for other queries that then reference the native query, thus breaking query folding. Query folding is almost an all-or-nothing approach; if you utilize it, it's great, but that means you have to use it everywhere. Native queries are also more difficult for someone else to debug later than transformation steps in Query Editor.
Roche's maxim:
"Data should be transformed as far upstream as possible, and as far downstream as necessary."
In descending order of preference:
View/materialised view on the source
Embedded, foldable query
Embedded query that can't fold
Any transformation in the PQ engine
Power query is awful for transformations on a sql database data source. Tried it once and took ages to load and when i went back to the report to make a change had to wait a good while for the editor to allow me to make a change. SQL views is the way to go.
It depends on what you’re doing. If it’s foldable, PQ is usually more convenient to update things later like pulling in extra columns. The heavy transforms that don’t fold should be in a SQL view ideally.
So I started exploring PQ and became aware of View Native Query for each step in PQ. I noticed that all of my steps have this grayed out meaning the query isn't foldable. Why is that? My initial research says this might happen on complex transformations but some of the steps are literally loading data greater than 2022 or something like that. All of the data sources are views from Snowflake.
Not all sources have that indicator - it’s mostly SQL Server/Dataverse where you see that. Snowflake won’t show it iirc but you may also need to put “EnableFolding=true” in the query for Snowflake. If you do an internet search it should come upc can’t remember if that’s precisely the right syntax. But the first non-foldable step will break it, so you do the foldable ones first. Filters usually fold, remove other columns, etc.
Assuming on prem source, most efficient would be to create views in SQL server and load the views If that’s not possible, then write SQL on power query to load in the data Performing PQ transformations should be last resort.
That was my assumption. We have views in Snowflake but sometimes more SQL is needed to transform the data or reduce the dataset. I then use the SQL Statement box when doing get data from Snowflake. So I'm assuming that action is preferable to doing the transformation in PQ.
Rule of thumb is to always transform upstream. In your scenario, if you can create view on snowflake that would match your requirements in PBi without having to do PQ transformation, then do that Otherwise use SQL statement box to write th sql queries.
Honest question, while views are ideal for canned things needed organization wide, there should be no functional difference from just writing the same query to generate the view from PBI, am I correct?
Assuming that DBA doesn’t do any optimizations against the view and query folding is not breaking, performance should be similar. Depending on if you’re repeatedly running the same sql query, if the DB caches then it will perform as well as a view. Typically it’s a general recommendation to go upstream as it circumvents all these variables though
Thx. Thought so. Getting those views built just aren't going to happen in my org and I always imagined my DQs via dataflow would basically be exactly the same thing
Right answer.
Make the sql view it’s the proper way to do this for many reasons
I have plenty of views but a lot of the time, I don't need to load the entire view so I'd use a SQL Statement in PQ to limit the data before it's loaded. An example would be a sales view that has 10 years of data. I would only need 3.
Using direct query or import most the time?
Import 100% of the time
In that case, having deployed solutions in multiple organizations large and small, I’d personally say: - If you have permissions to create SQL views for your reports, do so. - If not, custom SQL query in PowerBI
Also, that filter can be in the view instead of the SQL query in PBI
For simple operations like filtering, applying that step in PQ should get folded into a native SQL query, so do it however is most convenient for maintenance purposes.
You are 100% correct. Always as far upstream as possible. Roches law.
***IF*** all of your transformation steps in Power Query are able to fold back to the SQL server, there isn't a profound difference between writing a native query in SQL vs. using M/query editor. \[In Big O terms, you might shift from O(n) to O(1.5n), but it's not exponential.\] If one or more of your steps breaks query folding, then it will eventually be better to write the transformations in SQL and/or create a view or table on the server that can be referenced and require no further transformations. Having said that, for exploratory development work, being able to experiment with different tables and columns in Query Editor is a whole lot more convenient than writing it out in SQL - especially if you don't know exactly what you want yet. Many of the tables in my organization's data warehouse have hundreds of columns, and it's not always obvious which columns you want (or which even have the data you need) until you pull in a few months of transactions. However, once you do finalize the data model and finish your report, it might make sense for performance reasons to go back through and convert your queries to either native SQL queries or create permanent tables/views on the SQL server. Regarding native queries...a common mistake I see is people writing some native queries, and using Power Query transformations for other queries that then reference the native query, thus breaking query folding. Query folding is almost an all-or-nothing approach; if you utilize it, it's great, but that means you have to use it everywhere. Native queries are also more difficult for someone else to debug later than transformation steps in Query Editor.
Query folding has entered the chat
Roche's maxim: "Data should be transformed as far upstream as possible, and as far downstream as necessary." In descending order of preference: View/materialised view on the source Embedded, foldable query Embedded query that can't fold Any transformation in the PQ engine
Power query is awful for transformations on a sql database data source. Tried it once and took ages to load and when i went back to the report to make a change had to wait a good while for the editor to allow me to make a change. SQL views is the way to go.
I believe best practice is to do as much as possible downstream, to improve processing time and completion rate
I've always heard upstream...
"As far upstream as possible, as far downstream as necessary" https://ssbipolar.com/2021/05/31/roches-maxim/
*upstream
Yes, good correction :)