Recently at a customer I came across the terminology of a Power Platform Dataflow. Immediately a few questions arose: What are Power Platform Dataflows? Is there any difference between a Power Platform Dataflow and a Power BI Dataflow? These questions weren’t so easy to answer, so in this blog post I’ll try to summarize what I’ve learned so far.
Power BI Dataflows
Power BI Dataflows have been generally available since April 2nd, 2019. Power BI Dataflows give analysts self-service capabilities to ingest, integrate and enrich ‘big data’. Architecturally, this means the ability to easily move data from various sources into a data lake, more specifically into an Azure Data Lake Storage Gen2. A data lake is a very cost-effective, scalable big-data solution built on top of Azure Blob storage. All of this is abstracted from the end user and completely managed by Power BI. This means, for example, that you don’t have direct access to the Data Lake and you can’t leverage all of its features, unless you configure Power BI to store its dataflows in your own data lake (how-to).
As can be seen in the image above, the data is stored in so-called “Common Data Model compliant” folders. Basically, this means a standardized way of storing data, with the great benefit that other applications (like Power BI) are able to easily understand the data format and structure. More details on the Common Data Model can be found here.
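To make “Common Data Model compliant” a bit more concrete: a CDM folder carries a model.json metadata file at its root, describing the entities, their attributes and the data files (partitions) that belong to them. The sketch below parses a minimal, hand-written model.json (the entity name, attributes and URL are made up for illustration, not taken from a real dataflow):

```python
import json

# Minimal, hand-written illustration of the model.json metadata file
# found at the root of a CDM-compliant folder (heavily simplified).
model_json = """
{
  "name": "SalesDataflow",
  "version": "1.0",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "Customer",
      "attributes": [
        {"name": "customerId", "dataType": "string"},
        {"name": "revenue", "dataType": "decimal"}
      ],
      "partitions": [
        {"name": "Part001", "location": "https://example/Customer/Part001.csv"}
      ]
    }
  ]
}
"""

model = json.loads(model_json)
for entity in model["entities"]:
    # Any CDM-aware application can discover the schema this way,
    # without inspecting the data files themselves.
    cols = [attr["name"] for attr in entity["attributes"]]
    print(entity["name"], cols)  # → Customer ['customerId', 'revenue']
```

This self-describing metadata is exactly what lets Power BI (and other tools) pick up the data format and structure without any manual schema work.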
After a Power BI Dataflow has been created, you’ll want to create reports and dashboards from this data. Power BI Desktop connects to a Power BI Dataflow as a data source, as can be seen in the image below.
This data connection loads the data from the Data Lake into a Power BI Dataset. So the main use case of a Power BI Dataflow is to ingest data into a lake, from where it ends up in a Power BI Dataset. Power BI Dataflows can be re-used by other report makers, so they can leverage the data already available in the lake, thus lowering the effort of data preparation. You are also able to share (and certify) a Power BI Dataset to report makers to make their lives even easier. Architecturally, this looks as follows:
Again, note that the Data Lake is ‘locked’ and you can’t directly reference or manage the lake.
UPDATE 28/01/2020: You can also configure Power BI to store its Dataflows into your own managed Azure Data Lake. How to do this is described in this article. Credits to otravers for pointing this out.
Power Platform Dataflows
So, now what are those Power Platform Dataflows all about? In the Power Apps portal you might have noticed that the menu item ‘Data Integration’ has been replaced by ‘Dataflows’:
So are Power Platform Dataflows just a rebranding of the Power Apps data integration projects? Well yes, sort of. But additional features were introduced with the name change, and I also believe the underlying technology no longer makes use of the Power Platform data integration platform. The idea of being able to easily integrate data into the Common Data Service still remains one of the main use cases of a Power Platform Dataflow. Compare this with Power BI Dataflows, where the data lands in a data lake. Architecturally, this looks as follows:
But actually, using a Power Platform Dataflow, you also get the ability to directly load data into your own Azure Data Lake Storage Gen2. You configure this kind of “Analytical” Power Platform Dataflow by checking the “Analytical entities only” checkbox when creating a Dataflow:
Again, visualizing this using our architectural diagram, an “Analytical Only” Power Platform Dataflow would look as follows:
The data does not land in the Common Data Service but directly in a data lake. Note the unlocked icon at the lake, meaning we can bring (and manage) our own lake, whereas this was not the case with a Power BI Dataflow.
You get the data into Power BI by selecting the Power Platform dataflows connector.
Both the Power Platform Analytical Dataflows and the Power BI Dataflows are available for selection using the Power Platform dataflows connector.
So, we can use the Power Platform Dataflows to integrate data into the Common Data Service or into our own Data Lake (Power Platform Analytical Dataflow).
Now what about moving data from our Common Data Service into our own lake? As you might have noticed in the latest architectural diagram, the Common Data Service is not a valid data source for a Power Platform Dataflow. It is only a valid destination. So how do we push data from the Common Data Service into our own lake? Well, simple, by using the “Export to data lake” functionality.
Export to data lake
The export to data lake feature was introduced in preview at the end of October 2019. The official announcement can be found here. By now the preview notion has been removed from the menu item, making me believe it is already generally available (but I haven’t found an official announcement).
The export to data lake feature continuously replicates data of Common Data Service entities into (your own) Azure Data Lake Storage Gen2. It does this in an incremental fashion, similar to how the Data Export Service works (but that one exports the data to your own Azure SQL database). Soon to be deprecated?
Setup is surprisingly simple. Just enable your CDS entities for change tracking and you are good to go to configure the export to data lake service for those entities:
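To illustrate why change tracking is the prerequisite here: it lets a consumer ask the source for “everything changed since the version I last saw”, which is what makes incremental replication possible. The sketch below is just that idea in miniature, in Python; it is not the actual CDS or export-to-data-lake implementation, and the table contents and version numbers are invented:

```python
# Illustrative sketch of incremental replication driven by a change-tracking
# version token. NOT the real CDS implementation, just the underlying idea.

source = {
    1: {"name": "Contoso", "version": 3},
    2: {"name": "Fabrikam", "version": 5},
}
lake = {}          # our "data lake" replica of the entity
last_synced = 0    # highest version token seen on the previous sync


def sync():
    """Replicate only the rows changed since the last sync."""
    global last_synced
    changed = {k: v for k, v in source.items() if v["version"] > last_synced}
    lake.update(changed)  # copy only the delta into the lake
    if source:
        last_synced = max(v["version"] for v in source.values())
    return len(changed)


print(sync())  # → 2  (first run replicates everything)
source[1] = {"name": "Contoso Ltd", "version": 6}
print(sync())  # → 1  (second run replicates only the changed row)
```

Each sync only touches the delta, which is what makes continuous replication of large entities feasible.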
Adding this export to data lake into our architectural picture, we get the following:
Data can be fed into the Common Data Service by a Power Platform Dataflow and can then be fed into a data lake by using the export to data lake feature.
Now, what if I had my own data lake fed by the export to data lake functionality and I would like to perform reporting on top of this data using Power BI? How do we get the data into a Power BI Dataset?
You could connect from Power BI to the data lake directly:
When connecting to the lake directly, you don’t leverage the CDM features, so you will have a lot more data transformation effort.
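To see where that extra effort comes from: the data files in a CDM folder are header-less CSVs, with the column names living in model.json. Connecting to the lake directly means re-applying that schema yourself, roughly as sketched below (Python, with hypothetical attribute names and file contents):

```python
import csv
import io

# Column names would normally come from the entity's "attributes" in
# model.json, because CDM partition files are CSVs WITHOUT a header row.
attribute_names = ["customerId", "name", "revenue"]

# Hypothetical content of one partition file in the lake.
partition_csv = "C001,Contoso,1200.50\nC002,Fabrikam,800.00\n"

# Manually zip the schema onto each header-less row.
rows = [dict(zip(attribute_names, row))
        for row in csv.reader(io.StringIO(partition_csv))]
print(rows[0]["name"])  # → Contoso
```

Multiply this by every entity, partition and data type, and the appeal of letting something CDM-aware do it for you becomes obvious.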
Because our lake has data in the CDM format, you could attach the existing CDM folder as a Power BI Dataflow, which would then leverage the data model.
Attach existing CDM folder
The feature to attach an existing CDM folder in Power BI is shown when you create a new Power BI Dataflow:
This allows you to add your own Data Lake (in case you have stored the data in a CDM format). As the export to data lake functionality of Common Data Service exports the data into a lake using the CDM format, you could add this CDM folder as a Power BI dataflow. This Power BI Dataflow then becomes “externally” managed:
This gives you all the modelling benefits of a normal Power BI Dataflow:
In summary, the definitions are:
- Power BI Dataflows are for getting data into a data lake (which is either managed by Power BI or by yourself).
- Power Platform Dataflows are for integrating data into the Common Data Service.
- Power Platform Analytical Dataflows are for integrating data into your own data lake.
- Export to data lake is to bring data from your Common Data Service into your own data lake.
Microsoft is really continuing the efforts to make the Power Platform components (Power Automate, Power Apps & Power BI) work more seamlessly together.
Hopefully this gives you a good overview in understanding the difference between a Power BI Dataflow, a Power Platform Dataflow and a Power Platform Analytical Dataflow!
6 thoughts on “Demystifying Power Platform Dataflows”
You actually can “bring your own datalake” with a Power BI dataflow. You need to jump through many hoops in order to enable that in a Power BI workspace, but I’ve used this successfully:
Thanks Olivier! Indeed, you are correct. I will update the post accordingly.
Great article.. however what are the licensing implications of each approach
Thanks for the info, I couldn’t find anybody to give me a clear answer on this. Not even the Microsoft sales reps
Thanks for the article! Very useful! When would you recommend to use Power Apps dataflows instead of Power BI Dataflows? They seem the same if you want to export to data lake and the CDS entities are available on both of them…
I was looking at a Powerapp Dataflow folder connected to a storage account and noted that there were csv snapshots for every instance I had the dataflow run. Does anyone know if we have to manage these snapshots? Put another way, do we need to worry about deleting them over time?
If I have a dataflow that runs every 1/2 hour, I might end up with a lot of snapshots!