In this blog post, we will take a look at Dataflow, a new Power BI functionality of a kind that already exists in other services such as Office 365. We must highlight that this new service is still in beta, so it is currently subject to modifications and updates.

Dataflow is a new Power BI service functionality for self-service ETL processes that allows data analysts and/or business users to build reusable data entities with Power Query and share them.

In contrast with Power BI datasets, which at the end of the day are simply Analysis Services tabular models, Dataflows are stored as entities in the Common Data Model (CDM) format in Azure Data Lake Storage Gen2 (ADLS Gen2). In both cases, datasets (Analysis Services) and dataflows (CDM), the underlying technology remains hidden from the user.

Image 1. Functional diagram

 

Dataflows allow users who are already familiar with Power BI datasets, reports, and queries to also create reusable data building blocks themselves, which can later be used as data sources for their datasets.

Dataflow limitations

Before going deeper into the matter, we must be aware that Dataflow has certain limitations depending on the Power BI license. In order to use this functionality, you must be at least a Pro user; you cannot use Dataflow with the free license.

So, depending on whether you have a Premium or Pro license, you will have access to more or fewer functionalities within Dataflow:

Functionality                                                            | Pro License | Premium License
Scheduled refresh                                                        | 8/day       | 48/day
Total storage                                                            | 10 GB/user  | 100 TB/node
Dataflow Authoring with Power Query Online                               | Yes         | Yes
Dataflow Management within Power BI                                      | Yes         | Yes
Dataflows Data Connector in the Power BI Desktop                         | Yes         | Yes
Integration with Azure                                                   | Yes         | Yes
Computed Entities (in-storage transformations via M)                     | No          | Yes
New connectors                                                           | Yes         | Yes
Dataflow incremental refresh                                             | No          | Yes
Running on Power BI Premium capacity / Parallel execution of transforms  | No          | Yes
Dataflow linked entities                                                 | No          | Yes
Standardized Schema / Built-In Support for the Common Data Model         | Yes         | Yes

Table 1. Information obtained from Microsoft’s official website (docs.microsoft.com)

Creating and using Dataflow examples

Next, we will provide a detailed step-by-step guide on how to create and use Dataflows with a Power BI Pro license.

First of all, go to the app.powerbi.com service. Once there, create a Workspace in which you can create your Dataflows. Before you can start using Dataflow, you will need to download and set up the Data Gateway in order to connect to your data source.

Image 2. The Data Gateway must be downloaded and set up

 

Once you have created your Workspace and your Data Gateway is set up pointing to your data source, go to your new Workspace and create your Dataflow.

Image 3. Creating your Dataflow

 

A window containing the Dataflow editor will immediately pop up. Here you must first add a new entity that will be the source of your data. Next, you will be required to select a data source; in this case, it will be SQL Server.

Next, you will need to set up the connection to this data source by selecting the Gateway you previously downloaded, installed, and configured.

Image 4. Setting up the Dataflow connection

 

Once the connection to your data source is created, you will be able to import the desired tables using Power Query:

Image 5. Imported data
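Behind these two steps (pointing the entity at SQL Server and selecting the tables to import), each dataflow entity is simply defined by a Power Query M query. As a rough sketch of what the editor generates at this point, assuming placeholder server, database, and table names:

    let
        // Connect to the SQL Server instance exposed through the gateway
        // ("sql-server-01" and "AdventureWorksDW" are placeholder names)
        Source = Sql.Database("sql-server-01", "AdventureWorksDW"),
        // Navigate to the table that will become the dataflow entity
        DimCustomer = Source{[Schema = "dbo", Item = "DimCustomer"]}[Data]
    in
        DimCustomer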

 

In the next window, you will find a query editor that is very similar to the one in Power BI Desktop. Here you can modify your source data using M code or basic transformations. A question mark is displayed at the top left of each table icon; it disappears as you click on each table, which prompts the first data refresh. Once you have completed the required modifications in the editor, simply click on “Done” to save your first Dataflow, which you can then edit again as often as required.

Image 6. Dataflow query editor
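As an illustration of the kind of transformations that can be applied in this editor, the M sketch below filters, renames, and retypes a few columns. The column names here are hypothetical and would of course depend on your own source tables:

    let
        // Start from the entity's source query defined earlier (placeholder names)
        Source = Sql.Database("sql-server-01", "AdventureWorksDW"),
        DimCustomer = Source{[Schema = "dbo", Item = "DimCustomer"]}[Data],
        // Keep only active customers (hypothetical flag column)
        ActiveCustomers = Table.SelectRows(DimCustomer, each [IsActive] = true),
        // Rename technical column names into business-friendly ones
        Renamed = Table.RenameColumns(ActiveCustomers,
            {{"CustomerKey", "Customer ID"}, {"FirstName", "First Name"}}),
        // Make sure the purchase date column has the right type
        Typed = Table.TransformColumnTypes(Renamed, {{"DateFirstPurchase", type date}})
    in
        Typed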

 

You will be required to refresh your data before connecting to the Dataflow from Power BI.

Image 7. First data refresh

 

In addition, and in the same manner as the rest of the Power BI service elements, Dataflows can also be configured with a scheduled data refresh. As you can see, since we are using a Pro license, and as mentioned in the limitations section above, we are allowed to refresh the data “only” 8 times a day.

Image 8. Scheduled refresh

 

Finally, you will not only be able to connect to your Dataflow from Power BI Desktop, but you will also be able to work with it like any other data source: establish the required relationships, create DAX measures, edit the queries using M code or basic transformations, and so on.

Image 9. Connecting to your Dataflow from Power BI Desktop
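For reference, the query that the Power BI Desktop dataflows connector generates looks roughly like the sketch below (visible in the Advanced Editor). The exact navigation fields can vary between connector versions, and the GUIDs and entity name used here are placeholders:

    let
        // Enumerate the dataflows available to the signed-in user
        Source = PowerBI.Dataflows(null),
        // Navigate by workspace and dataflow (the GUIDs below are placeholders)
        Workspace = Source{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
        Dataflow = Workspace{[dataflowId = "11111111-1111-1111-1111-111111111111"]}[Data],
        // Finally pick the entity created in the dataflow
        Customers = Dataflow{[entity = "DimCustomer"]}[Data]
    in
        Customers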

 

We must also point out that, since this is still in beta, there are some minor issues. For example, when importing your Dataflow tables, Power BI automatically creates a set of relationships between them, but there is no guarantee that these relationships are based on the correct fields, and not all tables end up related to one another. For this reason, we recommend reviewing these relationships before starting to work on the model.

Image 10. Automatically created model that must be checked

 

Bibliography

Any technical information used in this blog was obtained from Microsoft’s official documentation (docs.microsoft.com).

Rafael Toledo
Data Platform Specialist at SolidQ