In the previous part, we learned what dataflows are and when to use them. If you missed the previous part, then here is the link
In this article, we will see how to create and configure dataflows.
Before we begin, we will create a demo workspace in Power BI.
Step 1 : Create a new workspace
Step 2 : Create a new dataflow
To create a dataflow, launch the Power BI service in a browser and then select the Demo workspace.
There are multiple ways to create a new dataflow or build on top of an existing one:
- Create a dataflow by defining new entities
- Create a dataflow using linked entities
- Create a dataflow using a computed entity
- Create a dataflow using import/export
We will only cover the first option: creating a dataflow by defining new entities.
The Define new entities option lets you define a new entity (table) and connect to a new data source.
Select Common Data Service.
Provide the CDS URL.
After a successful login, select the data entities as shown below.
We will remove the columns that are not needed. The entities below are now available in the dataflow.
Step 3: Configure a dataflow
To configure the refresh of a dataflow, select the More menu (the ellipsis) and select Settings.
The Settings options provide many options for your dataflow, as the following sections describe.
We will look at some of the important settings below.
- Gateway Connection: In this section, you can choose whether the dataflow uses a gateway, and select which gateway is used.
- Data Source Credentials: In this section you choose which credentials are being used, and can change how you authenticate to the data source.
- Sensitivity Label: Here you can define the sensitivity of the data in the dataflow. To learn more about sensitivity labels, see how to apply sensitivity labels in Power BI.
- Scheduled Refresh: Here you can define the times of day the selected dataflow refreshes. A dataflow can be refreshed at the same frequency as a dataset.
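Beyond the Settings page, a refresh can also be started programmatically. As a rough sketch (not part of the UI walkthrough above), the Power BI REST API exposes a refreshes endpoint for dataflows; the workspace ID, dataflow ID, and token below are placeholders, and you should verify the request-body shape against the current API reference:

```python
# A minimal sketch of triggering a dataflow refresh via the Power BI REST API.
# The IDs and token are placeholders; a real call needs an Azure AD access token.
import json
import urllib.request

API_BASE = "https://api.powerbi.com/v1.0/myorg"

def build_refresh_request(group_id: str, dataflow_id: str, token: str) -> urllib.request.Request:
    """Build the POST request that starts a refresh of one dataflow."""
    url = f"{API_BASE}/groups/{group_id}/dataflows/{dataflow_id}/refreshes"
    # notifyOption controls failure e-mails; assumed per the API docs.
    body = json.dumps({"notifyOption": "MailOnFailure"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# Usage (requires a real token):
# req = build_refresh_request("<workspace-id>", "<dataflow-id>", token)
# urllib.request.urlopen(req)
```

This only builds the request; sending it is left commented out so the sketch stays runnable without credentials.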
Step 4 : Refreshing a dataflow
Dataflows act as building blocks on top of one another. When a dataflow's scheduled refresh completes, it triggers a refresh of any dataflow that references it. This chain effect of refreshes lets you avoid scheduling every dataflow manually.
There are a few limitations when dealing with linked entity refreshes:
- A linked entity is triggered by a refresh only if it exists in the same workspace.
- A linked entity is locked for editing while a source entity is being refreshed. If any dataflow in a reference chain fails to refresh, all the dataflows roll back to the old data (dataflow refreshes are transactional within a workspace).
- Only referenced entities are refreshed when a source refresh completes. To schedule all the entities, set a refresh schedule on the linked entity as well, but take care not to create a double refresh.
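The chain effect and its same-workspace limitation can be pictured with a small toy model (this is an illustration of the behavior described above, not Power BI's actual refresh engine; the names and structure are invented for the example):

```python
# Toy model of the refresh chain: completing a refresh triggers every
# dataflow in the SAME workspace that references the one just refreshed.
from collections import namedtuple

Dataflow = namedtuple("Dataflow", ["name", "workspace", "references"])

def refresh(dataflow, all_dataflows, refreshed=None):
    """Refresh a dataflow, then cascade to its same-workspace dependents."""
    if refreshed is None:
        refreshed = []
    refreshed.append(dataflow.name)
    for other in all_dataflows:
        # Cross-workspace links are NOT triggered automatically.
        if dataflow.name in other.references and other.workspace == dataflow.workspace:
            refresh(other, all_dataflows, refreshed)
    return refreshed

# B links to A in the same workspace; C links to A from another workspace.
flows = [
    Dataflow("A", "Demo", []),
    Dataflow("B", "Demo", ["A"]),    # triggered when A completes
    Dataflow("C", "Other", ["A"]),   # not triggered: different workspace
]
print(refresh(flows[0], flows))  # ['A', 'B']
```

Note how C never refreshes even though it references A; in Power BI you would give C its own schedule.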
Incremental Refresh (Premium only)
Dataflows can also be set to refresh incrementally. To do so, select the dataflow you wish to set up for incremental refresh, and then select the incremental refresh icon.
Setting incremental refresh adds parameters to the dataflow to specify the date range. For detailed information on how to set up incremental refresh, see the incremental refresh in Power Query article.
There are some circumstances under which you should not set incremental refresh:
- Linked entities should not use incremental refresh if they reference a dataflow. Dataflows do not support query folding (even if the entity is DirectQuery enabled).
- Datasets referencing dataflows should not use incremental refresh. Refreshes to dataflows are generally performant, so incremental refreshes shouldn’t be necessary. If refreshes take too long, consider using the compute engine, or DirectQuery mode.
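Conceptually, the date-range parameters that incremental refresh adds (Power BI calls them RangeStart and RangeEnd) act as a filter so that only the current partition of rows is re-loaded. The following is a simplified Python illustration of that idea, with made-up data; it is not how the service implements it internally:

```python
# Simplified illustration of incremental refresh: only rows whose date falls
# inside the refresh window [range_start, range_end) are re-loaded; older
# partitions are left untouched.
from datetime import date

def rows_to_refresh(rows, range_start, range_end):
    """Keep only the rows belonging to the current refresh partition."""
    return [r for r in rows if range_start <= r["OrderDate"] < range_end]

orders = [
    {"OrderId": 1, "OrderDate": date(2020, 1, 15)},
    {"OrderId": 2, "OrderDate": date(2021, 6, 1)},
    {"OrderId": 3, "OrderDate": date(2021, 6, 20)},
]

# Refresh only the June 2021 partition: keeps OrderId 2 and 3.
print(rows_to_refresh(orders, date(2021, 6, 1), date(2021, 7, 1)))
```

In a real dataflow this filter is expressed in Power Query using the RangeStart/RangeEnd parameters, and the service substitutes the window boundaries at each refresh.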
In the next article, we will see how to consume dataflows.
If you like this article, feel free to share it with others who might find it helpful. If you have any questions, feel free to get in touch with me.