Marketing Analytics on the Databricks Lakehouse using Fivetran and dbt


Marketing teams use many different platforms to drive marketing and sales campaigns, which can generate a significant amount of valuable but disconnected data. Bringing all of this data together can help to drive a large return on investment, as shown by Publicis Groupe, who were able to increase campaign revenue by as much as 50%.

The Databricks Lakehouse, which unifies data warehousing and AI use cases on a single platform, is the ideal place to build a marketing analytics solution: we maintain a single source of truth and unlock AI/ML use cases. We also leverage two Databricks partner solutions, Fivetran and dbt, to unlock a wide range of marketing analytics use cases including churn and lifetime value analysis, customer segmentation, and ad effectiveness.

Fivetran and dbt can read and write to Delta Lake using a Databricks cluster or Databricks SQL warehouse

Fivetran allows you to easily ingest data from 50+ marketing platforms into Delta Lake without the need for building and maintaining complex pipelines. If any of the marketing platforms' APIs change or break, Fivetran will take care of updating and fixing the integrations so your marketing data keeps flowing in.

dbt is a popular open source framework that lets lakehouse users build data pipelines using simple SQL. Everything is organized within directories, as plain text, making version control, deployment, and testability simple. Once the data is ingested into Delta Lake, we use dbt to transform, test, and document the data. The transformed marketing analytics data mart built on top of the ingested data is then ready to be used to help drive new marketing campaigns and initiatives.
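As a minimal illustration of the dbt approach (the table and column names here are hypothetical, not part of the demo project), a model is just a SQL file whose select statement dbt materializes as a view or table in Delta Lake:

```sql
-- models/staging/stg_campaign_touches.sql (hypothetical example)
-- dbt resolves source() against a sources yaml entry and materializes
-- this select as a view or table in Delta Lake.
select
    touch_id,
    campaign_id,
    touched_at
from {{ source('marketing', 'raw_campaign_touches') }}
where touched_at is not null
```

Because models reference each other through `ref()` and `source()`, dbt can infer the dependency graph, run models in the right order, and render the lineage graphs shown below.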

Both Fivetran and dbt are part of Databricks Partner Connect, a one-stop portal to discover and securely connect data, analytics and AI tools directly within the Databricks platform. In just a few clicks you can configure and connect these tools (and many more) directly from within your Databricks workspace.

How to build a marketing analytics solution

In this hands-on demo, we'll show how to ingest Marketo and Salesforce data into Databricks using Fivetran and then use dbt to transform, test, and document your marketing analytics data model.

All the code for the demo is available on GitHub in the workflows-examples repository.

dbt lineage graph showing data sources and models

The final dbt model lineage graph will look like this. The Fivetran source tables are in green on the left and the final marketing analytics models are on the right. By selecting a model, you can see the corresponding dependencies with the different models highlighted in purple.

Data ingestion using Fivetran

Fivetran has many marketing analytics data source connectors

Create new Salesforce and Marketo connections in Fivetran to start ingesting the marketing data into Delta Lake. When creating the connections, Fivetran will also automatically create and manage a schema for each data source in Delta Lake. We'll later use dbt to transform, clean, and aggregate this data.

Define a destination schema in Delta Lake for the Salesforce data source

For the demo, name the schemas that will be created in Delta Lake marketing_salesforce and marketing_marketo. If the schemas don't exist, Fivetran will create them as part of the initial ingestion load.

Select which data source objects to synchronize as Delta Lake tables

You can then choose which objects to sync to Delta Lake, where each object will be saved as an individual table. Fivetran also makes it easy to manage and view which columns are being synchronized for each table:

Fivetran monitoring dashboard to monitor monthly active rows synchronized

Additionally, Fivetran provides a monitoring dashboard to analyze how many monthly active rows of data are synchronized daily and monthly for each table, among other useful statistics and logs.

Data modeling using dbt

Now that all the marketing data is in Delta Lake, you can use dbt to create your data model by following these steps.

Set up the dbt project locally and connect to Databricks SQL

Set up your local dbt development environment in your chosen IDE by following the setup instructions for dbt Core and dbt-databricks.

Scaffold a new dbt project and connect to a Databricks SQL Warehouse using dbt init, which will ask for the following information:

$ dbt init
Enter a name for your project (letters, digits, underscore):
Which database would you like to use?
[1] databricks
[2] spark

Enter a number: 1
host (yourorg.databricks.com):
http_path (HTTP Path):
schema (default schema that dbt will build objects in):
threads (1 or more) [1]:

Once you have configured the profile you can test the connection using:

$ dbt debug
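Behind the scenes, dbt init writes these values into a profiles.yml file (in ~/.dbt/ by default). A sketch of what the resulting profile might look like, with placeholder host, HTTP path, and token values:

```yaml
# ~/.dbt/profiles.yml (placeholder values, sketch only)
marketing_analytics_demo:
  target: dev
  outputs:
    dev:
      type: databricks
      host: yourorg.databricks.com        # workspace hostname, no https://
      http_path: /sql/1.0/warehouses/...  # from the SQL warehouse connection details
      schema: mkt_analytics               # default schema dbt builds objects in
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      threads: 4
```

The target schema set here matters later: the Fivetran staging models suffix their staging schema name onto it.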

Install Fivetran dbt model packages for staging

The first step in using the Marketo and Salesforce data is to create the tables as sources for our model. Luckily, Fivetran has made this easy to get up and running with their pre-built Fivetran dbt model packages. For this demo, let's make use of the marketo_source and salesforce_source packages.

To install the packages just add a packages.yml file to the root of your dbt project and add the marketo_source, salesforce_source and spark_utils packages:

packages:
  - package: dbt-labs/spark_utils
    version: 0.3.0
  - package: fivetran/marketo_source
    version: [">=0.7.0", "<0.8.0"]
  - package: fivetran/salesforce_source
    version: [">=0.4.0", "<0.5.0"]

To download and use the packages run:

 $ dbt deps

You should now see the Fivetran packages installed in the packages folder.

Update dbt_project.yml for Fivetran dbt models

There are a few configs in the dbt_project.yml file that you need to modify to make sure the Fivetran packages work correctly for Databricks.

The dbt_project.yml file can be found in the root folder of your dbt project.

spark_utils overriding dbt_utils macros

The Fivetran dbt models make use of macros in the dbt_utils package, but some of these macros need to be modified to work with Databricks, which is easily done using the spark_utils package.

It works by providing shims for certain dbt_utils macros. You can set this using the dispatch config in the dbt_project.yml file, so that dbt will first search for macros in the spark_utils package when resolving macros from the dbt_utils namespace.

dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']

Variables for the marketo_source and salesforce_source schemas

The Fivetran packages require you to define the catalog (called database in dbt) and schema of where the data lands when being ingested by Fivetran.

Add these variables to the dbt_project.yml file with the correct catalog and schema names. The default catalog is hive_metastore, which will be used if the _database variables are left blank. The schema names will be what you defined when creating the connections in Fivetran.

vars:
  marketo_database: # leave blank to use the default hive_metastore catalog
  marketo_schema: marketing_marketo
  salesforce_database: # leave blank to use the default hive_metastore catalog
  salesforce_schema: marketing_salesforce

Target schema for Fivetran staging models

To avoid all of the staging tables created by the Fivetran source models landing in the default target schema, it can be useful to define a separate staging schema.

In the dbt_project.yml file add the staging schema name, which will then be suffixed to the default schema name.

models:
  marketo_source:
    +schema: your_staging_name # leave blank to use the default target_schema
  salesforce_source:
    +schema: your_staging_name # leave blank to use the default target_schema

Based on the above, if your target schema defined in profiles.yml is mkt_analytics, the schema used for marketo_source and salesforce_source tables will be mkt_analytics_your_staging_name.

Disable missing tables

At this stage you can run the Fivetran model packages to test that they work correctly:

$ dbt run --select marketo_source
$ dbt run --select salesforce_source

If any of the models fail due to missing tables, because you chose not to sync those tables in Fivetran, you can disable those models for your source schema by updating the dbt_project.yml file.

For example, if the email bounced and email template tables are missing from the Marketo source schema, you can disable the models for those tables by adding the following under the models config:

models:
  marketo_source:
    +schema: your_staging_name
    stg_marketo__activity_email_bounced:
      +enabled: false
    stg_marketo__activity_email_bounced_tmp:
      +enabled: false
    stg_marketo__email_template:
      +enabled: false
    stg_marketo__email_template_tmp:
      +enabled: false

Creating the marketing analytics models

dbt lineage graph showing the star schema and aggregate tables data model

Now that the Fivetran packages have taken care of creating and testing the staging models, you can begin to develop the data models for your marketing analytics use cases: a star schema data model along with materialized aggregate tables.

For example, for the first marketing analytics dashboard, you may want to see how engaged certain companies and sales regions are by the number of email campaigns they have opened and clicked.

To do so, you can join the Salesforce and Marketo tables using the Salesforce user email, Salesforce account_id, and Marketo lead_id.

The models will be structured under the mart folder in the following way:

|-- dbt_project.yml
|-- packages.yml
|-- models
      |-- mart
             |-- core
             |-- intermediate
             |-- marketing_analytics

You can view the code for all the models on GitHub in the /models/mart directory; below we describe what is in each folder along with an example.

Core models

The core models are the fact and dimension tables that all downstream models will build upon.

The dbt SQL code for the dim_user model:

with salesforce_users as (

    select *
    from {{ ref('stg_salesforce__user') }}
    where email is not null and account_id is not null

),

marketo_users as (

    select *
    from {{ ref('stg_marketo__lead') }}

),

joined as (

    select
        salesforce_users.*,
        marketo_users.lead_id
    from salesforce_users
    left join marketo_users
    on salesforce_users.email = marketo_users.email

)

select * from joined

You can also add documentation and tests for the models using a yaml file in the folder.

There are two simple tests that have been added in the core.yml file:

version: 2

models:
  - name: dim_account
    description: "The Account Dimension Table"
    columns:
      - name: account_id
        description: "Primary key"
        tests:
          - not_null
  - name: dim_user
    description: "The User Dimension Table"
    columns:
      - name: lead_id
        description: "Primary key"
        tests:
          - not_null

Intermediate models

Some of the final downstream models may rely on the same calculated metrics, so to avoid repeating SQL you can create intermediate models that can be reused.

The dbt SQL code for the int_email_open_clicks_joined model:

with opens as (

    select *
    from {{ ref('fct_email_opens') }}

), clicks as (

    select *
    from {{ ref('fct_email_clicks') }}

), opens_clicks_joined as (

    select
        o.lead_id as lead_id,
        o.campaign_id as campaign_id,
        o.email_send_id as email_send_id,
        o.activity_timestamp as open_ts,
        c.activity_timestamp as click_ts
    from opens as o
    left join clicks as c
    on o.email_send_id = c.email_send_id
    and o.lead_id = c.lead_id

)

select * from opens_clicks_joined

Marketing analytics models

These are the final marketing analytics models that will power the dashboards and reports used by marketing and sales teams.

The dbt SQL code for the country_email_engagement model:

with accounts as (

    select *
    from {{ ref('dim_account') }}

), users as (

    select *
    from {{ ref('dim_user') }}

), opens_clicks_joined as (

    select * from {{ ref('int_email_open_clicks_joined') }}

), joined as (

    select *
    from users as u
    left join accounts as a
    on u.account_id = a.account_id
    left join opens_clicks_joined as oc
    on u.lead_id = oc.lead_id

)

select
    billing_country as country,
    count(open_ts) as opens,
    count(click_ts) as clicks,
    count(click_ts) / count(open_ts) as click_ratio
from joined
group by country

Run and test dbt models

Now that your model is ready, you can run all the models using:

$ dbt run

And then run the tests using:

$ dbt test

View the dbt docs and lineage graph

dbt lineage graph for the marketing analytics model

Once your models have run successfully, you can generate the docs and lineage graph using:

$ dbt docs generate

To then view them locally run:

$ dbt docs serve

Deploying dbt models to production

Once you have developed and tested your dbt model locally, you have several options for deploying it into production, one of which is the new dbt task type in Databricks Workflows (private preview).

Your dbt project should be managed and version controlled in a Git repository. You can create a dbt task in your Databricks Workflows job pointing to the Git repository.

Using a dbt task type in Databricks Workflows to orchestrate dbt

As you are using packages in your dbt project, the first command should be dbt deps, followed by dbt run for the first task and then dbt test for the next task.

Databricks Workflows job with two dependent dbt tasks

You can then run the workflow immediately using run now, and also set up a schedule for the dbt project to run on a specified schedule.

Viewing the dbt logs for each dbt run

For each run you can see the logs for each dbt command, helping you debug and fix any issues.

Powering your marketing analytics with Fivetran and dbt

As shown here, using Fivetran and dbt alongside the Databricks Lakehouse Platform allows you to build a powerful marketing analytics solution that is easy to set up and manage, and flexible enough to suit any of your data modeling requirements.

To get started with building your own solution, visit the documentation for integrating Fivetran and dbt with Databricks, and reuse the marketing_analytics_demo project example to get started quickly.

The dbt task type in Databricks Workflows is in private preview. To try the dbt task type, please reach out to your Databricks account executive.

