Monte Carlo and Databricks Partner to Help Companies Build More Reliable Data Lakehouses


This is a collaborative post between Monte Carlo and Databricks. We thank Matt Sulkis, Head of Partnerships, Monte Carlo, for his contributions.

As companies increasingly leverage data-driven insights to innovate and maintain their competitive edge, it's essential that this data is accurate and reliable. With Monte Carlo and Databricks' partnership, teams can trust their data through end-to-end data observability across their lakehouse environments.

Has your CTO ever told you that the numbers in a report you showed her seemed way off?

Has a data scientist ever pinged you when a critical Spark job didn't run?

What about a rise in a field's null rate that went unnoticed for days or even weeks until it caused a significant error in an ML model downstream?

You're not alone if you answered yes to any of these questions. Data downtime (periods of time when data is missing, inaccurate, or otherwise erroneous) is an all-too-familiar reality for even the best data teams. It costs millions of dollars in wasted revenue, and up to 50 percent of a data engineering team's time that could otherwise be spent building data products and ML models that move the needle for the business.

To help companies accelerate the adoption of more reliable data products, Monte Carlo and Databricks are excited to announce our partnership, bringing end-to-end data observability and data quality automation tools to the data lakehouse. Data engineering and analytics teams that depend on Databricks to derive critical insights about their business and build ML models can now leverage the power of automated data observability and monitoring to prevent bad data from affecting downstream consumers.

Achieving reliable Databricks pipelines with data observability

With our new partnership and updated integration, Monte Carlo provides full, end-to-end coverage across data lake and lakehouse environments powered by Databricks.

Over the past few years, Databricks has established the lakehouse category, revolutionizing how organizations store and process their data at unprecedented scale across nearly infinite use cases. Cloud data lakes like Delta Lake have become so powerful (and widespread) that, according to Mordor Intelligence, the data lake market is expected to grow from $3.74 billion in 2020 to $17.60 billion by 2026, a compound annual growth rate of nearly 30%.

Monte Carlo itself is built on the Databricks Lakehouse Platform, enabling our data and engineering teams to build and train our anomaly detection models at unprecedented speed and scale. Building on top of Databricks has allowed us to focus on our core value of improving the observability and quality of data for our customers while leveraging the automation, infrastructure management, and analytics scaling tools of the lakehouse. This makes our resources more efficient and better able to serve our customers' data quality needs. As our business grows, we're confident it will scale with Databricks and increase the value of our core offering.

Now, with Monte Carlo and Databricks' partnership, data teams can ensure that these investments are leveraging reliable, accurate data at every stage of the pipeline.

"As data pipelines become increasingly complex and companies ingest more and more data, often from third-party sources, it's paramount that this data is reliable," said Barr Moses, co-founder and CEO of Monte Carlo. "Monte Carlo is excited to partner with Databricks to help companies trust their data through end-to-end data observability across their lakehouse."

With Monte Carlo, data teams get complete Databricks Lakehouse Platform coverage no matter the metastore.

Coupled with our new Databricks Unity Catalog and Delta Lake integrations, this partnership will make it easier for organizations to take full advantage of Monte Carlo's data quality monitoring, alerting, and root cause analysis functionality. At the same time, Monte Carlo customers will benefit from Databricks' speed, scale, and flexibility. With Databricks, analytics and machine learning tasks that previously took hours or even days to complete can now be delivered in minutes, making it faster and more scalable to build impactful data products for the business.

Here's how teams on Databricks and Monte Carlo can benefit from our strategic partnership:

  • Achieve end-to-end data observability across your Databricks Lakehouse Platform without writing code. Get full, automated coverage across your data pipelines with a low-code implementation process. Access out-of-the-box visibility into data freshness, volume, distribution, schema, and lineage by plugging Monte Carlo into your lakehouse.
  • Know when data breaks, as soon as it happens. Monte Carlo continuously monitors your Databricks assets and proactively alerts stakeholders to data issues. Monte Carlo's machine learning-first approach gives data teams full coverage for freshness, volume, and schema changes, while opt-in distribution monitors and business-context-specific checks layered on top ensure you're covered at every stage of your data pipeline.
  • Find the root cause of data quality issues fast. Pre-built machine learning-based monitoring and anomaly detection save time and resources, giving teams a single pane of glass to investigate and resolve data issues. By bringing all information and context about your pipelines into one place, teams spend less time firefighting data issues and more time innovating for the business.
  • Immediately understand the business impact of bad data. With end-to-end Spark lineage on top of Unity Catalog for your pipelines, from the point data enters Databricks (or further upstream!) down to the business intelligence layer, data teams can triage and assess the business impact of their data issues, reducing risk and improving productivity across the organization.
  • Prevent data downtime. Give your teams full visibility into your Databricks pipelines and how they affect downstream reports and dashboards, so they can make more informed development decisions. With Monte Carlo, teams can better manage breaking changes to ELTs, Spark models, and BI assets by identifying what's impacted and who to notify.
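To make the null-rate scenario from earlier concrete, here is a minimal, hypothetical sketch of the kind of check an observability platform automates. This is plain Python for illustration only, not Monte Carlo's actual API; in practice the baseline and threshold are learned from historical metadata rather than hard-coded.

```python
# Illustrative only: a hand-rolled null-rate check, NOT Monte Carlo's API.
# Observability tools learn baselines and thresholds like these automatically.

def null_rate(values):
    """Fraction of records where the field is missing."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)

def is_null_rate_anomaly(current_values, historical_rates, tolerance=0.05):
    """Flag the field when today's null rate drifts more than
    `tolerance` above its historical average."""
    baseline = sum(historical_rates) / len(historical_rates)
    return null_rate(current_values) > baseline + tolerance

# A field whose null rate jumps from ~1% historically to 20% today:
anomalous = is_null_rate_anomaly(
    current_values=[None] * 20 + ["ok"] * 80,
    historical_rates=[0.01, 0.02, 0.01],
)
print(anomalous)  # True
```

The value of automating checks like this at scale is that nobody has to remember to write one per field; every table and column gets a learned baseline out of the box.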

In addition to supporting existing mutual customers, Monte Carlo offers end-to-end, automated coverage for teams migrating from their legacy stacks to the Databricks Lakehouse Platform. Moreover, Monte Carlo's security-first approach to data observability ensures that data never leaves your Databricks Lakehouse Platform.

Monte Carlo can automatically monitor and alert for data schema, volume, freshness, and distribution anomalies within the Databricks Lakehouse Platform.

What our mutual customers have to say

Monte Carlo and Databricks customers like ThredUp, a leading online consignment marketplace, and Ibotta, a global cashback and rewards app, are excited to leverage the new Delta Lake and Unity Catalog integrations to improve data reliability at scale across their lakehouse environments.

ThredUp's data engineering teams leverage Monte Carlo's capabilities to understand where and how their data breaks in real time. The solution has enabled ThredUp to immediately identify bad data before it impacts the business, saving them the time and resources otherwise spent manually firefighting data downtime.

"With Monte Carlo, my team is better positioned to understand the impact of a detected data issue and decide on next steps like stakeholder communication and resource prioritization. Monte Carlo's end-to-end lineage helps the team draw connections between critical data tables and the Looker reports, dashboards, and KPIs the company relies on to make business decisions," said Satish Rane, Head of Data Engineering, ThredUp. "I'm excited to leverage Monte Carlo's data observability for our Databricks environment."

At Ibotta, Head of Data Jeff Hepburn and his team rely on Monte Carlo to deliver end-to-end visibility into the health of their data pipelines, starting with ingestion in Databricks and extending down to the business intelligence layer.

"Data-driven decision making is a huge priority for Ibotta, but our analytics are only as reliable as the data that informs them. With Monte Carlo, my team has the tools to detect and resolve data incidents before they affect downstream stakeholders, and their end-to-end lineage helps us understand the inner workings of our data ecosystem so that if issues arise, we know where and how to fix them," said Jeff Hepburn, Head of Data, Ibotta. "I'm excited to leverage Monte Carlo's data observability with Databricks."

Pioneering the future of data observability for data lakes

These updates enable teams to leverage Databricks for data engineering, data science, and machine learning use cases while preventing data downtime at scale.

When it comes to ensuring data reliability on the lakehouse, Monte Carlo and Databricks are better together. For more details on how to set up these integrations, see our documentation.

