How solid is your data estate?

Published on May 8, 2020

Time and time again, we see data estates built on outdated patterns that run into problems when they try to scale.

In this article, we share a few tips to consider when laying the foundations of your data estate.

#1: Beware the siren song of ‘drag and drop’ configuration

It can be very tempting to put together a solution using drag and drop tools. They have a low barrier to entry, are quick to edit and can produce fast results. However, they can also:

  • Create an iceberg effect, making complex tasks appear simpler than they are
  • Become difficult to scale for large solutions, as they generally favour manual configuration over automation
  • Make it challenging to enforce consistency across a solution

#2: DevOps is the path

DevOps has long been the gold standard for application development. However, we still see slow uptake in data estates, which leads to manual, error-prone deployments that add stress and slow the pace of change. DevOps has been proven to:

  • Reduce development cycles
  • Reduce implementation failure
  • Increase communication and cooperation

#3: Consider Spark

You do not need to have a ‘big data’ workload to benefit from the use of Apache Spark as your data transformation engine. Spark enables:

  • The combination of ‘set-based’ logic (e.g. SQL queries) with ‘imperative’ logic (e.g. Python code). This gives your developers a consistent mechanism to perform any data transformation, regardless of its complexity
  • The combination of real-time and batch transformation in a unified processing engine
  • Close collaboration between your data scientists and data engineers. Historically, they have worked in different toolsets, but with Spark they can work together on a common platform

#4: Look towards automation

We created the product ‘LakeFlow’ to help rapidly build resilient data estates. LakeFlow is a data engineering service which will:

  • Deploy a data estate within your Azure environment using only Azure first-party components
  • Generate pipelines and onboard new data sources to your data estate quickly. This allows you to focus on your dashboards and insights
  • Automatically maintain a historical record of your data in a cost-effective data lake
  • Proactively monitor your pipelines, picking up anomalies in data volume flows before failures occur
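LakeFlow's own monitoring method is not described here. Purely as an illustration of the general idea of catching data volume anomalies, the sketch below flags a day's row count that sits far outside recent history using a simple z-score check. The function name, threshold and sample counts are hypothetical.

```python
from statistics import mean, stdev

def is_volume_anomaly(history, latest, threshold=3.0):
    """Hypothetical helper: flag `latest` if it lies more than `threshold`
    standard deviations from the mean of `history` (a simple z-score check)."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # perfectly stable history: any change is unusual
    return abs(latest - mu) / sigma > threshold

# Hypothetical daily row counts for one source over the past week.
daily_rows = [10_250, 9_980, 10_110, 10_400, 10_050, 9_900, 10_300]
print(is_volume_anomaly(daily_rows, 10_200))  # False: a typical day
print(is_volume_anomaly(daily_rows, 2_100))   # True: a sudden drop in volume
```

Checking volume before downstream steps run means a partial or missing extract can be caught early, rather than surfacing later as a failed load or an incorrect dashboard.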

If you would like to know more or need assistance in building rock solid data estates, contact us.
