To Copy Or Not To Copy – That Is The Question

Digital twin
Murray Callander
June 30, 2021

When it comes to digital transformation and learning how to discover insights hidden in data, one question that regularly arises is whether we should copy all of our data into a data warehouse or data lake, or alternatively connect directly to the source systems.

As digital enablement becomes a pressing concern, digital twin technology is becoming increasingly prevalent, and twin graphs and knowledge graphs are increasingly being used. So which of these options is the best approach for your digital strategy’s data layer when you’re digitising operations? Both come with their own pros and cons, so how can you choose which one is right for you? Here, we take a closer look so you can make a well-informed decision.

Cloud Based Data Lakes And Directly Connecting To A Source System – The Potential Issues

When it comes to data visualisation, the architecture of a cloud based data lake looks simple on paper, and this makes the concept an appealing one to those looking for an effective business intelligence tool. After all, all of the data enters the lake, and people then only need to connect to a single location to view everything – surely this must make a data lake a convenient data visualisation tool to harness? In practice, however, things aren’t that simple.

The apparent simplicity of a Data Lake architecture is appealing

Simply copying all of the data into this kind of system takes a long time, and speed is typically of the essence when it comes to delivering something useful for users. In reality, therefore, what happens is a cyclical process in which the source system and use case are prototyped against each other to define which data set requires copying up to the data lake. This data is then copied up before development begins against the data lake. Inevitably, what happens next is an iterative process of discovering that elements have been incorrectly transferred or are missing altogether.

With this in mind, in practice every use case must be built with direct reference to the source system anyway. Developing new functionality solely against a data lake will always present difficulties due to nuances in the data which have to be understood. In reality, data lakes are inevitably built on a use-case-by-use-case basis, with every new use case expanding the data lake’s scope.

In practice new functionality has to be built with reference to the source data and the Data Lake sits to the side.

Although directly connecting to a source system offers some convenience benefits, it presents some issues too. There are often security concerns around giving direct access to the database, especially for systems that sit on sensitive networks. If systems have only been designed for a handful of users, or run on old hardware, it’s unlikely that they will be able to cope with multiple simultaneous connections or large ad-hoc requests. Furthermore, the data structures may be obscure, with information that the user can see being generated in the application UI itself rather than stored in the database.

What Are The Advantages Of Copying Data Sets Into A Data Warehouse Or Cloud Based Data Lake?

If you’re considering data integration into a cloud based data lake or data warehouse, you need to be aware of the advantages of this option:

  • Performance – this is usually the top selling point of data warehouses or cloud based data lakes. Large volumes of data can be provided to multiple users rapidly.
  • Ability to query easily across several data sources – since all data is in one location, queries will all be written using the same language.
  • Easier control of user access – there is only a single access point, and granular control is usually possible over the data which is available to each user who tries to engage with the system.
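The cross-source query advantage is easiest to see with a concrete sketch. The snippet below is a hypothetical example (the table names and columns are illustrative, not from any real system): once maintenance and production data have been copied into one store, a single SQL dialect can join across what were originally two separate source systems.

```python
import sqlite3

# Stand-in for a data lake: two datasets from different source systems,
# co-located in one store and queryable with one language.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workorders (id INTEGER, equipment TEXT, status TEXT);
    CREATE TABLE readings (equipment TEXT, pressure REAL);
    INSERT INTO workorders VALUES (1, 'PUMP-A', 'Scheduled');
    INSERT INTO readings VALUES ('PUMP-A', 42.5);
""")

# One query spanning both datasets - no per-system drivers or dialects.
rows = conn.execute("""
    SELECT w.id, w.status, r.pressure
    FROM workorders w JOIN readings r ON w.equipment = r.equipment
""").fetchall()
print(rows)  # [(1, 'Scheduled', 42.5)]
```

Against the live source systems, the same join would need two connections, two query languages and client-side stitching.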

What Are The Disadvantages Of Copying Data Sets Into A Data Warehouse Or Cloud Based Data Lake?

Integrating data into a cloud based data lake or data warehouse also presents some issues including:

  • Quality Assurance effort – keeping the data lake consistent requires a lot of ongoing work. With gigabytes of data being generated on a daily basis, this represents a huge additional workload, and it raises the question of whether or not the data stored in the cloud is up to date.
  • Entropy of information increases with copying – whenever information is copied from the source system, the entropy level will always increase.
  • In practice the data lake can never have all of the data – there will inevitably be a data set which is required but missing from the data lake, for example where detailed data is kept by vendors and only aggregate or summary data is available externally.
  • Slow to develop against – an extra stage is required in the development process to define the dataset required in the data lake.
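The "is the cloud copy up to date?" question from the first point above can at least be monitored. Here is a minimal sketch, under the assumption that each side can report a last-updated timestamp per dataset (the function and dataset names are hypothetical): any dataset whose lake copy lags the source beyond a tolerance, or was never copied at all, gets flagged.

```python
from datetime import datetime, timedelta, timezone

def stale_datasets(source_updated, lake_updated, tolerance=timedelta(hours=1)):
    """Return the names of datasets whose data-lake copy lags the source
    by more than the tolerance. Both arguments map dataset name -> the
    timestamp of its last update."""
    return sorted(
        name for name, src_ts in source_updated.items()
        if name not in lake_updated or src_ts - lake_updated[name] > tolerance
    )

now = datetime(2021, 6, 30, tzinfo=timezone.utc)
source = {"workorders": now, "readings": now}
lake = {"workorders": now - timedelta(hours=3)}  # copy is 3h behind; readings never copied

print(stale_datasets(source, lake))  # ['readings', 'workorders']
```

Monitoring like this tells you the lake is stale; it doesn't make it fresh. The reconciliation work the bullet describes still has to happen.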

What Are The Advantages Of Connecting Directly To The Source Systems?

If you would rather avoid consolidating data in a data lake, you need to know whether connecting directly to source systems is a better option. This option presents the following benefits:

  • Integrity of data – the data visualised is the data in the source systems.
  • Easy troubleshooting – when the data in the visualisation layer is wrong there’s only a single place that requires checking.
  • Simpler data flows.
  • Trying new things is easier as there is no extra layer required. This means fewer people are involved and less co-ordination is required allowing development to proceed rapidly.

What Are The Disadvantages Of Connecting Directly To The Source Systems?

  • Multiple simultaneous clients may not be supported – depending on the source system’s performance, it may be incapable of handling multiple clients simultaneously, especially 10- or 20-year-old legacy systems.
  • Tracking can be problematic – if many user-level systems connect across multiple source systems, tracking access can be difficult.

Case Study – Corrective Maintenance

If work on a facility is both planned and tracked in a maintenance management system then every piece of work or job will have its own associated workorder. Within this workorder, there’ll be schedules, budgets, lists of dependencies and tasks, among other things.

The workorder will also have its own lifecycle in which it progresses through various stages from initial concept to scheduled and completed or cancelled. Interaction with the source system will only ever reveal the status of the workorder at the present time, so only a single version of the object will be available. However, if this data is then copied to a data lake, there are two options:

  1. Creating a new object each time the workorder’s status changes to allow the status to be seen at any point in time;
  2. Updating the status against the existing object. This will require a mechanism to detect changes at all levels in the source system.
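The two options above can be sketched in a few lines. This is a hypothetical illustration (the field names and workorder IDs are invented), showing how the same sequence of status changes leaves the lake with either a full history or a single current object:

```python
def record_versioned(lake, workorder_id, status, ts):
    """Option 1: append a new object per status change, keeping full history."""
    lake.append({"id": workorder_id, "status": status, "ts": ts})

def record_in_place(lake, workorder_id, status, ts):
    """Option 2: update the existing object (the source-side change-detection
    mechanism that triggers this call is assumed)."""
    for row in lake:
        if row["id"] == workorder_id:
            row.update(status=status, ts=ts)
            return
    lake.append({"id": workorder_id, "status": status, "ts": ts})

history, current = [], []
for ts, status in enumerate(["Concept", "Scheduled", "Completed"]):
    record_versioned(history, "WO-1001", status, ts)
    record_in_place(current, "WO-1001", status, ts)

print(len(history))  # 3 rows - time spent in each stage can be analysed
print(current)       # one row, holding only the latest status

# Under option 1, users who only want the current state must filter out
# superseded versions, e.g. by keeping the latest row per id:
latest = {row["id"]: row for row in sorted(history, key=lambda r: r["ts"])}
print(latest["WO-1001"]["status"])  # Completed
```

The last three lines are exactly the extra filtering burden discussed below: with option 1, every consumer who only cares about "now" carries that step.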

Which option do you choose?

The answer will probably depend on the use case itself. If analysing the time that each workorder spends in its various stages is important, then a time-stamped history of each object change is necessary. On the other hand, if it’s only important for users to connect to the data lake rather than the source system, option 2 will be the better choice.

Issues can arise if you go for option 1 when users would have been better served by option 2. The user experience becomes considerably more complex, since all of the object’s superseded versions must be filtered out.

So, Which Option Is Best?

So, when it comes to deciding whether to copy or not, the answer really depends on your own use case.

If you place more importance on performance than on the accuracy of any single item, a data lake represents the best option. Training algorithms and machine learning research are ideal examples, since huge volumes of data are required and users don’t want to have to access the source systems frequently. Also, since training datasets are historic and won’t change, keeping them up to date is no problem.

Conversely, if you place more importance on accuracy than performance, for example in the case of interactive dashboards revealing job status information in real time, it’s better to connect directly to the sources. It’s faster to develop and simpler to maintain and support in the long-term.

A Possible Third Way

To make things even more complex, there is a third option – an integration layer or API that effectively connects directly to the source system while also adding an additional security layer (and possibly caching too). This can solve the issues surrounding data quality and security while also helping to improve performance.
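A minimal sketch of such a layer might look like the following. Everything here is hypothetical (the class, the user list and the fetch function are invented for illustration): the layer wraps a direct call into the source system, enforces its own access control, and serves repeated requests from a short-lived cache so the source isn't hammered.

```python
import time

class SourceSystemAPI:
    """Hypothetical integration layer: a thin API in front of a source
    system, adding access control and short-lived caching."""

    def __init__(self, fetch_fn, allowed_users, ttl_seconds=60):
        self._fetch = fetch_fn          # direct call into the source system
        self._allowed = allowed_users   # the additional security layer
        self._cache = {}                # key -> (fetched_at, value)
        self._ttl = ttl_seconds

    def get(self, user, key):
        if user not in self._allowed:
            raise PermissionError(f"{user} may not query the source system")
        hit = self._cache.get(key)
        if hit and time.monotonic() - hit[0] < self._ttl:
            return hit[1]               # cache hit: the source is spared
        value = self._fetch(key)
        self._cache[key] = (time.monotonic(), value)
        return value

calls = []
def fetch_from_source(key):
    calls.append(key)                   # stands in for a real database query
    return {"workorder": key, "status": "Scheduled"}

api = SourceSystemAPI(fetch_from_source, allowed_users={"alice"})
api.get("alice", "WO-1001")
api.get("alice", "WO-1001")             # served from cache
print(len(calls))  # 1 - the source system was only hit once
```

The cache is what lets a fragile legacy system survive many simultaneous clients; the user check is what keeps direct database credentials out of every client's hands.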

Since all APIs must be configured on a case-by-case basis, there isn’t a single place where all the available data can be seen. This may seem an issue, but it’s important to remember that in practice a data lake also requires an initial definition with reference to the source system. On the downside, though, such a layer adds another potential single point of failure to the application architecture, which must be considered when developing support and troubleshooting procedures so that the layer’s business owner can make sufficient resources available to support clients.

Nevertheless, one fact holds true whichever option you choose: to validate any use case and ensure that users are able to trust the results, it will always be necessary to check it against raw source data. It’s therefore very likely that some way of accessing raw data in the source system will continue to be required, so that any future issues or queries can be investigated.
