What makes a good datamodel? – Point 6: Forget deduplication

Once upon a time, in a land far, far away, hard disks and memory were small and really expensive. In 1969 the Apollo Guidance Computer that landed man on the Moon had only about 4 KB of RAM and 72 KB of 'disk' (ROM) storage.

When resources are that tight, you become extremely efficient at using them, and one of the key tenets of database design was "normalisation". The goal was zero duplication of data, so everything possible was turned into a reference to a row in another table. This is very efficient on storage but can cause performance problems: reading and writing to the database becomes complicated, and it can be hard to change the structure later.
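As an illustration (a hypothetical sketch, not any specific production schema), a fully normalised design pushes even a simple repeated value, such as a unit of measure, into its own lookup table, so reading it back requires a join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Fully normalised: the unit name lives in its own lookup table,
# so even reading a sensor's unit requires a join.
cur.execute("CREATE TABLE units (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE sensors (id INTEGER PRIMARY KEY, tag TEXT, "
    "unit_id INTEGER REFERENCES units(id))"
)
cur.execute("INSERT INTO units VALUES (1, 'bar')")
cur.execute("INSERT INTO sensors VALUES (1, 'PT-101', 1)")

row = cur.execute(
    "SELECT s.tag, u.name FROM sensors s JOIN units u ON s.unit_id = u.id"
).fetchone()
print(row)  # ('PT-101', 'bar')
```

Nothing is stored twice, which is the point of normalisation, but every read that wants the unit name pays for the join, and deeper schemas pay for chains of them.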

Now storage is cheap and performance is everything. Users want results returned in under a second, with easy-to-understand schemas. In reality, a schema designed for one use case may not perform well for another, so what you need is a schema you can add to and improve with every new use case. Sometimes this means breaking the old rules and storing things more than once, for example as both a property and a relationship (we'll come to that in the next blog as well), but that's ok! Delivering the business value is key; storage is cheap.
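The property-and-relationship idea can be sketched with a toy in-memory graph (the names here are illustrative assumptions, not the actual Eigen model). The same unit is deliberately stored twice: as a plain property for fast display, and as a relationship for graph traversals:

```python
# A toy in-memory graph: nodes keyed by id, edges as (source, type, target).
nodes = {
    "PT-101": {"label": "Sensor", "unit": "bar"},  # unit duplicated as a property
    "bar":    {"label": "Unit"},
}
edges = [("PT-101", "HAS_UNIT", "bar")]            # unit duplicated as a relationship

# Use case 1: show a sensor with its unit -- one lookup, no traversal.
sensor = "PT-101"
display = f"{sensor} ({nodes[sensor]['unit']})"

# Use case 2: find every sensor measured in bar -- walk the relationships.
in_bar = [src for src, rel, dst in edges if rel == "HAS_UNIT" and dst == "bar"]

print(display)  # PT-101 (bar)
print(in_bar)   # ['PT-101']
```

Each use case reads the representation that suits it; the cost of the duplication is a little extra storage and the discipline of keeping both copies in sync on write.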

How we solve this in Eigen is partly down to our experience of building the initial graph and putting in structures that allow more than one way of making a query.  The other part is down to the use of a true graph database, because you can continually evolve the schema without breaking anything. 

Written by Murray Callander, posted on June 25, 2020.
