Over the past five years of building industrial-scale knowledge graphs, we have learnt many lessons about how to build them faster and keep them focused on delivering value.
Here’s our top 10.
1. Start small, don’t boil the ocean
The temptation is to find a really meaty domain, one used by many people, to deliver massive impact. But the bigger the domain, the more stakeholders and nuances there are, and the slower the process. Often, it’s better to pick a narrow domain with a small number of users and build small solutions that can scale.
2. Build use case by use case
This is our main lesson: always start with real-world problems, ideally ones with substantive value attached to their solution. Never build for what-ifs, or you will waste time and risk losing user engagement. Build incrementally from each use case, adding more data and context into the knowledge graph.
3. The model serves the use case, never the other way round
This complements the previous lesson. As with building any digital solution, there are moments when it’s tempting to add features and design ideas that perfect the product but don’t add value for the user or solve a problem. A laser focus on the use case is vital.
4. Pick a graph technology vendor and stick to it
Such is the demand that there are now many great graph database vendors out there: Stardog, Dgraph, ArangoDB and others. We did some testing a few years back and landed on Neo4j. We like the speed, the functionality, and the tools, but more importantly we’ve now invested time in understanding how it works, and swapping now would slow our pace of delivery and learning. Stick with one, and don’t get tempted by offers and features!
5. Use technology that allows you to access data at-the-source
When data is replicated to a data lake, there is inevitable loss of fidelity, integrity and context, not to mention the wait for the data lake project to finish the migration. Use a technology stack that allows working with data kept at the source. Eigen Ingenuity has been doing this for years; it can also access data already in a cloud-based data lake.
6. Visualisation of the knowledge graph is key to engage stakeholders
One of the things we really like about Neo4j is the way we can build a knowledge graph in minutes and share it with a client. A simple and intuitive front-end that masks considerable complexity below helps engage users and visualise what they are getting.
7. Whiteboard it first
If you can draw your outline knowledge graph on a whiteboard, you can model it in a tool like Neo4j. Drawing it out forces you to clarify your thinking before you start building. It helps you identify opportunities to link with existing data sources, surface any connection or logic challenges, and explore other use cases. Your whiteboard version is also a great communication tool for engaging stakeholders, so they understand what you plan to build and can contribute to it.
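To make the whiteboard-to-model step concrete, here is a minimal sketch in Python. It assumes each arrow on the whiteboard can be written down as a (label, name, relationship, label, name) tuple and turns those into Cypher MERGE statements as plain text. The asset names and the to_cypher helper are invented for illustration; they are not part of any Neo4j tooling, and a real load would pass values as query parameters rather than formatting them into strings.

```python
# Hypothetical whiteboard sketch: each tuple is one arrow drawn on the board,
# written as (source label, source name, relationship, target label, target name).
# The asset names below are invented for illustration only.
WHITEBOARD = [
    ("Pump", "P-101", "FEEDS", "Tank", "T-200"),
    ("Sensor", "PT-7", "MEASURES", "Pump", "P-101"),
    ("Document", "Datasheet-12", "DESCRIBES", "Pump", "P-101"),
]


def to_cypher(triples):
    """Turn whiteboard arrows into Cypher MERGE statements (as plain strings)."""
    statements = []
    for src_label, src_name, rel, dst_label, dst_name in triples:
        statements.append(
            f"MERGE (a:{src_label} {{name: '{src_name}'}}) "
            f"MERGE (b:{dst_label} {{name: '{dst_name}'}}) "
            f"MERGE (a)-[:{rel}]->(b)"
        )
    return statements


cypher_statements = to_cypher(WHITEBOARD)
for statement in cypher_statements:
    print(statement)
```

Because MERGE only creates what is missing, the same pump drawn in three arrows should end up as a single node, which mirrors how a whiteboard drawing reuses one box for one thing.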
8. Don’t constrain yourself to relational database thinking
Most of us have grown up with relational databases, so it’s natural to transfer this thinking to knowledge graph design. But knowledge graphs are more flexible and scalable; they don’t have predefined boundaries or the same constraints. Encourage free thinking in your design.
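As a rough illustration of that flexibility, the sketch below uses a tiny in-memory property graph in plain Python rather than a real graph database. All names (add_node, add_edge, reachable, and the asset ids) are made up for the example. The point it tries to show is that a later use case can add a new kind of node and relationship without any table redesign or migration, and that questions are answered by following connections rather than joining tables.

```python
from collections import defaultdict, deque

# A tiny in-memory property graph: no fixed schema, so nodes can carry
# different properties and new relationship types can appear at any time.
nodes = {}
edges = defaultdict(list)  # node id -> list of (relationship, neighbour id)


def add_node(node_id, **props):
    nodes.setdefault(node_id, {}).update(props)


def add_edge(src, rel, dst):
    edges[src].append((rel, dst))
    edges[dst].append((rel, src))  # stored both ways so traversal ignores direction


def reachable(start, max_hops):
    """Breadth-first traversal: ids of nodes within max_hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node_id, hops = queue.popleft()
        if hops == max_hops:
            continue
        for _rel, neighbour in edges[node_id]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen - {start}


# First use case: which pump feeds which tank.
add_node("P-101", kind="Pump", rated_flow_m3h=120)
add_node("T-200", kind="Tank")
add_edge("P-101", "FEEDS", "T-200")

# A later use case adds a new kind of node and relationship; nothing is migrated.
add_node("WO-55", kind="WorkOrder", status="open")
add_edge("WO-55", "RAISED_AGAINST", "P-101")

neighbours_of_tank = reachable("T-200", max_hops=2)
```

In a relational design the work order would typically need a new table and a foreign key before it could be queried alongside the pump; here it simply becomes two more entries that the same traversal picks up.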
9. Don’t be afraid to iterate
Experimentation and learning are key to building knowledge graphs; we almost always build them in agile projects, with two- or three-week sprints of focused design, development and testing. By involving product owners at the end of each sprint to provide feedback and direction, we quickly learn how close the knowledge graph-based solution is to resolving the client’s problem and where we need to iterate further to close gaps.
10. Stay up to date
Knowledge graph technology is evolving fast. Keep informed about advances in technologies, standards and best practices or work with a partner who makes this a priority.
Have you any other lessons about building knowledge graphs? What’s worked for you, and what is still proving a challenge? Drop me a line at murray.callander@eigen.co. I’d be happy to share any of our insights.