Why You Need a Knowledge Graph, And How to Build It | by Stan Pugsley | Aug, 2023


A Guide to Migrating from a Relational Database to a Graph Database

TLDR: A knowledge graph organizes events, people, resources, and documents in a graph database for advanced analysis. This article will explain the purpose of a knowledge graph and show you the basics of how to translate a relational data model into a graph model, load the data into a graph database, and write some sample graph queries.

Relational databases are great for creating lists, but terrible for managing networks of diverse entities. Have you ever tried to do any of these tasks with a relational database?

  • analyze a healthcare episode of care when a patient interacted with dozens of people, places and procedures
  • find patterns in financial fraud with a web of vendors, customers and transaction types involved
  • optimize the dependencies and interconnected elements of a supply chain

These are all examples of networks of events, people and resources that create huge headaches for SQL analysts using relational databases. In a relational database, each additional hop through the network typically requires another join, so queries tend to slow down sharply as the network grows. A graph database follows stored relationships directly, so its query time tends to scale closer to linearly with the part of the network being traversed. If you are managing a network, or web, of activities and things, a graph database is usually the right choice. In the future, we should expect to see enterprise data groups adopting a combination of relational databases for isolated analysis of a single business function, and knowledge graphs for complex, networked processes that span functions.

A knowledge graph, built on graph database technology, is designed to handle a diverse network of processes and entities. In a knowledge graph, you have nodes that represent people, events, places, resources, documents, and so on, and you have relationships (edges) that represent links between the nodes. The relationships are physically stored in the database with a name and direction. Not every graph database is a knowledge graph. To be considered a knowledge graph, the design must embed the business semantic model, reflected in clear business names for nodes and relationships, across a diverse set of nodes that span multiple business functions. You are in essence creating a seamless web out of all parts of the business that interact, and using the business semantics to closely tie data to the processes they represent. This can serve as the foundation for future use with generative large language models (LLMs).
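To make the idea of nodes and named, directed relationships concrete, here is a minimal in-memory sketch in Python. It is not how any particular graph database stores data. The class names (`Node`, `Relationship`, `Graph`) and the sample labels (`Patient`, `Visit`, `Doctor`, `ATTENDED`, `PERFORMED`) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A business entity, identified by an id and a business label."""
    node_id: str
    label: str  # business name for the kind of thing, e.g. "Patient"


@dataclass(frozen=True)
class Relationship:
    """A named edge that points from a source node to a target node."""
    source: str  # node_id where the edge starts
    name: str    # business name for the link, e.g. "ATTENDED"
    target: str  # node_id where the edge ends


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_node(self, node_id, label):
        self.nodes[node_id] = Node(node_id, label)

    def relate(self, source, name, target):
        # Refuse edges that point at nodes we have not seen yet.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.relationships.append(Relationship(source, name, target))

    def outgoing(self, node_id):
        """Return (relationship name, target id) pairs leaving a node."""
        return [(r.name, r.target) for r in self.relationships
                if r.source == node_id]


# Hypothetical sample data, loosely based on the healthcare example above.
graph = Graph()
graph.add_node("p1", "Patient")
graph.add_node("v1", "Visit")
graph.add_node("d1", "Doctor")
graph.relate("p1", "ATTENDED", "v1")
graph.relate("d1", "PERFORMED", "v1")

print(graph.outgoing("p1"))  # [('ATTENDED', 'v1')]
```

Because direction is part of each relationship, asking for the outgoing edges of the visit node `v1` returns an empty list here, even though two edges point at it.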

To illustrate a diverse set of data in a knowledge graph, let’s look at a simple example for supply chain logistics. The business process might be modeled like this:

Supply chain graph database model. Image by the author.

This model could be extended to include any related part of the business processes: customer returns, invoices, raw materials, manufacturing processes, employees, and even customer reviews. There is no pre-defined schema, so the model can expand in any direction or depth.
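As a rough sketch of what "no pre-defined schema" means in practice, the Python snippet below keeps nodes and relationships in plain dictionaries and lists, then adds a new kind of node without altering anything that already exists. The node ids, labels, property names and relationship names (`Supplier`, `Product`, `Shipment`, `CustomerReview`, `SUPPLIES`, `CONTAINS`, `REVIEWS`) are hypothetical and are not taken from the model in the figure.

```python
from collections import deque

# Nodes: id -> properties. Each node can carry a different set of keys,
# because nothing enforces a fixed schema.
nodes = {
    "supplier-1": {"label": "Supplier", "name": "Acme Parts"},
    "product-1": {"label": "Product", "sku": "W-100"},
    "shipment-1": {"label": "Shipment", "carrier": "FastFreight"},
}

# Relationships: (source id, relationship name, target id).
edges = [
    ("supplier-1", "SUPPLIES", "product-1"),
    ("shipment-1", "CONTAINS", "product-1"),
]


def add_node(node_id, **properties):
    nodes[node_id] = properties


def add_edge(source, name, target):
    edges.append((source, name, target))


def connected_to(start):
    """Return ids of all nodes linked to `start`, ignoring edge direction."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for source, _name, target in edges:
            if source == current:
                neighbor = target
            elif target == current:
                neighbor = source
            else:
                continue
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


# Extend the model with a brand-new node type; no existing data changes.
add_node("review-1", label="CustomerReview", rating=4)
add_edge("review-1", "REVIEWS", "product-1")

print(sorted(connected_to("supplier-1")))
# ['product-1', 'review-1', 'shipment-1']
```

The new review node becomes reachable from the supplier as soon as its single relationship is added, which is the sense in which the model can grow in any direction.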


