{"id":5550,"date":"2021-06-10T17:04:37","date_gmt":"2021-06-10T17:04:37","guid":{"rendered":"https:\/\/47billion.com\/?p=5550"},"modified":"2024-12-23T05:16:35","modified_gmt":"2024-12-23T05:16:35","slug":"recommendation-system-using-graph-database","status":"publish","type":"post","link":"https:\/\/47billion.com\/blog\/recommendation-system-using-graph-database\/","title":{"rendered":"Recommendation System Using Graph Database"},"content":{"rendered":"\n
A recommendation system predicts an individual\u2019s preferred choices based on available data. Recommendation systems are used in a variety of services, such as video streaming, online shopping, and social media. Typically, the system makes recommendations based on a user\u2019s past behavior, such as items liked or disliked, or movies watched.<\/p>\n\n\n\n
Typically, a recommendation engine processes data through the following steps:<\/p>\n\n\n\n
In this blog, we’re going to discuss a graph-based recommendation engine.<\/em><\/strong><\/p>\n\n\n\n In general, recommendation systems work offline. A batch process passes each user’s history to a set of algorithms and generates recommendations periodically, as the business use case requires.<\/p>\n\n\n\n To understand the drawback of such a process, suppose a user searches for action movies and watches them. A typical offline system will recommend action movies the next time the user visits. The system knows what the user has watched, but not what the user is about to watch, and it can\u2019t incorporate that new knowledge until the next batch run.<\/p>\n\n\n\n As a result, its subsequent recommendations are less relevant and the user is likely to ignore them.<\/p>\n\n\n\n Before talking about a graph-based recommendation engine, let us see what a graph database is and how it can help overcome these shortcomings to build a robust, scalable, and fast recommendation engine.<\/p>\n\n\n\n A graph database is a database designed to treat the relationships between data as equally important to the data itself. It is intended to hold data without constricting it to a pre-defined model.<\/p>\n\n\n\n A graph database management system is an online database management system with Create, Read, Update, and Delete (CRUD) methods that expose a graph data model. In a graph data model, relationships are stored explicitly, so we don\u2019t have to infer data connections using things like foreign keys.<\/p>\n\n\n\n Relational databases can also model relationships, but to traverse them we need to write SQL queries that JOIN tables together. Joins are computationally expensive and become slower as the number of joins increases, which can make real-time analysis impractical in production.<\/p>\n\n\n\n The graph database that we used is Neo4j. Neo4j<\/strong><\/a> is a native graph database platform, built from the ground up to leverage not only data but also data relationships. 
Neo4j connects data as it\u2019s stored, so relationship-heavy queries can be answered by following stored connections instead of computing joins at query time.<\/p>\n\n\n\n Since relationships are made explicit by the edge elements, traversing the graph is both simple and inexpensive. As a result, relationship-based queries can be run in real time: we can quickly capture any new movies users search for and the interests they show during their current visit, both of which are essential for making real-time recommendations.<\/p>\n\n\n\n Graph databases use nodes and relationships to store data, so we first have to define both.<\/p>\n\n\n\n Nodes are the entities in the graph. They can hold any number of attributes (key-value pairs) called properties.<\/p>\n\n\n\n Relationships<\/em> provide directed, named, semantically relevant connections between two node entities (e.g., Employee WORKS_FOR Company). A relationship always has a direction, a type, a start node, and an end node. Like nodes, relationships can also have properties.<\/p>\n\n\n\n The data comes from various open sources and covers millions of users, movies, and shows.<\/p>\n\n\n\n User data contains information such as a unique user ID, favorite genres, watched movies, and the movies the user has rated.<\/p>\n\n\n\n Movie data consists of the movie name, ID, genres, actors, directors, image URL, etc.<\/p>\n\n\n\n To enrich the data, external APIs were queried by movie\/show name to collect related details such as IMDb ratings, directors, writers, and producers.<\/p>\n\n\n\n https:\/\/developers.themoviedb.org\/<\/a><\/p>\n\n\n\n https:\/\/www.omdbapi.com\/<\/a><\/p>\n\n\n\n Before storing the data in the graph database, some pre-processing steps were performed:<\/p>\n\n\n\n In this section, we will generate recommendations from Neo4j using the Cypher query language, a declarative graph query language that allows for expressive and efficient querying and updating of a property graph. 
Further details are available in the Cypher documentation:<\/p>\n\n\n\n Cypher Query Language<\/a><\/strong><\/p>\n\n\n\n First, let us take a quick look at the database schema in Neo4j.<\/p>\n\n\n In the above figure, some of the nodes and relationships have been omitted. Nodes are shown in different colors: yellow nodes denote users, pink nodes denote movie genres, and green nodes denote movies. The edges show the different relationships between these nodes.<\/p>\n\n\n\n Let’s explore the movies watched by a user: <\/p>\n\n\n\n In the above graph, we can see how easy it is to query the movies watched by a user. Along with those movies, we can also fetch movies similar to the ones the user has watched. Because the graph database stores such relationships directly, it is well suited to real-time recommendations.\u00a0<\/p>\n\n\n\n This is not a schema or ER diagram; it represents the actual movies watched by a user. <\/p>\n\n\n\n Now that we have a graph for a user, we can start generating recommendations for that user. The simplest approach is to recommend the highest-rated movies of all time.<\/p>\n\n\n\n Recommended movies:<\/strong><\/em><\/p>\n\n\n\n A user has watched the movie GoldenEye<\/em><\/strong>. Based on genre, the similar movies recommended are:<\/em><\/strong><\/p>\n\n\n Some other ways to design a recommendation system:<\/em><\/p>\n\n\n\n https:\/\/medium.com\/decathlondevelopers\/building-a-recommender-system-using-graph-neural-networks-2ee5fc4e706d<\/a><\/p>\n\n\n\nGraph Databases: The Saviour!<\/strong><\/h3>\n\n\n\n
Graph Database<\/h2>\n\n\n\n
Data<\/strong><\/h2>\n\n\n\n
Data Sources<\/strong><\/h2>\n\n\n\n
Data Pre-processing<\/strong><\/h2>\n\n\n\n
\n
The following nodes and relationships are created and the schema is designed:<\/strong><\/h2>\n\n\n\n
Nodes-<\/h3>\n\n\n\n
\n
Relationships-<\/h3>\n\n\n\n
\n
Graph-based recommendation engine-<\/h3>\n\n\n\n
<\/figure><\/div>\n\n\n
\/\/ Movies watched by a user\nMATCH (u:Users)-[:WATCHED]->(m1:Movies)\nWHERE u.userId = '1'\nRETURN u.userId, m1.title, m1.rating_mean\n<\/code><\/pre>\n\n\n
<\/figure><\/div>\n\n\n
\/\/ Highest-rated movies among those watched by any user\nMATCH (:Users)-[:WATCHED]->(m2:Movies)\nWITH DISTINCT m2\nRETURN m2.title AS title, m2.rating_mean AS avg_rating\nORDER BY avg_rating DESC LIMIT 100;\n<\/code><\/pre>\n\n\n
<\/figure><\/div>\n\n\n
Recommendation based on Similar users<\/strong><\/h2>\n\n\n\n
\/\/ Movies based on similar users\nMATCH (u1:Users)-[:WATCHED]->(m3:Movies)\nWHERE u1.userId = '1'\n\/\/ Collect the IDs of already-watched movies into a single list\nWITH collect(m3.movieId) AS movies\nMATCH (u:Users)-[:WATCHED]->(m1:Movies)-[s:SIMILAR]->(m2:Movies),\n(m2)-[:GENRES]->(g:Genres),\n(u)-[:FAVORITE]->(g)\nWHERE u.userId = '10' AND NOT m2.movieId IN movies\nRETURN DISTINCT u.userId AS userId, g.genres AS genres, \nm2.title AS title, m2.rating_mean AS rating\nORDER BY rating DESC\nLIMIT 10\n<\/code><\/pre>\n\n\n
<\/figure><\/div>\n\n\n
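To make the filtering logic of the query above easier to follow, here is a minimal in-memory sketch in Python. It is not how Neo4j evaluates the query; the `Movie` type, the `recommend` function, and the sample movies are all hypothetical and exist only for illustration. It keeps movies that are similar to something the user watched, share a genre with the user's favorites, and have not been watched yet, then ranks them by mean rating.

```python
from typing import NamedTuple


class Movie(NamedTuple):
    movie_id: str
    title: str
    genres: frozenset
    rating_mean: float


def recommend(watched, similar, favorite_genres, movies, limit=10):
    """Return titles of unwatched, similar movies in the user's favorite genres.

    watched:         set of movie ids the user has already watched
    similar:         dict mapping a movie id to a list of similar movie ids
    favorite_genres: set of genre names the user marked as favorites
    movies:          dict mapping a movie id to its Movie record
    """
    candidates = {}
    for watched_id in watched:
        for similar_id in similar.get(watched_id, []):
            if similar_id in watched:
                continue  # skip movies the user has already seen
            movie = movies[similar_id]
            if not (movie.genres & favorite_genres):
                continue  # require at least one favorite genre
            candidates[similar_id] = movie  # dict keys remove duplicates
    ranked = sorted(candidates.values(), key=lambda m: m.rating_mean, reverse=True)
    return [m.title for m in ranked[:limit]]


# Hypothetical sample data, not taken from the real dataset.
MOVIES = {
    "1": Movie("1", "Movie A", frozenset({"Action"}), 3.5),
    "2": Movie("2", "Movie B", frozenset({"Action", "Thriller"}), 4.2),
    "3": Movie("3", "Movie C", frozenset({"Comedy"}), 4.8),
    "4": Movie("4", "Movie D", frozenset({"Thriller"}), 3.9),
}
SIMILAR = {"1": ["2", "3", "4"]}

print(recommend({"1"}, SIMILAR, {"Action", "Thriller"}, MOVIES))
```

"Movie C" has the best rating in this sample but is dropped because Comedy is not a favorite genre, which mirrors the role of the FAVORITE relationship in the Cypher query.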
Recommendation using Item-item Similarity-<\/strong><\/h3>\n\n\n\n
\/\/ Item-Item Similarity\nMATCH (m2:Movies {movieId: \"10\"})-[:GENRES]->(g:Genres)<-[:GENRES]-(other:Movies)\nWITH m2, other, COUNT(g) AS intersection\nMATCH (m2)-[:GENRES]->(m2g:Genres)\nWITH m2, other, intersection, COLLECT(m2g.genres) AS s1\nMATCH (other)-[:GENRES]->(og:Genres)\nWITH m2, other, intersection, s1, COLLECT(og.genres) AS s2\nWITH m2, other, intersection, s1 + [x IN s2 WHERE NOT x IN s1] AS allGenres, s1, s2\n\/\/ Jaccard similarity = shared genres \/ all genres of both movies\nRETURN m2.title, other.title, s1, s2,\n(1.0 * intersection) \/ SIZE(allGenres) AS jaccard\nORDER BY jaccard DESC LIMIT 10\n<\/code><\/pre>\n\n\n\n
<\/figure><\/div>\n\n\n
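The item-item query above compares two movies by the overlap of their genre sets, which is the Jaccard similarity: the number of shared genres divided by the number of distinct genres across both movies. The Python sketch below shows the same calculation outside the database. The function names and the sample genre data are hypothetical, chosen only to illustrate the idea.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |a AND b| / |a OR b|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets: define similarity as 0
    return len(a & b) / len(union)


def most_similar(target_id, genres_by_movie, limit=10):
    """Rank other movies by genre overlap with the target movie.

    Movies that share no genre with the target are skipped, just as the
    Cypher pattern only matches movies connected through a common genre.
    """
    target = set(genres_by_movie[target_id])
    scores = [
        (movie_id, jaccard(target, genres))
        for movie_id, genres in genres_by_movie.items()
        if movie_id != target_id and target & set(genres)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:limit]


# Hypothetical genre sets keyed by movie id.
GENRES = {
    "10": {"Action", "Adventure", "Thriller"},
    "20": {"Action", "Thriller"},
    "30": {"Comedy"},
    "40": {"Action", "Adventure", "Thriller"},
}

for movie_id, score in most_similar("10", GENRES):
    print(movie_id, round(score, 3))
```

In this sample, movie "40" has exactly the same genres as movie "10" and ranks first, while movie "30" shares nothing and never appears in the result.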
Conclusion<\/strong><\/h2>\n\n\n\n
\n