Find similar news on SQL based on similar tags

72
February 14, 2021, at 8:10 PM

I have a list of news from a few newspapers(Getting them from a RSS feed). Let's say every newspaper returns a list of news with tags. For example:

newspaper1:

  • title1, tag1, tag2, tag3
  • title2, tag1, tag7, tag5, tag8

newspaper2:

  • title3, tag3, tag4, tag5
  • title4, tag1, tag5, tag7, tag9, tag10

So, I am thinking on storing all the news in one table(newspaper_id, news_id, title), and then another table with a row per tag(news_id, tag_name).

Now, I need to query the tables and compare every news from the first newspaper with the rest of newspaper news and returns similar news. In my sample data, title1 shares one tag with title3 and 4 from the other newspaper. And title2, shares 3 tags with title4, and just one tag with title3. I need this, how many tags every news from a newspaper share with the rest

I have been trying hard with GROUP BY or INNER JOIN on the same table with no luck. Any idea?

Create table and insert data statements:

CREATE TABLE news (
newspaper_id INT(6),
news_id INT(6) PRIMARY KEY,
title VARCHAR(250) NOT NULL 
); 
CREATE TABLE tags ( 
news_id INT(6) NOT NULL,
name VARCHAR(30) NOT NULL
); 

INSERT INTO `news`VALUES (1, 1, 'USA elections'), (1, 2, 'Coronavirus crisis'), (2, 3, 'Another thing about USA elections'), (2, 4, 'Who will win elections?'), (3, 5, 'Coronavirus affetcs elections');
INSERT INTO `tags`VALUES (1, 'elections'), (1, 'biden'), (1, 'trump'), (2, 'coronavirus'), (3, 'biden'), (3, 'trump'), (3, 'elections'), 
(4, 'elections'), (5, 'coronavirus'), (5, 'elections');

Expected results:

| Title                  | news_id || compared_news_id || Tags in common |
| ---------------------- | ------- || ---------------- || -------------- |
| 'USA elections'        | 1       || 3                || 3              |
| 'USA elections'        | 1       || 4                || 1              |
| 'USA elections'        | 1       || 5                || 1              |
| 'Coronavirus crisis'   | 2       || 5                || 1              
Answer 1

If you don't care about news that don't match at all, then just look out for matching tags.

select
  n1.news_id, n1.title,
  n2.news_id as compared_news_id, n2.title as compared_news_title,
  count(*) as tags_in_common
from news n1
join news n2 on n2.news_id <> n1.news_id
join tags t1 on t1.news_id = n1.news_id
join tags t2 on t2.news_id = n2.news_id and t2.name = t1.name
where n1.newspaper_id = 1
group by n1.news_id, n2.news_id
order by n1.news_id, n2.news_id;

Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=6ff1db3be344c40b82f892654ca08e3a

If you don't want to restrict this to one newspaper, then remove where n1.newspaper_id = 1. In that case, if you want to avoid to have both news1/news5 and news5/news1 in your results, then change n2.news_id <> n1.news_id to n2.news_id > n1.news_id.

You can also change on n2.news_id <> n1.news_id to on n2.news_id <> n1.news_id and n2.newspaper_id <> n1.newspaper_id of course, if you don't want to compare news of the same newspaper.

READ ALSO
Update data only if user change anything in laravel

Update data only if user change anything in laravel

Update data only if the user changes anything otherwise return a message that nothing to updateAfter click the edit button, if the user hits update button without change anything then return message nothing to update

88
How to make webpack ignore a require to a mising file in a npm module?

How to make webpack ignore a require to a mising file in a npm module?

When bundling my project with webpack, I encountered an error message indicating that anode file required in one of my npm package does not exist

62
Dynamically loading property value from active list of profiles and set the value in configuration XML file

Dynamically loading property value from active list of profiles and set the value in configuration XML file

I have a spring bath application which contains a batch configurationI want to dynamically load profile based upon the environemnt variable parameter of active profile list

75