How to Measure Similarity or Difference of Meaning Between Words? [closed]

March 30, 2022, at 12:40 PM
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

Say you have two random words ('yellow' and 'ambient' or 'goose' and 'kettle'). What tech could be used to rate how similar or different they are in meaning as informed by popular usage? For example, from 0 to 1 where antonyms are 0 and synonyms are 1, 'yellow' and 'ambient' might be 0.65 similar.

Note: I'm not talking about how close the two strings are to each other, but rather an approximation of how similar their meanings are.

Answer 1

One approach that works quite well for measuring semantic similarity is to look at the contexts in which the two words occur. Because words are not distributed randomly, the context carries a lot of information, to the point where you can guess the meaning of a word you don't know (e.g. in a foreign language), as long as you understand the words around it.

In my PhD thesis I investigated this approach in various ways. I took instances of a word from a corpus and recorded their contexts, i.e. the n words to their left and right. Then you do the same for another word and compare how similar the two sets of contexts are. Depending on the metric you use, this gives you a value between 0 and 1.
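A minimal sketch of the frequency-list comparison, assuming whitespace tokenization, a window of n words on each side, and cosine similarity as the metric (the answer doesn't name a metric; cosine is one common choice that stays between 0 and 1 for non-negative counts). The function names and the toy corpus are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_counts(tokens: list[str], target: str, n: int = 2) -> Counter:
    """Frequency list of the words within n positions left and right of each occurrence of target."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            counts.update(tokens[max(0, i - n):i])  # up to n words to the left
            counts.update(tokens[i + 1:i + 1 + n])  # up to n words to the right
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two frequency lists; in [0, 1] because counts are non-negative."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus; a real application would use a much larger one.
tokens = ("the cat drank the milk . the dog drank the water . "
          "the cat chased the dog . a kettle boiled the water").split()
cat, dog, kettle = (context_counts(tokens, w) for w in ("cat", "dog", "kettle"))
print(cosine(cat, dog), cosine(cat, kettle))
```

Even on this tiny corpus, 'cat' ends up closer to 'dog' than to 'kettle', because the first two share more context words.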

You can treat the contexts as frequency lists and compare the frequencies of the same word in both, or you can be more specific and also compare context words by their distance to the target word. The more specific you are, the more data you need, but the more accurate your results will be.
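The distance-sensitive variant can be sketched by keying each context word by its offset from the target, so that ('the', one word to the left) and ('the', two words to the right) count as different features. This is an illustrative assumption of how "compare by distance" might be encoded, not necessarily the thesis's exact method; names and corpus are hypothetical. Two contexts now overlap only when the same word occurs at the same offset, which makes the vectors sparser and is why this variant needs more data.

```python
from collections import Counter
from math import sqrt


def positional_context_counts(tokens: list[str], target: str, n: int = 2) -> Counter:
    """Count (offset, word) pairs around each occurrence of target, e.g. (-1, 'the')."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        for offset in range(-n, n + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(tokens):  # skip the target itself and out-of-range positions
                counts[(offset, tokens[j])] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors keyed by (offset, word)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus. Context words match only at the same offset from the target.
tokens = ("the cat drank the milk . the dog drank the water . "
          "the cat chased the dog . a kettle boiled the water").split()
cat = positional_context_counts(tokens, "cat")
dog = positional_context_counts(tokens, "dog")
print(cosine(cat, dog))
```

Summing the counts over all offsets for each word collapses these vectors back into plain frequency lists.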

One caveat is words with different meanings (homographs): the word 'left' can be an adjective (the left door), a verb (they left the room), or a noun (they are part of the left). Each of these has different contexts, but you won't be able to distinguish them automatically during processing, so the similarity values for words with multiple meanings will be somewhat 'smudged'. Also, some words will have identical contexts, e.g. names: in 'I went to X on holiday', X can be any country, city, or location.

It will probably also not work very well with antonyms, as they often occur in the same contexts: this door is open/closed/locked/unlocked, this book is too easy/difficult, etc. But it might; it's hard to tell without actually trying it out. Closed categories, such as days of the week or months, do work well.

While this can be done in a purely symbolic way, I think it is the same principle that underlies word embeddings in deep learning, where words are represented by context vectors.

Answer 2

I don't really understand what exactly you mean by similarity, especially with respect to meaning. You would need a dataset that assigns meaning to words. A popular example of this is sentiment analysis: if you have a lot of textual data, say tweets from Twitter, you might want to know whether the data is mostly positive or negative. To do this, you would find a dataset of a similar nature that has already been labelled into categories, and then use it to classify your texts into those categories (e.g. with a Naive Bayes classifier). In this way you can assign meaning to texts computationally.

This would allow both general evaluations and per-input evaluations of how well each input scores across the different categories of meaning.
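A minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing, to make the classification step concrete. It assumes lowercase whitespace tokenization; the class name and the four-sentence training set are hypothetical stand-ins for a real labelled dataset. `log_scores` returns a score per category for a single input, which corresponds to the input-by-input evaluation described above.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.label_counts = Counter(labels)  # documents per category, for the priors
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, text: str) -> dict[str, float]:
        """Unnormalized log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical labelled data; a real sentiment dataset would be far larger.
texts = ["i love this great phone", "what a great day",
         "this is terrible", "i hate this awful phone"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("great phone"), model.log_scores("great phone"))
```

In practice a library implementation such as scikit-learn's `MultinomialNB` (with `CountVectorizer` for tokenization) is the usual choice; the hand-rolled version only shows what the classifier computes.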

I'm not sure if that's what you're looking for in an answer though.
