Word frequencies in a text file in Python

December 26, 2017, at 8:00 PM

I want to find the frequencies of the words listed in wanted. The code does find them, but the displayed result contains lots of unnecessary data.

Code:

from collections import Counter
import re
wanted = "whereby also thus"
cnt = Counter()
words = re.findall('\w+', open('C:/Users/user/desktop/text.txt').read().lower())
for word in words:
    if word in wanted:
        cnt[word] += 1
print (cnt)

Results:

Counter({'e': 131, 'a': 119, 'by': 38, 'where': 16, 's': 14, 'also': 13, 'he': 4, 'whereby': 2, 'al': 2, 'b': 2, 'o': 1, 't': 1})

Questions:

  1. How do I omit all those 'e', 'a', 'by', 'where', etc.?
  2. If I then wanted to sum the frequencies of the words (also, thus, whereby) and divide the sum by the total number of words in the text, would that be possible?

Disclaimer: this is not a school assignment. I just have lots of free time at work now, and since I spend a lot of time reading texts, I decided to do this little project to remind myself of what I was taught a couple of years ago.

Thanks in advance for any help.

Answer 1

As others have pointed out, you need to change wanted from a string to a list. With a string, word in wanted is a substring test, which is why fragments such as 'e' and 'by' get counted. I hardcoded the list, but you could use str.split(" ") if the words are passed to a function as a single string. I also implemented the frequency calculation for you. As a side note, make sure you close your files; it's easier (and recommended) to open them in a with statement, which closes them for you.
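To see the difference between the two membership tests in isolation, here is a minimal sketch. The names wanted_str and wanted_list are only illustrative and do not appear in the original code.

```python
# `in` on a str tests for substrings; `in` on a list tests whole elements.
wanted_str = "whereby also thus"
wanted_list = wanted_str.split()  # ['whereby', 'also', 'thus']

for token in ["e", "by", "where", "also"]:
    # Prints the token, then the result of each membership test.
    print(token, token in wanted_str, token in wanted_list)
```

Only "also" passes the list test, while all four tokens pass the string test, which matches the stray keys in the question's output.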

from collections import Counter
import re
wanted = ["whereby", "also", "thus"]
cnt = Counter()
# The with statement closes the file automatically.
with open('C:/Users/user/desktop/text.txt', 'r') as fp:
    fp_contents = fp.read().lower()
words = re.findall(r'\w+', fp_contents)
for word in words:
    # wanted is a list, so this matches whole words only.
    if word in wanted:
        cnt[word] += 1
print(cnt)
# Share of all words in the text that are wanted words.
total_cnt = sum(cnt.values())
print(float(total_cnt) / len(words))
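If you want to reuse this on several texts, the same idea can be wrapped in a function. This is only a sketch under a few assumptions: Python 3 (so / is true division), a hypothetical function name wanted_word_stats, and a made-up sample sentence instead of your file.

```python
from collections import Counter
import re


def wanted_word_stats(text, wanted):
    """Return (per-word counts, share of all words that are wanted words)."""
    # A set gives whole-word membership tests, like the list above.
    wanted_set = {w.lower() for w in wanted}
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if w in wanted_set)
    # Guard against an empty text to avoid dividing by zero.
    share = sum(counts.values()) / len(words) if words else 0.0
    return counts, share


# Made-up sample text, used here in place of the file contents.
sample = "Also, the rule applies here; thus the result holds, and also there."
counts, share = wanted_word_stats(sample, ["whereby", "also", "thus"])
print(counts)
print(share)
```

For a real file you would pass in the string returned by fp.read(). Words that never occur are simply absent from the Counter, and looking them up returns 0.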