NLTK stopword removal gives the wrong output

August 12, 2017, at 09:57 AM

I have an issue with removing stopwords. When I execute my script:

import nltk
from nltk.corpus import stopwords
file1=open('english.txt', 'r')
english=file1.read()
file1.close()
english_corpus_lowercase =([w.lower() for w in english])
english_without_punc=''.join([c for c in english_corpus_lowercase if c not in (",", "``", "`", "?", ".", ";", ":", "!", "''", "'", '"', "-", "(", ")")])
print(english_without_punc)
print(type(english_without_punc))
stopwords = nltk.corpus.stopwords.words('english')
print(stopwords)
english_corpus_sans_stopwords = set()
for w in english_without_punc:
    if w not in stopwords:
        english_corpus_sans_stopwords.add(w)
        print(english_corpus_sans_stopwords)

It gives me the following. How could I fix it?

{'b', 'n', 'f', 'l', 'v', 'h', 'k', 'e', 'r', ' ', 'w', '“', 'g', 'u', 'p', 'c'}
{'b', 'n', 'f', 'l', 'v', 'h', 'k', 'e', 'r', ' ', 'w', '“', 'g', 'u', 'p', 'c'}
{'b', 'n', 'f', 'l', 'v', 'h', 'k', 'e', 'r', ' ', 'w', '“', 'g', 'u', 'p', 'c'}
{'b', 'n', 'f', 'l', 'v', 'h', 'k', 'e', 'r', ' ', 'w', '“', 'g', 'u', 'p', 'c'}
{'b', 'n', 'f', 'l', 'v', 'h', 'k', 'e', 'r', ' ', 'w', '“', 'g', 'u', 'p', 'c'}
{'b', 'n', 'f', 'l', 'v', 'h', 'k', 'e', 'r', ' ', 'w', '“', 'g', 'u', 'p', 'c'}
{'b', 'n', 'f', 'l', 'v', 'h', 'k', 'e', 'r', ' ', 'w', '“', 'g', 'u', 'p', 'c'}
{'b', 'n', 'f', 'l', 'v', 'h', 'k', 'e', 'r', ' ', 'w', '“', 'g', 'u', 'p', 'c'}
{'b', 'n', 'f', 'l', 'v', 'h', 'k', 'e', 'r', ' ', 'w', '“', 'g', 'u', 'p', 'c'}
Answer 1

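english is a single string, so both the list comprehension and the for loop in your script iterate over characters rather than words. Tokenize the text into words first, then filter the tokens: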

from nltk import word_tokenize
from nltk.corpus import stopwords

# Requires the NLTK tokenizer models and the stopwords corpus (installed via nltk.download).
with open('english.txt', 'r') as file1:
    english = file1.read()

# Tokenize into words first; iterating over the raw string would yield characters.
english_corpus_lowercase = [w.lower() for w in word_tokenize(english)]
punctuation = (",", "``", "`", "?", ".", ";", ":", "!", "''", "'", '"', "-", "(", ")")
english_without_punc = [w for w in english_corpus_lowercase if w not in punctuation]

# A set gives fast membership tests and avoids shadowing the imported stopwords module.
stop_words = set(stopwords.words('english'))
english_corpus_sans_stopwords = [w for w in english_without_punc if w not in stop_words]
print(english_corpus_sans_stopwords)
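
To see why the loop in the question produced single letters, here is a minimal sketch that runs without any NLTK data. The sample sentence and the small stop_words set are made up for illustration (they are not NLTK's list), and str.split stands in for word_tokenize:

```python
import string

# Hypothetical sample text and a tiny stand-in stopword set (not NLTK's full list).
text = "The cat sat on the mat."
stop_words = {"the", "on", "a", "is"}

# Iterating over a string yields single characters, which is what the question's loop did.
chars = [c for c in text.lower()][:5]

# Strip punctuation, then split into words so each token can be compared with the stopwords.
cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
tokens = cleaned.split()
content_words = [w for w in tokens if w not in stop_words]

print(chars)          # ['t', 'h', 'e', ' ', 'c']
print(content_words)  # ['cat', 'sat', 'mat']
```

On real text, word_tokenize is likely the better choice than split, since it also separates punctuation and contractions into their own tokens.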