Find the number of occurrences of each word in a sentence in each target category

February 13, 2018, at 11:54 AM

I have data like this:

Sentence                                        Target
We regret to inform you about the result.        1
We are glad to inform you about the result.      2
We would like to inform you about the result.   3
We are surprised to see the result.              4

I want a per-target word count that looks something like this:

Word        Target 1    Target 2    Target 3    Target 4
Result         1           1           1           1
Inform         1           1           1           0
Surprised      0           0           0           1

... and so on. How do I do this?

Answer 1

You'll need to:

  1. remove punctuation and lowercase the data
  2. split on whitespace
  3. stack to create a series
  4. groupby on Target
  5. find the value_counts of words for each target
  6. unstack the result for your desired output

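# df has the columns 'Sentence' and 'Target' shown in the question.
# regex=True is required on pandas >= 2.0, where str.replace defaults to literal matching.
(df.Sentence.str.replace(r'[^\w\s]', '', regex=True)  # 1. remove punctuation
   .str.lower()                                        #    and lowercase
   .str.split(expand=True)                             # 2. split on whitespace
   .set_index(df.Target)                               #    keep Target as the row label
   .stack()                                            # 3. stack into a Series of words
   .groupby(level=0)                                   # 4. group by Target
   .value_counts()                                     # 5. count words per target
   .unstack(0, fill_value=0)                           # 6. targets become columns
   .add_prefix('Target '))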

Target     Target 1  Target 2  Target 3  Target 4
about             1         1         1         0
are               0         1         0         1
glad              0         1         0         0
inform            1         1         1         0
like              0         0         1         0
regret            1         0         0         0
result            1         1         1         1
see               0         0         0         1
surprised         0         0         0         1
the               1         1         1         1
to                1         1         1         1
we                1         1         1         1
would             0         0         1         0
you               1         1         1         0
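A shorter route to the same table, offered as an alternative sketch rather than the answer's method, swaps the stack/groupby/unstack steps for `str.findall` + `DataFrame.explode` + `pd.crosstab`. It assumes the sample data from the question and that a "word" is any run of `\w` characters, so a contraction like "don't" would split into "don" and "t". The helper column name `Word` is hypothetical and simply matches the header the question asked for.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Sentence": [
        "We regret to inform you about the result.",
        "We are glad to inform you about the result.",
        "We would like to inform you about the result.",
        "We are surprised to see the result.",
    ],
    "Target": [1, 2, 3, 4],
})

# Lowercase, then extract runs of word characters (this drops punctuation),
# giving one list of words per sentence; explode puts each word on its own row.
words = df.assign(
    Word=df["Sentence"].str.lower().str.findall(r"\w+")
).explode("Word")

# Count every (word, target) pair; combinations that never occur become 0.
counts = pd.crosstab(words["Word"], words["Target"]).add_prefix("Target ")
print(counts)
```

`pd.crosstab` does the grouping, counting and zero-filling in one call, which is why no explicit `fill_value` is needed here.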