How to groupby a column and count the number of unique values in another column

April 10, 2022, at 6:40 PM

I have the following dataframe. I need to group by the ngram column and, for each group, count how many unique documents appear in the DocID column.

For example, from the above, the expected counts are:

4-gram - 4 unique documents (doc64, doc383, doc76, doc370)
5-gram - 4
6-gram - 4
7-gram - 2
8-gram - 2

I have part of an idea. I can get the unique DocIDs as follows:

# Split each comma-separated DocID string into a list (one list per row).
rep = temp['DocID'].str.split(",").tolist()
# Flatten the list of lists into a single list.
repSet = []
for docs in rep:
    repSet.extend(docs)
# Remove all duplicates and store the result in a list.
repSet = list(set(repSet))

But I don't know how to merge this with groupby.

EDIT

I have added the output from the first answer provided. Thank you! But the total number of documents is only 461, so I believe the unique DocID count for any group can go up to only that much :( but for the trigram it's above 461 :(

Help will be greatly appreciated. Thanks!

Answer 1

Maybe something like this?

df.assign(docid=df['docid'].str.split(',')).explode('docid').groupby('ngram')['docid'].nunique().reset_index()
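
To make the one-liner easier to follow, here is a minimal sketch of the same split / explode / groupby / nunique idea on a small made-up dataframe. The sample rows and the output column name unique_docs are hypothetical, and the column is called DocID here as in the question's snippet (use docid if yours is lowercase). The strip step is only an assumption about one way counts could come out too high: if the IDs are separated by ", " rather than ",", then "doc64" and " doc64" would be counted as two different documents.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question's snippet.
df = pd.DataFrame({
    "ngram": ["4-gram", "4-gram", "5-gram", "5-gram"],
    "DocID": ["doc64,doc383", "doc76, doc64", "doc370", "doc370,doc76"],
})

# One row per (ngram, document): split the comma-separated string into a
# list, then explode the list into separate rows.
exploded = (
    df.assign(DocID=df["DocID"].str.split(","))
      .explode("DocID", ignore_index=True)
)

# Assumption: stray whitespace around IDs should be ignored, so that
# "doc64" and " doc64" are treated as the same document.
exploded["DocID"] = exploded["DocID"].str.strip()

# Count distinct documents within each ngram group.
counts = (
    exploded.groupby("ngram")["DocID"]
            .nunique()
            .reset_index(name="unique_docs")
)
print(counts)
```

On this sample the 4-gram group gets 3 (doc64, doc383, doc76) and the 5-gram group gets 2 (doc370, doc76). Without the strip step the 4-gram count would be 4, because " doc64" would be treated as a separate ID.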