Reading a dictionary from within a dictionary

March 19, 2018, at 01:01 AM

I have a JSON file of tweet data. The data that I want to look at is the text of each tweet. For some reason, some of the tweets are too long to fit into the normal text field of the dictionary.

It seems like there is a dictionary within another dictionary and I can't figure out how to access it very well.

Basically, what I want in the end is one column of a data frame that will have all of the text from each individual tweet. Here is a link to a small sample of the data that contains a problem tweet.

Here is the code I have so far:

import json
import pandas as pd
tweets = []
# This reads the JSON file line by line so that I can work with it. This part works correctly.
with open("filelocation.txt") as source:
    for line in source:
        if line.strip():
            tweets.append(json.loads(line))
df = pd.DataFrame.from_dict(tweets)

Looking at the dataframe info, there is a column called extended_tweet that is populated for only one of the two sample tweets. The value in this column appears to be another dictionary, and one of its keys is full_text.

I want to add another column to the dataframe that holds full_text where it exists and falls back to the normal text column when full_text is null.

My first thought was to try and read that specific column of the dataframe as a dictionary again using:

d = pd.DataFrame.from_dict(tweets['extended_tweet']['full_text'])

But this doesn't work. I don't really understand why that doesn't work as that is how I read the data the first time.

My guess is that I can't look up the specific names because I am going back to the list, and it would have to read all or none. The error it gives me is "KeyError: 'full_text'".

I also tried using the recommendation provided by this website. But this gave me a None value no matter what.

Thanks in advance!

Answer 1

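I would suggest flattening the dictionaries while reading the file, like this:

tweet = json.loads(line)
# Fall back to the regular text when the tweet has no extended_tweet entry.
tweet['full_text'] = tweet.get('extended_tweet', {}).get('full_text', tweet.get('text'))
tweets.append(tweet)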
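
As a rough end-to-end sketch of that flattening idea, assuming each non-empty line of the file is one JSON object and that long tweets carry an extended_tweet dictionary with a full_text key (as described in the question), something like the following might work. The helper names flatten_tweet and load_tweets and the sample lines are made up for illustration; they are not from the original data.

```python
import json


def flatten_tweet(tweet):
    """Return a copy of the tweet with a top-level 'full_text' key.

    Uses extended_tweet['full_text'] when present, otherwise the
    regular 'text' field (hypothetical helper, for illustration).
    """
    flat = dict(tweet)
    # 'or {}' also covers the case where extended_tweet is present but null.
    extended = tweet.get("extended_tweet") or {}
    flat["full_text"] = extended.get("full_text") or tweet.get("text")
    return flat


def load_tweets(lines):
    """Parse one JSON object per non-empty line and flatten each tweet."""
    tweets = []
    for line in lines:
        if line.strip():
            tweets.append(flatten_tweet(json.loads(line)))
    return tweets


# Made-up sample lines standing in for the real file contents.
sample_lines = [
    '{"id": 1, "text": "short tweet"}',
    "",
    '{"id": 2, "text": "long tweet trunc", '
    '"extended_tweet": {"full_text": "long tweet in full"}}',
]

rows = load_tweets(sample_lines)
for row in rows:
    print(row["id"], row["full_text"])
```

With the real file, the open file object from the question's code could be passed to load_tweets in place of sample_lines, and the resulting list handed to pd.DataFrame as before; the dataframe should then have a full_text column with a value for every tweet.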