Parse multilevel JSON to string with condition

April 15, 2017, at 04:57 AM

I have this nested JSON item that I just want to flatten into a comma-separated string (e.g. parkinson:5, billy mays:4) so I can store it in a database if needed for future analysis. I wrote the function below, but I'm wondering if there's a more elegant way using a list comprehension (or something else). I found this post, but I'm not sure how to adapt it to my needs (Python - parse JSON values by multilevel keys).

Data looks like this:

{'persons':
     [{'name': 'parkinson', 'sentiment': '5'},
      {'name': 'knott david', 'sentiment': 'none'},
      {'name': 'billy mays', 'sentiment': '4'}],
 'organizations':
      [{'name': 'piper jaffray companies', 'sentiment': 'none'},
       {'name': 'marketbeat.com', 'sentiment': 'none'},
       {'name': 'zacks investment research', 'sentiment': 'none'}],
 'locations': []
}

Here's my code:

def parse_entities(data):
    results = ''
    for category in data.keys():
    # for c_id, category in enumerate(data.keys()):
        entity_data = data[category]
        for e_id, entity in enumerate(entity_data):
            if not entity_data[e_id]['sentiment'] == 'none':
                results = results + (data[category][e_id]['name'] + ":" +
                                     data[category][e_id]['sentiment'] + ",")
    return results

Answer 1

First, the most important thing you can do to make your code shorter and easier to read is to use the variables you have already defined. Note that entity_data = data[category] and entity = entity_data[e_id], so you can write entity['name'] instead of data[category][e_id]['name'].
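
As a minimal sketch of that first step, here is the asker's function with only the repeated lookups replaced by the loop variable. The `sample` dictionary is a shortened, hypothetical stand-in for the question's data, and the trailing comma in the result is carried over unchanged from the original code:

```python
def parse_entities(data):
    results = ''
    for category in data.keys():
        entity_data = data[category]
        for e_id, entity in enumerate(entity_data):
            # entity is the same object as data[category][e_id]
            if not entity['sentiment'] == 'none':
                results = results + entity['name'] + ":" + entity['sentiment'] + ","
    return results


# Hypothetical, shortened sample data for illustration only
sample = {'persons': [{'name': 'parkinson', 'sentiment': '5'},
                      {'name': 'knott david', 'sentiment': 'none'}],
          'locations': []}
print(parse_entities(sample))  # parkinson:5,
```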

Secondly, if you want something like

for category in data.keys():
    entity_data = data[category]

you can make it shorter and easier to read by changing it to

for category, entity_data in data.items():

But you don't even need that here: since the category name is never used, you can just iterate over data.values(). Combining these improvements, your code looks like this:

def parse_entities(data):
    results = ''
    for entity_data in data.values():
        for entity in entity_data:
            if entity['sentiment'] != 'none':
                results += entity['name'] + ":" + entity['sentiment'] + ","
    return results

(I have also changed results = results + ... to results += ... and if not entity['sentiment'] == 'none' to if entity['sentiment'] != 'none', because both are shorter without hurting readability.)

Once you have this, it is much easier to make it even shorter and more elegant with a list comprehension:

def parse_entities(data):
    return ",".join([entity['name'] + ":" + entity['sentiment']
                     for entity_data in data.values()
                     for entity in entity_data
                     if entity['sentiment'] != 'none'])
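
One behavioural difference is worth checking before swapping one version for the other: the loop versions append "," after every entry and so leave a trailing comma, while ",".join(...) only puts commas between entries. The sketch below compares the two on the question's sample data; the names parse_entities_loop and parse_entities_join are used here only for illustration.

```python
def parse_entities_loop(data):
    # Loop version: appends "," after every kept entry
    results = ''
    for entity_data in data.values():
        for entity in entity_data:
            if entity['sentiment'] != 'none':
                results += entity['name'] + ":" + entity['sentiment'] + ","
    return results


def parse_entities_join(data):
    # join version: commas appear only between kept entries
    return ",".join(entity['name'] + ":" + entity['sentiment']
                    for entity_data in data.values()
                    for entity in entity_data
                    if entity['sentiment'] != 'none')


data = {
    'persons': [{'name': 'parkinson', 'sentiment': '5'},
                {'name': 'knott david', 'sentiment': 'none'},
                {'name': 'billy mays', 'sentiment': '4'}],
    'organizations': [{'name': 'piper jaffray companies', 'sentiment': 'none'},
                      {'name': 'marketbeat.com', 'sentiment': 'none'},
                      {'name': 'zacks investment research', 'sentiment': 'none'}],
    'locations': [],
}

print(repr(parse_entities_loop(data)))  # 'parkinson:5,billy mays:4,'
print(repr(parse_entities_join(data)))  # 'parkinson:5,billy mays:4'
```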

Answer 2

Maybe something like this will work?

def parse_entities(data):
    results = []
    for category in data.keys():
        results += list(map(lambda x: '{0}:{1}'.format(x['name'], x['sentiment']),
                            filter(lambda i: i['sentiment'] != 'none', data[category])))
    return ','.join(results)


if __name__ == '__main__':
    # data is the dictionary from the question
    print(parse_entities(data))

The output looks like this:

parkinson:5,billy mays:4

Answer 3

This is one way to do it, although depending on your actual use case a 'proper library' may make more sense.

data = {
 'persons':
     [{'name': 'parkinson', 'sentiment': '5'},
      {'name': 'knott david', 'sentiment': 'none'},
      {'name': 'billy mays', 'sentiment': '4'}],
 'organizations':
      [{'name': 'piper jaffray companies', 'sentiment': 'none'},
       {'name': 'marketbeat.com', 'sentiment': 'none'},
       {'name': 'zacks investment research', 'sentiment': 'none'}],
 'locations': []
}

import itertools

# Flatten the lists of all categories into one stream of dicts.
# Equivalent: itertools.chain.from_iterable(data.values())
dicts = itertools.chain(*data.values())
pairs = [":".join([d['name'], d['sentiment']])
         for d in dicts if d['sentiment'] != 'none']
result = ",".join(pairs)
print(result)
# parkinson:5,billy mays:4

# Shorter, but less readable version:
result = ",".join([":".join([d['name'], d['sentiment']])
                   for d in itertools.chain(*data.values())
                   if d['sentiment'] != 'none'])

Answer 4

This is a problem where we need to perform three separate tasks:

  1. Filter out unqualified rows of data
  2. Flatten the dict of lists into a simple list
  3. Transform each dictionary object into a simple tuple, ready for formatting

Here is the code:

def parse_entities(data):
    new_data = [
        (row['name'], row['sentiment'])        # 3. Transform
        for rows in data.values()              # 2. Flatten
            for row in rows                    # 2. Flatten
                if row['sentiment'] != 'none'  # 1. Filter
    ]
    # e.g., new_data = [('parkinson', '5'), ('billy mays', '4')]
    return ','.join('{}:{}'.format(*row) for row in new_data)
#
# test code
#
data = {
    'locations': [],
    'organizations': [
        {'name': 'piper jaffray companies', 'sentiment': 'none'},
        {'name': 'marketbeat.com', 'sentiment': 'none'},
        {'name': 'zacks investment research', 'sentiment': 'none'}
    ],
    'persons': [
        {'name': 'parkinson', 'sentiment': '5'},
        {'name': 'knott david', 'sentiment': 'none'},
        {'name': 'billy mays', 'sentiment': '4'}
    ],
}
print(parse_entities(data))

Output:

parkinson:5,billy mays:4

Answer 5

Here's a generator expression that does it:

data = {'persons': [
            {'name': 'parkinson', 'sentiment': '5'},
            {'name': 'knott david', 'sentiment': 'none'},
            {'name': 'billy mays', 'sentiment': '4'}],
        'organizations': [
            {'name': 'piper jaffray companies', 'sentiment': 'none'},
            {'name': 'marketbeat.com', 'sentiment': '99'},
            {'name': 'zacks investment research', 'sentiment': 'none'}],
        'locations': []
}
results = ','.join(entity['name'] + ':' + entity['sentiment']
                    for category, entity_data in data.items()
                        for entity in entity_data if entity['sentiment'] != 'none')

print(results)  # -> parkinson:5,billy mays:4,marketbeat.com:99

Note: I changed the sample data slightly to make sure it handled data in more than one category the same as your code.
