How to filter list lines of a dictionary of lists according to a specific number?

August 10, 2018, at 2:20 PM

I have a huge dictionary of lists that needs filtering. Here's an example of its output:

d = {
    'hate': [(2310, "Experiencer: 'like hours'", 212, 222),
              (2310, "Experiencer: 'two'", 1035, 1038),
              (2310, "Experiencer: 'Anakin'", 1560, 1566),
              (2310, "Experiencer: ' '", 1619, 1620),
              (2310, "Experiencer: 'Tatooine'", 1726, 1734),
              (2310, "Experiencer: 'Anakin'", 1775, 1781),
              (2310, "Experiencer: 'Master Qui-Gon'", 1863, 1877),
              (2310, "Experiencer: 'half'", 1883, 1887),
              (2310, "Experiencer: 'One'", 2114, 2117),
              (2310, "Experiencer: 'Anakin'", 2180, 2186),
              (2310, "Stimulus: 'One'", 2484, 2487),
              (2310, "Stimulus: 'Anakin'", 2564, 2570),
              (2310, "Stimulus: 'Padme'", 2739, 2744)],
'confirmation': [(4132, "Experiencer: 'like hours'", 212, 222),
              (4132, "Experiencer: 'two'", 1035, 1038),
              (4132, "Experiencer: 'Anakin'", 1560, 1566),
              (4132, "Experiencer: ' '", 1619, 1620),
              (4132, "Experiencer: 'Tatooine'", 1726, 1734),
              (4132, "Experiencer: 'Anakin'", 1775, 1781),
              (4132, "Experiencer: 'Master Qui-Gon'", 1863, 1877),
              (4132, "Experiencer: 'half'", 1883, 1887),
              (4132, "Experiencer: 'One'", 2114, 2117),
              (4132, "Experiencer: 'Anakin'", 2180, 2186),
              (4132, "Experiencer: 'One'", 2484, 2487),
              (4132, "Experiencer: 'Anakin'", 2564, 2570),
              (4132, "Experiencer: 'Padme'", 2739, 2744),
              (4132, "Experiencer: 'Anakin'", 2782, 2788),
              (4132, "Experiencer: ' '", 2818, 2819),
              (4132, "Experiencer: 'centuries'", 3562, 3571),
              (4132, "Experiencer: 'one'", 3585, 3588),
              (4132, "Experiencer: 'Anakin'", 3679, 3685),
              (4132, "Experiencer: 'Anakin Skywalker'", 3789, 3805),
              (4132, "Experiencer: 'Obi-Wan'", 4014, 4021),
              (4132, "Experiencer: 'Qui-Gon'", 4025, 4032),
              (4132, "Experiencer: 'Qui-Gon's'", 4100, 4109),
              (4132, "Stimulus: 'Anakin'", 4281, 4287),
              (4132, "Stimulus: ' '", 4355, 4356),
              (4132, "Stimulus: 'Anakin'", 4436, 4442)]}

Each key (one of them is hate as stated above) has a number at the beginning of every list element. Here, it's: 2310.

I would like to print the two list elements whose numbers are closest to that number: the next smallest and the next biggest.

Example output:

'hate': [(2310, "Experiencer: 'Anakin'", 2180, 2186),
         (2310, "Stimulus: 'One'", 2484, 2487)]

because

(2310, "Experiencer: 'Anakin'", 2180, 2186)

has the number 2180, which is the next smallest one when compared to 2310

and in return:

(2310, "Stimulus: 'One'", 2484, 2487)

has the number 2484, which is the next biggest one when compared to 2310

I guess this needs a for loop? How do I iterate over the dictionary of lists, compare the first, self-repeating number with the other numbers on every line, and return the closest ones, as mentioned above?

I hope my question is understandable enough...

Thanks in advance!

EDIT:

Goal would be to automate the process of going through the dictionary, and update it by filtering it.

The desired output of that dictionary would be something like this:

 d = {
        'hate': [(2310, "Experiencer: 'Anakin'", 2180, 2186),
                 (2310, "Stimulus: 'One'", 2484, 2487)],
        'confirmation': [(4132, "Experiencer: 'Qui-Gon's'", 4100, 4109),    
                         (4132, "Stimulus: 'Anakin'", 4281, 4287)],
...}

I also edited the example above to show the output I'm getting so far. It's a dictionary of lists.

Answer 1

If your lists are already sorted, we can use bisect to find the boundary between the "Experiencer" and "Stimulus" entries:

from bisect import bisect
l=[(2310, "Experiencer: 'like hours'", 212, 222), (2310, "Experiencer: 'two'", 1035,1038), (2310, "Experiencer: 'Anakin'", 1560, 1566), (2310, "Experiencer: ' '", 1619, 1620), (2310, "Experiencer: 'Tatooine'", 1726, 1734), (2310, "Experiencer: 'Anakin'", 1775, 1781), (2310, "Experiencer: 'Master Qui-Gon'", 1863, 1877), (2310, "Experiencer: 'half'", 1883, 1887), (2310, "Experiencer: 'One'", 2114, 2117), (2310, "Experiencer: 'Anakin'", 2180, 2186), (2310, "Stimulus: 'One'", 2484, 2487), (2310, "Stimulus: 'Anakin'", 2564, 2570), (2310, "Stimulus: 'Padme'", 2739, 2744)]
right_index = bisect(l, (2310, "F"))  # "F" sorts between "Experiencer" and "Stimulus"
lower, higher = l[right_index-1], l[right_index]
print(lower, higher, sep="\n")
# (2310, "Experiencer: 'Anakin'", 2180, 2186)
# (2310, "Stimulus: 'One'", 2484, 2487)

Then you can process your dictionary quite easily

from bisect import bisect
def get_boundary(l):  # This assumes all lists in your dict have at least 2 items
    if len(l) < 2:
        return l
    right_index = bisect(l, (l[0][0], "F"))  
    return [l[right_index-1], l[right_index]]
print({key: get_boundary(value) for key, value in d.items()})

produces

{'hate': [(2310, "Experiencer: 'Anakin'", 2180, 2186), 
          (2310, "Stimulus: 'One'", 2484, 2487)], 
 'confirmation': [(4132, "Experiencer: 'Qui-Gon's'", 4100, 4109), 
                  (4132, "Stimulus: 'Anakin'", 4281, 4287)]
}
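
A hedged variant of the function above, for lists that might contain only "Experiencer" or only "Stimulus" entries. In that case the bisect position falls at one end of the list and indexing both neighbours would fail or wrap around. The name get_boundary_safe is made up for this sketch, and it still assumes each list is sorted as in the question.

```python
from bisect import bisect

def get_boundary_safe(entries):
    """Like get_boundary, but tolerate a missing lower or upper neighbour."""
    if not entries:
        return []
    # (number, "F") sorts after every "Experiencer" tuple
    # and before every "Stimulus" tuple with the same first number
    split = bisect(entries, (entries[0][0], "F"))
    result = []
    if split > 0:
        result.append(entries[split - 1])  # last "Experiencer" entry
    if split < len(entries):
        result.append(entries[split])      # first "Stimulus" entry
    return result

only_experiencers = [(4132, "Experiencer: 'Obi-Wan'", 4014, 4021),
                     (4132, "Experiencer: 'Qui-Gon'", 4025, 4032)]
print(get_boundary_safe(only_experiencers))
# [(4132, "Experiencer: 'Qui-Gon'", 4025, 4032)]
```

With both kinds of entries present it returns the same pair as get_boundary.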
Answer 2

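Use itertools.groupby to group the elements of each list by their first number, sort each group by the absolute difference from that number, and take the first two elements. Note that this picks the two closest entries overall, which need not lie on opposite sides of the number (see 'confirmation' in the output):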

>>> from itertools import groupby
>>> 
>>> f = lambda t: t[0]
>>> {key:sorted(v, key=lambda t: abs(k-t[3]))[:2] for key,lst in d.items() for k,v in groupby(sorted(lst, key=f), f)}
{'confirmation': [(4132, "Experiencer: 'Qui-Gon's'", 4100, 4109), (4132, "Experiencer: 'Qui-Gon'", 4025, 4032)], 'hate': [(2310, "Experiencer: 'Anakin'", 2180, 2186), (2310, "Stimulus: 'One'", 2484, 2487)]}
Answer 3

This is the most vanilla, simple way to do it (that I could think of). Another answer uses itertools, which is more elegant but harder for a novice to understand.

If l is the list pointed to by hate in your dictionary:

target_num = l[0][0]
closest_smaller, closest_bigger = 0, 0
closest_smaller_diff, closest_bigger_diff = float("inf"), float("-inf")
for element in l:
    # compare the last two numbers of each tuple with the target
    for num in (element[-2], element[-1]):
        diff = target_num - num
        # positive diff: num is below the target; keep the smallest gap
        if diff > 0 and diff < closest_smaller_diff:
            closest_smaller = num
            closest_smaller_diff = diff
        # negative diff: num is above the target; keep the gap closest to zero
        if diff < 0 and diff > closest_bigger_diff:
            closest_bigger = num
            closest_bigger_diff = diff
print(closest_smaller, closest_bigger)
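
The loop above prints bare numbers. To update the dictionary as the question's EDIT asks, one possible sketch keeps the whole tuples instead. The function name closest_pair is hypothetical, and comparing on position 2 only (rather than the last two numbers) is an assumption made here for simplicity.

```python
def closest_pair(entries):
    """Return the entries nearest below and nearest above the target number."""
    target = entries[0][0]  # assumes a non-empty list
    below = above = None
    for entry in entries:
        start = entry[2]  # assumption: compare on position 2 only
        if start < target and (below is None or start > below[2]):
            below = entry
        elif start > target and (above is None or start < above[2]):
            above = entry
    # drop a side that has no entry at all
    return [e for e in (below, above) if e is not None]

sample = {
    'hate': [(2310, "Experiencer: 'One'", 2114, 2117),
             (2310, "Experiencer: 'Anakin'", 2180, 2186),
             (2310, "Stimulus: 'One'", 2484, 2487),
             (2310, "Stimulus: 'Anakin'", 2564, 2570)],
}
for key in sample:
    sample[key] = closest_pair(sample[key])  # replace each list in place
print(sample)
```

This version does not need the lists to be sorted, since it scans every entry once.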
Answer 4
# let big_dict be the big dictionary you start with
output_dict = {}
for key, value in big_dict.items():
    # break the list into two lists, for those with third value greater
    # and those with third value lesser/equal
    higher_tuples = [i for i in value if i[2] > i[0]]
    lower_tuples = [i for i in value if i[2] <= i[0]]
    # Get the tuple from each list whose third value is closest to the first
    # (min() raises ValueError if either list is empty)
    high_closest = min(higher_tuples, key=lambda x: x[2] - x[0])
    low_closest = min(lower_tuples, key=lambda x: x[0] - x[2])
    # bind them into an output
    output_dict[key] = [high_closest, low_closest]

If you wanted you could bind it all together in one really big one-liner:

output_dict = {key: [min([i for i in value if i[2] > i[0]], key=lambda a: a[2] - a[0]), min([j for j in value if j[2] <= j[0]], key=lambda b: b[0] - b[2])] for key, value in big_dict.items()}
Answer 5

Perhaps the easiest way is

  • Sort the list on the element of interest, position 2.
  • Use a binary search to find where 2310 would appear in this list; you should be between two elements. Those are the two elements you want.
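
The two steps above could be sketched roughly as follows. The function name neighbours and the small sample dictionary are made up for illustration; the sketch assumes the target is the number at position 0 and that entries are compared on position 2.

```python
from bisect import bisect_left, bisect_right

def neighbours(entries):
    """Return the entries just below and just above the target number."""
    if not entries:
        return []
    target = entries[0][0]  # the number repeated at position 0
    ordered = sorted(entries, key=lambda t: t[2])  # step 1: sort on position 2
    starts = [t[2] for t in ordered]
    lo = bisect_left(starts, target)   # step 2: entries before lo are below target
    hi = bisect_right(starts, target)  # entries from hi onward are above target
    result = []
    if lo > 0:
        result.append(ordered[lo - 1])
    if hi < len(ordered):
        result.append(ordered[hi])
    return result

sample = {
    'hate': [(2310, "Experiencer: 'One'", 2114, 2117),
             (2310, "Stimulus: 'Anakin'", 2564, 2570),
             (2310, "Experiencer: 'Anakin'", 2180, 2186),
             (2310, "Stimulus: 'One'", 2484, 2487)],
}
filtered = {key: neighbours(value) for key, value in sample.items()}
print(filtered)
# {'hate': [(2310, "Experiencer: 'Anakin'", 2180, 2186), (2310, "Stimulus: 'One'", 2484, 2487)]}
```

Using both bisect_left and bisect_right skips any entry whose position-2 value equals the target exactly; whether such an entry should count as a neighbour is a design choice the question leaves open.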
Answer 6

Here's a rather broad solution that might work for you:

for key in your_dict:
    # For every list in your dictionary,
    lst = your_dict[key]
    best_fits = []
    for tup in lst:
        # For every tuple in that list,
        # if the tuple is a good fit, store it in best_fits.
        first_number = tup[0]  # Get the 0th element of the tuple
        # The rest is up to you