Selecting certain XML tags with criterion matching other tags

August 26, 2017, at 08:00 AM

I have an XML file with a structure like the following:

<text>
  <dialogue>
     <pattern>
        We're having a {nice|great} time.
     </pattern>
     <criterion>
       <!-- match this tag, get the above pattern -->
        average_person, tourist, delighted
     </criterion>
  </dialogue>
  <dialogue>
     <pattern>
        The service {here stinks|is terrible}!
     </pattern>
     <criterion>
        tourist, disgruntled, average_person
     </criterion>
  </dialogue>
  <dialogue>
     <pattern>
        They have {smoothies|funny hats}. Neat!
     </pattern>
     <criterion>
        tourist, smoothie_enthusiast
     </criterion>
  </dialogue>
  <dialogue>
     <pattern>
        I wonder how {expensive|valuable} these resort tickets are?
     </pattern>
     <criterion>
        merchant, average_person
     </criterion>
  </dialogue>
</text>

What I would like to do is go through the dialogue tags, look at the criterion tag, and match a list of words. If they match, I would then like to use the pattern in that dialogue tag. I'm using Python for this task.

What I'm currently doing is walking through the tags with lxml's etree, like this:

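from lxml import etree

# Drop comments while parsing: otherwise the words after the comment in the
# first criterion tag are stored in the comment's tail and i.text misses them.
parser = etree.XMLParser(remove_comments=True)
tree = etree.parse('tourists.xml', parser)
root = tree.getroot()
tourist_patterns = []
g = 0
for i in root.iterfind('dialogue/criterion'):
   # Strip the whitespace around each comma-separated tag.
   a = [tag.strip() for tag in i.text.split(',')]
   # The "personality" variable has a value like "delighted" or "disgruntled".
   # "tags_to_match" are the criteria that we want to, well, match. It may
   # have criteria like "merchant", "tourist", or "delighted".
   # When the "match_tags" function returns True, the pattern is appended
   # to the "tourist_patterns" list.
   if personality != 'average_person' and match_tags(tags_to_match, a):
       tourist_patterns.append(root[g][0].text)
   g += 1
# When we don't have a match, we just go with the "average_person" tag.
if len(tourist_patterns) == 0:
   # Go through the tags again, choosing the ones that match the
   # 'average_person' personality and put them in the "tourist_patterns" list.
   for g, i in enumerate(root.iterfind('dialogue/criterion')):
       if 'average_person' in [tag.strip() for tag in i.text.split(',')]:
           tourist_patterns.append(root[g][0].text)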

I then go through the elements in the "tourist_patterns" list and pluck out what I want.

I'm trying to simplify this. How can I go through the tags, match the text I want in the criterion tags, and then take the pattern in the pattern tags? I've also been trying to set a default for when the criterion isn't matched (hence the "average_person" personality criterion).

Edit: Some commenters asked for the list of what to match. Basically, I want to match some or all of the words in the criterion tags and get the text of the pattern tag in the same dialogue tag. So if I were looking for "tourist" and "smoothie_enthusiast", I would get one match in my XML example, and I would then like to get the pattern tag text "They have {smoothies|funny hats}. Neat!". If the words fail to match any criterion tag, I would fall back to matching "average_person" and "tourist".

In turn, tourist_patterns would look like this when it matches:

>>> tourist_patterns
    ['They have {smoothies|funny hats}. Neat!']

And when nothing matches, the fallback would give this:

>>> tourist_patterns
    ['They have {smoothies|funny hats}. Neat!', 'The service {here stinks|is terrible}!']

Hope that clears things up.
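
For reference, here is a rough sketch of one way the loop might be simplified. It is only a sketch under several assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, it reads "match" as "every wanted word appears in the criterion tag", and it takes "average_person" and "tourist" as the fallback words. The names SAMPLE, criterion_tags and find_patterns are made up for illustration and do not come from any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline copy of the XML above, so the sketch is self-contained.
SAMPLE = """<text>
  <dialogue>
    <pattern>We're having a {nice|great} time.</pattern>
    <criterion><!-- a comment --> average_person, tourist, delighted</criterion>
  </dialogue>
  <dialogue>
    <pattern>The service {here stinks|is terrible}!</pattern>
    <criterion>tourist, disgruntled, average_person</criterion>
  </dialogue>
  <dialogue>
    <pattern>They have {smoothies|funny hats}. Neat!</pattern>
    <criterion>tourist, smoothie_enthusiast</criterion>
  </dialogue>
  <dialogue>
    <pattern>I wonder how {expensive|valuable} these resort tickets are?</pattern>
    <criterion>merchant, average_person</criterion>
  </dialogue>
</text>"""


def criterion_tags(dialogue):
    """Return the criterion words of one <dialogue> as a set of stripped strings."""
    criterion = dialogue.find("criterion")
    if criterion is None:
        return set()
    # Join every text fragment so a comment inside the tag cannot hide words.
    text = "".join(criterion.itertext())
    return {word.strip() for word in text.split(",") if word.strip()}


def find_patterns(root, wanted, fallback=("average_person", "tourist")):
    """Collect patterns whose criterion holds all wanted words, else use fallback."""

    def search(words):
        words = set(words)
        return [
            dialogue.findtext("pattern", default="").strip()
            for dialogue in root.iterfind("dialogue")
            if words <= criterion_tags(dialogue)  # subset test: all words present
        ]

    # An empty result list is falsy, so `or` triggers the fallback search.
    return search(wanted) or search(fallback)


root = ET.fromstring(SAMPLE)
print(find_patterns(root, ["tourist", "smoothie_enthusiast"]))
print(find_patterns(root, ["merchant", "delighted"]))
```

Iterating over the dialogue elements themselves keeps each pattern next to its own criterion, so no separate index counter is needed. If "some of the words" should be enough for a match, the subset test could be swapped for a non-empty intersection (words & criterion_tags(dialogue)), which would change the results for this sample.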
