Remove everything but #number in brackets

November 28, 2019, at 5:30 PM

I have a file in which each line has the form #nr = name(#nr, (#nr), other variables and names).

I would like to keep only the #nr entries inside the brackets, so that each line has the form #nr = name(#nr, #nr).

I have tried several approaches, including regex, startswith() and lists, but nothing has worked so far.

Any help is much appreciated.

Edit: Code


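# f is assumed to hold the file contents as a single string
for line in f.splitlines():
    start = line.find('(')
    end = line.rfind(')')  # last ')' so nested brackets such as (#142) are included
    if start != -1 and end != -1:
        # Split the bracket contents on commas and strip nested brackets and spaces
        fields = (i.strip('() ') for i in line[start + 1:end].split(','))
        # Keep only the fields that start with '#', then rebuild the line
        line = line[:start + 1] + ','.join(i for i in fields if i.startswith('#')) + line[end:]
    print(line)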

Edit 2: As an example, I have:

#304= IFCRELDEFINESBYPROPERTIES('0FZ0hKNanFNAQpJ_Iqh4zM',#42,$,$,(#142),#301);

Afterwards I want to have:

#304= IFCRELDEFINESBYPROPERTIES(#42,#142,#301);

Answer 1

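This can be solved with regex, although doing it with a single find/replace pattern would be complicated. Instead, you can do it in two steps: first match the text before the brackets together with the bracket contents, then extract the #numbers from those contents and rebuild the line: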

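import re

def sub_func(match):
    # Step 2: collect every #number found inside the outer brackets
    nums = re.findall(r'#\d+', match.group(2))
    # Rebuild the line as name(#nr,#nr,...);
    return match.group(1) + '(' + ','.join(nums) + ');'

text = "#304= IFCRELDEFINESBYPROPERTIES('0FZ0hKNanFNAQpJ_Iqh4zM',#42,$,$,(#142),#301);"
# Step 1: capture the text before the first '(' and everything inside the outer brackets
result = re.sub(r'(^[^(]+)\((.*)\);', sub_func, text)
print(result)
# #304= IFCRELDEFINESBYPROPERTIES(#42,#142,#301);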

So instead of passing a string as the second argument to re.sub, we pass a function. That function receives each match, processes it with a second regex, and returns the reformatted text that re.sub uses as the replacement.
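Since the question is about a whole file rather than a single string, here is a minimal sketch of how the same idea might be applied to several lines at once. The names LINE_RE, keep_refs and strip_non_refs are made up for illustration, and the second sample line is invented. It assumes each statement sits on one line and ends in ");". Note that a quoted string containing "#" followed by digits would also be picked up.

```python
import re

# Matches one statement per line: the text before the first '(',
# the bracket contents, then the closing ');'
LINE_RE = re.compile(r'^([^(\n]+)\((.*)\);', re.MULTILINE)

def keep_refs(match):
    # Keep only the #number references found inside the outer brackets
    nums = re.findall(r'#\d+', match.group(2))
    return match.group(1) + '(' + ','.join(nums) + ');'

def strip_non_refs(text):
    # Apply the substitution to every matching line of a multi-line string
    return LINE_RE.sub(keep_refs, text)

# Hypothetical sample data; the second line is invented for illustration
sample = (
    "#304= IFCRELDEFINESBYPROPERTIES('0FZ0hKNanFNAQpJ_Iqh4zM',#42,$,$,(#142),#301);\n"
    "#305= IFCRELDEFINESBYPROPERTIES('0FZ0hKNanFNAQpJ_Iqh4zN',#42,$,$,(#143,#144),#302);\n"
)
print(strip_non_refs(sample))
```

With re.MULTILINE, ^ matches at the start of every line, so the whole file contents can be passed in as one string. Lines that contain no brackets do not match and are left unchanged.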
