Positive and Negative Regex in Python

November 13, 2019, at 5:40 PM

I've been using regular expressions in Python for a while, and I want to find out whether it is possible to have a regex match a line that contains one value but does not contain another value in the same line:

Given a line saying: filename.txt opt1 opt2 opt4

I want to find lines containing filename.txt that do not have opt3.

I used: ^(?!.*opt3).* filename.txt.*

I don't like how this reads, but it seems to be the only technique that might work.

Answer 1

Maybe,

^(?!.*opt3)filename\.txt

might simply suffice.

In case filename.txt is not at the beginning of the string, then

^(?!.*opt3).*(filename\.txt)

would be an option to look into, as well.

Test 1

import re
string = '''
filename.txt  opt1 opt2 opt4
filename.txt  opt1 opt2 opt3 opt4
 filename.txt  opt1 opt2 opt4
  filename.txt  opt1 opt2 opt3 opt4
'''
expression = r'^(?!.*opt3).*(filename\.txt)'
print(re.findall(expression, string, re.M))

Output 1

['filename.txt', 'filename.txt']

Test 2

If you want to match the entire line, you can simply add a .* at the end of the expression:

import re
string = '''
filename.txt  opt1 opt2 opt4
filename.txt  opt1 opt2 opt3 opt4
 filename.txt  opt1 opt2 opt4
  filename.txt  opt1 opt2 opt3 opt4
'''
expression = r'^(?!.*opt3).*filename\.txt.*'
print(re.findall(expression, string, re.M))

Output 2

['filename.txt  opt1 opt2 opt4', ' filename.txt  opt1 opt2 opt4']

Test 3

import re
string = '''
filename.txt  opt1 opt2 opt4
filename.txt  opt1 opt2 opt3 opt4
 filename.txt  opt1 opt2 opt4
  filename.txt  opt1 opt2 opt3 opt4
'''
expression = r'(?m)^(?!.*opt3).*filename\.txt.*'
print(re.findall(expression, string))
for item in re.finditer(expression, string):
    print(item.group(0))

Output 3

['filename.txt  opt1 opt2 opt4', ' filename.txt  opt1 opt2 opt4']
filename.txt  opt1 opt2 opt4
 filename.txt  opt1 opt2 opt4
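
If the input is already being processed one line at a time, the same pattern can be applied per line without the multiline flag. The following is only a minimal sketch: the helper name lines_without_opt3 and the sample text are assumptions made for illustration, and the pattern is the one from Test 2.

```python
import re

# Pattern from Test 2; no re.M needed because each line is searched on its own.
PATTERN = re.compile(r'^(?!.*opt3).*filename\.txt.*')

def lines_without_opt3(text):
    """Return the lines that mention filename.txt but do not contain opt3.

    Hypothetical helper, not part of the original answer.
    """
    return [line for line in text.splitlines() if PATTERN.search(line)]

sample = """filename.txt  opt1 opt2 opt4
filename.txt  opt1 opt2 opt3 opt4
 filename.txt  opt1 opt2 opt4
  filename.txt  opt1 opt2 opt3 opt4"""

print(lines_without_opt3(sample))
```

Compiling the pattern once keeps the per-line check cheap when the same filter runs over many lines.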

Answer 2

If you want a single regex, then your approach is how I would do this, but you should place word boundaries around the various terms:

^(?!.*\bopt3\b).*\bfilename\.txt\b.*$

You could also do this using two separate calls to re.search, e.g.

import re

line = "filename.txt  opt1 opt2 opt4"
# First check that the file name is present, then that opt3 is absent.
if re.search(r'^.*\bfilename\.txt\b.*$', line) and not re.search(r'^.*\bopt3\b.*$', line):
    print("line is a match")
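
To see why the word boundaries matter, here is a small sketch comparing the pattern with and without \b. The option name opt30 and the sample lines are made up for this illustration.

```python
import re

# Same idea with and without word boundaries (the dot is escaped in both).
loose = re.compile(r'^(?!.*opt3).*filename\.txt.*$')
strict = re.compile(r'^(?!.*\bopt3\b).*\bfilename\.txt\b.*$')

# Hypothetical line: "opt30" contains the characters "opt3" but is a different option.
line = "filename.txt  opt1 opt2 opt30"

# Without \b, the "opt3" inside "opt30" trips the negative lookahead.
loose_hit = loose.search(line) is not None
# With \b, "opt30" is not treated as opt3, so the line is accepted.
strict_hit = strict.search(line) is not None

print(loose_hit, strict_hit)
```

On this line the loose pattern rejects the line and the strict one accepts it, while both still reject a line that contains a standalone opt3.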

Answer 3

In the code that you tried in the comments, note that the first argument of re.search is the pattern and the second argument is the string.

Your code might look like

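import re
# Raw string so that \. reaches the regex engine unchanged.
match = re.search(r'^(?!.*opt3)FileName\.abc', 'FileName.abc 0 opt1 opt2 opt4')
if match:
    # group() is a method of the match object, not of the re module.
    print(match.group())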

If there should not be a standalone opt3 in the string, but opt3 may appear as part of a larger string, you could use lookarounds to make sure opt3 is not surrounded by non-whitespace chars:

^(?!.*(?<!\S)opt3(?!\S))filename\.txt
  • ^ Start of string
  • (?! Negative lookahead, assert what is on the right is not
    • .* Match any char except a newline 0+ times
    • (?<!\S)opt3(?!\S) match opt3 not surrounded by non whitespace chars
  • ) Close lookahead
  • filename\.txt match literally
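
As a rough illustration of how the whitespace lookarounds differ from \b, the sketch below runs both variants over a few lines. The sample lines (including the token my-opt3) are assumptions made up for this example.

```python
import re

# Variant using word boundaries around opt3.
word_boundary = re.compile(r'^(?!.*\bopt3\b)filename\.txt')
# Variant from this answer: opt3 must be delimited by whitespace or string ends.
whitespace_bound = re.compile(r'^(?!.*(?<!\S)opt3(?!\S))filename\.txt')

# Hypothetical lines for illustration.
lines = [
    "filename.txt opt1 opt3",     # standalone opt3
    "filename.txt opt1 my-opt3",  # opt3 as part of a larger token
    "filename.txt opt1 opt2",     # no opt3 at all
]

# For each line: does each variant accept it?
results = [
    (bool(word_boundary.search(line)), bool(whitespace_bound.search(line)))
    for line in lines
]
for line, result in zip(lines, results):
    print(repr(line), result)
```

The two variants only disagree on the second line: \b treats the hyphen as a boundary, so my-opt3 counts as opt3 there, whereas the whitespace lookarounds see it as part of a larger token and let the line through.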
