I am trying to write a regular expression which returns a part of substring which is after a string. For example: I want to get part of substring along with spaces which resides after "15/08/2017".
a='''S
LINC SHORT LEGAL TITLE NUMBER
0037 471 661 1720278;16;21 172 211 342
LEGAL DESCRIPTION
PLAN 1720278
BLOCK 16
LOT 21
EXCEPTING THEREOUT ALL MINES AND MINERALS
ESTATE: FEE SIMPLE
ATS REFERENCE: 4;24;54;2;SW
MUNICIPALITY: CITY OF EDMONTON
REFERENCE NUMBER: 172 023 641 +71
----------------------------------------------------------------------------
----
REGISTERED OWNER(S)
REGISTRATION DATE(DMY) DOCUMENT TYPE VALUE CONSIDERATION
---------------------------------------------------------------------------
--
---
172 211 342 15/08/2017 AFFIDAVIT OF CASH & MTGE'''
Is there a way to get 'AFFIDAVIT OF' and 'CASH & MTGE' as separate strings?
Here is the expression I have pieced together so far:
doc = (a.split('15/08/2017', 1)[1]).strip()
'AFFIDAVIT OF CASH & MTGE'
-
I have edited with the actual input string.User123– User1232018年12月21日 06:11:56 +00:00Commented Dec 21, 2018 at 6:11
-
Okay anyway to do this using regex?User123– User1232018年12月31日 04:15:47 +00:00Commented Dec 31, 2018 at 4:15
-
Why do you want to do this with regex? Are you willing to accept any other solution?Mad Physicist– Mad Physicist2018年12月31日 04:29:37 +00:00Commented Dec 31, 2018 at 4:29
-
Yes if there is a better way other than regexUser123– User1232018年12月31日 04:30:14 +00:00Commented Dec 31, 2018 at 4:30
11 Answers 11
Not a regex based solution. But does the trick.
a='''S
LINC SHORT LEGAL TITLE NUMBER
0037 471 661 1720278;16;21 172 211 342
LEGAL DESCRIPTION
PLAN 1720278
BLOCK 16
LOT 21
EXCEPTING THEREOUT ALL MINES AND MINERALS
ESTATE: FEE SIMPLE
ATS REFERENCE: 4;24;54;2;SW
MUNICIPALITY: CITY OF EDMONTON
REFERENCE NUMBER: 172 023 641 +71
----------------------------------------------------------------------------
----
REGISTERED OWNER(S)
REGISTRATION DATE(DMY) DOCUMENT TYPE VALUE CONSIDERATION
---------------------------------------------------------------------------
--
---
172 211 342 15/08/2017 AFFIDAVIT OF CASH & MTGE'''
doc = (a.split('15/08/2017', 1)[1]).strip()
# used split with two white spaces instead of one to get the desired result
print(doc.split(" ")[0].strip()) # outputs AFFIDAVIT OF
print(doc.split(" ")[-1].strip()) # outputs CASH & MTGE
Hope it helps.
Comments
re based code snippet
import re
foo = '''S
LINC SHORT LEGAL TITLE NUMBER
0037 471 661 1720278;16;21 172 211 342
LEGAL DESCRIPTION
PLAN 1720278
BLOCK 16
LOT 21
EXCEPTING THEREOUT ALL MINES AND MINERALS
ESTATE: FEE SIMPLE
ATS REFERENCE: 4;24;54;2;SW
MUNICIPALITY: CITY OF EDMONTON
REFERENCE NUMBER: 172 023 641 +71
----------------------------------------------------------------------------
----
REGISTERED OWNER(S)
REGISTRATION DATE(DMY) DOCUMENT TYPE VALUE CONSIDERATION
---------------------------------------------------------------------------
--
---
172 211 342 15/08/2017 AFFIDAVIT OF CASH & MTGE'''
pattern = '.*\d{2}/\d{2}/\d{4}\s+(\w+\s+\w+)\s+(\w+\s+.*\s+\w+)'
result = re.findall(pattern, foo, re.MULTILINE)
print "1st match: ", result[0][0]
print "2nd match: ", result[0][1]
Output
1st match: AFFIDAVIT OF
2nd match: CASH & MTGE
Comments
We can try using re.findall with the following pattern:
PHASED OF ((?!\bCONDOMINIUM PLAN).)*)(?=CONDOMINIUM PLAN)
Searching in multiline and DOTALL mode, the above pattern will match everything occurring between PHASED OF until, but not including, CONDOMINIUM PLAN.
input = "182 246 612 01/10/2018 PHASED OF CASH & MTGE\n CONDOMINIUM PLAN"
result = re.findall(r'PHASED OF (((?!\bCONDOMINIUM PLAN).)*)(?=CONDOMINIUM PLAN)', input, re.DOTALL|re.MULTILINE)
output = result[0][0].strip()
print(output)
CASH & MTGE
Note that I also strip off whitespace from the match. We might be able to modify the regex pattern to do this, but in a general solution, maybe you want to keep some of the whitespace, in certain cases.
5 Comments
It looks like you know the exact delimiting string, just str.split() by it and get the first part:
In [1]: a='172 211 342 15/08/2017 TRANSFER OF LAND 610,000ドル CASH & MTGE'
In [2]: a.split("15/08/2017", 1)[0]
Out[2]: '172 211 342 '
I would avoid using regex here, because the only meaningful separation between the logical terms appears to be 2 or more spaces. Individual terms, including the one you want to match, may also have spaces. So, I recommend doing a regex split on the input using \s{2,} as the pattern. These will yield a list containing all the terms. Then, we can just walk down the list once, and when we find the forward looking term, we can return the previous term in the list.
import re
a = "172 211 342 15/08/2017 TRANSFER OF LAND 610,000ドル CASH & MTGE"
parts = re.compile("\s{2,}").split(a)
print(parts)
for i in range(1, len(parts)):
if (parts[i] == "15/08/2017"):
print(parts[i-1])
['172 211 342', '15/08/2017', 'TRANSFER OF LAND', '610,000ドル', 'CASH & MTGE']
172 211 342
Comments
positive lookbehind assertion**
m=re.search('(?<=15/08/2017).*', a)
m.group(0)
Comments
You have to return the right group:
re.match("(.*?)15/08/2017",a).group(1)
Comments
You nede to use group(1)
import re
re.match("(.*?)15/08/2017",a).group(1)
Output
'172 211 342 '
Comments
Building on your expression, this is what I believe you need:
import re
a='172 211 342 15/08/2017 TRANSFER OF LAND 610,000ドル CASH & MTGE'
re.match("(.*?)(\w+/)",a).group(1)
Output:
'172 211 342 '
Comments
You can do this by using group(1)
re.match("(.*?)15/08/2017",a).group(1)
UPDATE
For updated string you can use .search instead of .match
re.search("(.*?)15\/08\/2017",a).group(1)
3 Comments
15/08/2017.Your problem is that your string is formatted the way it is. The line you are looking for is
182 246 612 01/10/2018 PHASED OF CASH & MTGE
And then you are looking for what ever comes after 'PHASED OF' and some spaces.
You want to search for
(?<=PHASED OF)\s*(?P.*?)\n
in your string. This will return a match object containing the value you are looking for in the group value.
m = re.search(r'(?<=PHASED OF)\s*(?P<your_text>.*?)\n', a)
your_desired_text = m.group('your_text')
Also: There are many good online regex testers to fiddle around with your regexes. And only after finishing up the regex just copy and paste it into python.
I use this one: https://regex101.com/