I have a string like this:
sg_ts_feature_name_01_some_xyz
From this, I want to extract the two words that come after the prefix sg_ts, keeping the underscore that separates them.
The result should be:
feature_name
This regex:
import re
st = 'sg_ts_my_feature_01'
a = re.match('sg_ts_([a-zA-Z_]*)_*', st)
print(a.group())
returns,
sg_ts_my_feature_
whereas I expect:
my_feature
- Have a look at this demo. – Wiktor Stribiżew, Sep 26, 2015 at 9:19
- stribizhev is too humble: he puts his best answer in just a comment and leaves without a trace. – user2879704, Sep 26, 2015 at 9:24
- No, I was just looking after my 2 children, so I had no time to write a full answer. Glad you could solve your issue with others' help. Have a great weekend. – Wiktor Stribiżew, Sep 26, 2015 at 9:59
2 Answers
The problem is that you are asking for the whole match, not just the capture group. From the manual:
group([group1, ...]) Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding parenthesized group.
You called a.group(), which is equivalent to a.group(0) and therefore returns the whole match. Calling a.group(1) instead returns only the text captured by the parentheses.
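As a minimal sketch of that difference, the snippet below runs the pattern from the question and reads both the whole match and the first capture group. The variable names `m`, `whole`, and `captured` are only illustrative.

```python
import re

st = 'sg_ts_my_feature_01'
m = re.match('sg_ts_([a-zA-Z_]*)_*', st)

whole = m.group()      # same as m.group(0): the entire matched text
captured = m.group(1)  # only the text inside the first pair of parentheses

print(whole)     # sg_ts_my_feature_
print(captured)  # my_feature_
```

Note that group 1 still ends with an underscore, because the greedy `[a-zA-Z_]*` consumes it before the trailing `_*` gets a chance to.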
You can ask for the group surrounded by the parentheses, 'a.group(1)', which returns
'my_feature_'
In addition, if your string is always in this form, you could anchor the pattern with the end-of-string character $ and make the inner match lazy instead of greedy, so that it doesn't swallow the trailing _.
a = re.match('sg_ts_([a-zA-Z_]*?)[_0-9]*$',st)
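The sketch below tries the lazy, anchored pattern on both strings from the question. It also tries a hypothetical alternative, `two_words`, which is not taken from either answer. That alternative assumes each of the two words consists only of ASCII letters, and it captures exactly two of them joined by one underscore.

```python
import re

# Pattern from the answer above: lazy group, then only digits/underscores to the end.
lazy_anchored = re.compile(r'sg_ts_([a-zA-Z_]*?)[_0-9]*$')

# Hypothetical alternative (an assumption, not from the answers):
# exactly two letter-only words separated by a single underscore.
two_words = re.compile(r'sg_ts_([a-zA-Z]+_[a-zA-Z]+)')

short_name = 'sg_ts_my_feature_01'
long_name = 'sg_ts_feature_name_01_some_xyz'

lazy_short = lazy_anchored.match(short_name).group(1)
lazy_long = lazy_anchored.match(long_name)  # no match: letters follow the digits

two_short = two_words.match(short_name).group(1)
two_long = two_words.match(long_name).group(1)

print(lazy_short)  # my_feature
print(lazy_long)   # None
print(two_short)   # my_feature
print(two_long)    # feature_name
```

The anchored pattern only works when nothing but digits and underscores follows the words, which is why it fails on the longer string. The two-word pattern does not depend on what comes after.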