Extracting patterns after matching a regex
nn
pruebauno at latinmail.com
Tue Sep 8 11:52:42 EDT 2009
On Sep 8, 10:25 am, "Mart." <mdeka... at gmail.com> wrote:
> On Sep 8, 3:21 pm, nn <prueba... at latinmail.com> wrote:
> > On Sep 8, 9:55 am, "Mart." <mdeka... at gmail.com> wrote:
> > > On Sep 8, 2:16 pm, "Andreas Tawn" <andreas.t... at ubisoft.com> wrote:
> > > > > Hi,
> > > > >
> > > > > I need to extract a string after matching a regular expression. For
> > > > > example I have the string...
> > > > >
> > > > > s = "FTPHOST: e4ftl01u.ecs.nasa.gov"
> > > > >
> > > > > and once I match "FTPHOST" I would like to extract
> > > > > "e4ftl01u.ecs.nasa.gov". I am not sure as to the best approach to the
> > > > > problem, I had been trying to match the string using something like
> > > > > this:
> > > > >
> > > > > m = re.findall(r"FTPHOST", s)
> > > > >
> > > > > But I couldn't then work out how to return the "e4ftl01u.ecs.nasa.gov"
> > > > > part. Perhaps I need to find the string and then split it? I had some
> > > > > help with a similar problem, but now I don't seem to be able to
> > > > > transfer that to this problem!
> > > > >
> > > > > Thanks in advance for the help,
> > > > >
> > > > > Martin
> > > > No need for regex.
> > > >
> > > > s = "FTPHOST: e4ftl01u.ecs.nasa.gov"
> > > > if "FTPHOST" in s:
> > > >     host = s[9:]  # skip the 9-character "FTPHOST: " prefix
> > > >
> > > > Cheers,
> > > >
> > > > Drea
> > > Sorry perhaps I didn't make it clear enough, so apologies. I only
> > > presented the example s = "FTPHOST: e4ftl01u.ecs.nasa.gov" as I
> > > thought this easily encompassed the problem. The solution presented
> > > works fine for this i.e. re.search(r'FTPHOST: (.*)',s).group(1). But
> > > when I used this on the actual file I am trying to parse I realised it
> > > is slightly more complicated as this also pulls out other information,
> > > for example it prints
> > >
> > > e4ftl01u.ecs.nasa.gov\r\n', 'FTPDIR: /PullDir/0301872638CySfQB\r\n',
> > > 'Ftp Pull Download Links: \r\n', 'ftp://e4ftl01u.ecs.nasa.gov/PullDir/
> > > 0301872638CySfQB\r\n', 'Down load ZIP file of packaged order:\r\n',
> > >
> > > etc. So I need to find a way to stop it before the \r
> > >
> > > Slicing the string wouldn't work in this scenario as I can envisage a
> > > situation where the string length increases and I would prefer not to
> > > keep having to change the string.
> > >
> > > Many thanks
> > It is not clear from your post what the input is really like. But just
> > guessing this might work:
> >
> > >>> print s
> > 'MEDIATYPE: FtpPull\r\n', 'MEDIAFORMAT: FILEFORMAT\r\n','FTPHOST:
> > e4ftl01u.ecs.nasa.gov\r\n', 'FTPDIR: /PullDir/0301872638CySfQB\r
> > \n','Ftp Pull Download Links: \r\n'
> >
> > >>> re.search(r'FTPHOST: (.*?)\\r',s).group(1)
> > 'e4ftl01u.ecs.nasa.gov'
> Hi,
>
> That does work. So the \ escapes the \r, does this tell it to stop
> when it reaches the "\r"?
>
> Thanks
Indeed.
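
To spell that out, here is a minimal sketch. It assumes, as in the printed
output quoted above, that s holds the text of a list's repr, so each "\r" is
two literal characters (a backslash followed by an r). In the raw pattern,
\\ matches one literal backslash, and the non-greedy (.*?) stops at the first
place where that backslash-r follows. The sample strings and variable names
are made up for illustration. The second pattern is an alternative I would
try if the data turned out to contain real carriage returns instead.

```python
import re

# Assumed sample: repr-style text, so "\r\n" appears as the four literal
# characters backslash, r, backslash, n (hypothetical, shortened data).
s = r"'MEDIATYPE: FtpPull\r\n', 'FTPHOST: e4ftl01u.ecs.nasa.gov\r\n', 'FTPDIR: /PullDir/0301872638CySfQB\r\n'"

# \\ matches one literal backslash; (.*?) is non-greedy, so the capture
# ends at the first backslash that is followed by an "r".
m = re.search(r'FTPHOST: (.*?)\\r', s)
host_from_repr = m.group(1) if m else None

# Assumed alternative input: the raw file contents with real CR/LF characters.
raw = "MEDIATYPE: FtpPull\r\nFTPHOST: e4ftl01u.ecs.nasa.gov\r\nFTPDIR: /PullDir/0301872638CySfQB\r\n"

# [^\r\n]* captures everything up to, but not including, the line ending.
m = re.search(r'FTPHOST: ([^\r\n]*)', raw)
host_from_raw = m.group(1) if m else None

print(host_from_repr)
print(host_from_raw)
```

Checking the match object before calling group(1) avoids an AttributeError
when a given input has no FTPHOST line.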