2

Here is some text I need to parse:

JAVA_OPTS=blablalba
lbalbalba
1. main1:
 aelo1 2020-06-15 11 4422
 sddg2 2020-06-12 19 422
2. main2:
 fdata3 2020-06-15 11 4422
 gcontent4 2020-06-12 19 422
3. main3:
 hxvnt5 2020-06-15 11 4422
 vcfdet6 2020-06-12 19 422

I need to parse only the numbered bullet points, each up to the next bullet point, find the rows whose 4th column is greater than 1000 and that are older than 12 hours (based on the date and time in the 2nd column), and then send the details by email. I tried parsing with the re library in Python but could not get it to work.

So the expected output is:

 1. main1:
 aelo1 2020-06-15 11 4422
 2. main2:
 fdata3 2020-06-15 11 4422
 3. main3:
 hxvnt5 2020-06-15 11 4422

Is this possible with Bash or Python?

3
  • What do you mean by "older than 12h"? Commented Jun 17, 2020 at 4:16
  • The "older than 12 hours" requirement needs clarification - do you want to keep the rows with 3rd column values > 12 or ignore them? Also, sharing what you have tried will help others help you. Commented Jun 17, 2020 at 4:34
  • Add the parse tag to your post. Commented Jun 17, 2020 at 5:55

3 Answers

1

Here is a regex you can use to match the rows (I am not sure about the 12-hour requirement).

\d+\.\s\S+\s+\S+\s[0-9-]+\s\d+\s[1-9][0-9]{3,}
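A minimal sketch of how this pattern might be applied in Python, assuming the dates are written as 2020-06-15 and the input is held in a string (the names `PATTERN` and `sample` are only for illustration). Note that the pattern matches a numbered header only together with the first row directly beneath it, and that `[1-9][0-9]{3,}` accepts any value of 1000 or more.

```python
import re

# Pattern from the answer: a numbered header followed by the first row beneath
# it whose last column has at least four digits (a value of 1000 or more).
PATTERN = re.compile(r"\d+\.\s\S+\s+\S+\s[0-9-]+\s\d+\s[1-9][0-9]{3,}")

# Hypothetical sample input, assuming hyphenated dates.
sample = """JAVA_OPTS=blablalba
lbalbalba
1. main1:
 aelo1 2020-06-15 11 4422
 sddg2 2020-06-12 19 422
2. main2:
 fdata3 2020-06-15 11 4422
 gcontent4 2020-06-12 19 422
"""

matches = PATTERN.findall(sample)
for m in matches:
    print(m)
```

The regex alone does not check the age of a row; that would need a separate comparison after matching.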
answered Jun 17, 2020 at 4:20

0

Here is a solution for you:

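def parsing(text):
    # Nothing to parse in an empty input.
    if text.strip() == '':
        return ''
    lines = text.split('\n')
    buffer = ''
    for line in lines:
        t = line.strip()
        # Keep blank lines and numbered headers such as "1. main1:".
        if t == '' or t[0] in '0123456789':
            buffer += line + '\n'
        else:
            lst = t.split()
            if len(lst) >= 4:
                # Keep rows with a YYYY-MM-DD date, hour <= 12 and value > 1000.
                if (len(lst[1].split('-')) == 3 and int(lst[2]) <= 12 and
                        int(lst[3]) > 1000):
                    buffer += line + '\n'
    return buffer.strip()

# text is assumed to hold the input shown in the question.
print(parsing(text))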
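If "older than 12 hours" instead means the age of each row, one possible reading is that the date and the third column together form a timestamp (date plus hour of day). The sketch below follows that assumption; `filter_sections`, its parameters, and `sample` are hypothetical names, and the dates are assumed to be in YYYY-MM-DD form.

```python
from datetime import datetime, timedelta

def filter_sections(text, now, min_value=1000, min_age=timedelta(hours=12)):
    """Keep numbered headers and the rows beneath them that pass both checks."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        parts = stripped.split()
        # A header looks like "1. main1:" (a number followed by a dot).
        if parts and parts[0].endswith(".") and parts[0][:-1].isdigit():
            kept.append(stripped)
            continue
        if len(parts) != 4:
            continue  # skip preamble lines such as JAVA_OPTS=...
        name, date, hour, value = parts
        try:
            # Assumption: column 3 is the hour of day belonging to the date.
            stamp = datetime.strptime(f"{date} {hour}", "%Y-%m-%d %H")
            value = int(value)
        except ValueError:
            continue  # not a data row
        if value > min_value and now - stamp > min_age:
            kept.append(" " + stripped)
    return "\n".join(kept)

# Hypothetical sample input.
sample = """JAVA_OPTS=blablalba
lbalbalba
1. main1:
 aelo1 2020-06-15 11 4422
 sddg2 2020-06-12 19 422
"""

print(filter_sections(sample, now=datetime(2020, 6, 17, 4, 0)))
```

Passing `now` explicitly keeps the function testable; in a real script `datetime.now()` could be supplied. Headers are kept even when none of their rows qualify.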
answered Jun 17, 2020 at 5:41

2 Comments

That's why I asked for more information about the requirements and situation. Updated.
Updated the expected output to make it clearer.
0

You can use TTP to parse and filter it in one template:

from ttp import ttp
import pprint
data = """
JAVA_OPTS=blablalba
lbalbalba
1. main1:
 aelo1 2020-06-15 11 4001
 sddg2 2020-06-12 19 422
2. main2:
 fdata3 2020-06-16 11 4422
 gcontent4 2020-06-12 19 422
3. main3:
 hxvnt5 2020-06-17 11 4002
 vcfdet6 2020-06-12 19 422
"""
 
template = """
<group contains="value">
1. main1: {{ _start_ }}
 {{ ignore }} {{ date }} {{ hour | lessthan("12") }} {{ value | greaterthan("4000") }}
</group> 
"""
 
parser = ttp(data, template)
parser.parse()
res = parser.result()
pprint.pprint(res)
# prints:
# [[[{'date': '2020-06-15', 'hour': '11', 'value': '4001'},
# {'date': '2020-06-16', 'hour': '11', 'value': '4422'},
# {'date': '2020-06-17', 'hour': '11', 'value': '4002'}]]]

You can test templates online here if you'd like.

Disclaimer: I am the author of TTP.

Edit: after parsing, you can post-process the results further to compose the email report or whatever the end result must look like.
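As a rough sketch of that post-processing step, the innermost list of dicts from the parsed result could be turned into a plain-text message with the standard library's `email.message.EmailMessage`. The function name `build_report` and the addresses are placeholders, and the report layout is only one possibility.

```python
from email.message import EmailMessage

def build_report(rows, sender, recipient):
    """Build a plain-text email listing the matched rows."""
    lines = [f"{r['date']} {r['hour']} {r['value']}" for r in rows]
    msg = EmailMessage()
    msg["Subject"] = f"{len(rows)} matching entries"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(lines))
    return msg

# Rows shaped like the innermost list of the parsed result.
rows = [
    {"date": "2020-06-15", "hour": "11", "value": "4001"},
    {"date": "2020-06-16", "hour": "11", "value": "4422"},
]

# Placeholder addresses; replace with real ones.
msg = build_report(rows, "monitor@example.com", "ops@example.com")
print(msg)
```

Sending is left out here; `smtplib.SMTP(...).send_message(msg)` is one way to do it, assuming a reachable mail server.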

answered Dec 30, 2021 at 12:38
