I asked a similar question at: http://stackoverflow.com/questions/38410982/superfast-regexmatch-in-large-text-file
f.txt is at: http://patents.reedtech.com/downloads/PatentMaintFeeEvents/1981-present/MaintFeeEvents.zip
data.txt is at: http://hastebin.com/benacubebu.lisp (if I saved it right).
f.txt has ~ 14 million lines, and looks like this:
4287053 06218896 N 19801222 19810901 19881222 M171
4287053 06218896 N 19801222 19810901 19850211 M170
4289713 06222552 Y 19810105 19810915 19930330 SM02
4289713 06222552 Y 19810105 19810915 19930303 M285
4289713 06222552 Y 19810105 19810915 19921208 RMPN
4289713 06222552 Y 19810105 19810915 19921208 ASPN
4289713 06222552 Y 19810105 19810915 19881116 ASPN
4289713 06222552 Y 19810105 19810915 19881107 M171
4289713 06222552 Y 19810105 19810915 19850306 M170
4291808 06215853 N 19801212 19810929 19851031 EXP.
4291808 06215853 N 19801212 19810929 19850812 REM.
4292069 06227825 N 19810123 19810929 19930926 EXP.
4292069 06227825 N 19810123 19810929 19890323 ASPN
4292069 06227825 N 19810123 19810929 19890320 M171
4292069 06227825 N 19810123 19810929 19850314 M170
4292142 06224175 N 19810112 19810929 19930926 EXP.
4292142 06224175 N 19810112 19810929 19890316 M171
4292142 06224175 N 19810112 19810929 19861008 ASPN
4292142 06224175 N 19810112 19810929 19850925 M170
4292142 06224175 N 19810112 19810929 19850925 M176
4292142 06224175 N 19810112 19810929 19850812 REM.
...
RE45962 14454334 Y 20140807 20160405 20160323 ASPN
RE45972 14335639 N 20140718 20160412 20160512 M1551
RE45975 14464421 N 20140820 20160412 20160511 M1551
RE45975 14464421 N 20140820 20160412 20160510 ASPN
RE46021 13775962 N 20130225 20160531 20160621 M1551
RE46028 14491699 N 20140919 20160614 20160621 STOL
RE46046 13755710 N 20130131 20160628 20160624 ASPN
RE46051 10137107 N 20020502 20160705 20160624 ASPN
RE46074 14249009 N 20140617 20160719 20160614 ASPN
data.txt has ~ 200 lines, and looks like this:
6268343
6268343
6268343
6268343
6268343
6268343
7749955
8710181
6268343
6384016
6458924
...
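
To make the relationship between the two files concrete, here is a minimal sketch in Python. It assumes the goal is to pull out of f.txt every line whose first column appears in data.txt; the title of the linked question suggests this, but it is an assumption. The function names `load_ids`, `matching_records` and `filter_file` are made up for illustration. It uses a set lookup on the first field instead of a regex, which is a swapped-in technique and not necessarily what the final solution should be.

```python
from typing import Iterable, Iterator, List, Set


def load_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the distinct, non-empty IDs (data.txt has one per line, with repeats)."""
    return {line.strip() for line in lines if line.strip()}


def matching_records(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield each record whose first whitespace-separated field is in `wanted`."""
    for line in lines:
        fields = line.split(None, 1)  # split off only the first column
        if fields and fields[0] in wanted:
            yield line.rstrip("\n")


def filter_file(ids_path: str, records_path: str) -> List[str]:
    """Hypothetical helper: stream the large file once and keep the matching lines."""
    with open(ids_path) as ids_file:
        wanted = load_ids(ids_file)
    with open(records_path) as records_file:
        return list(matching_records(records_file, wanted))
```

Calling `filter_file("data.txt", "f.txt")` would read the ~200 IDs into memory and make a single pass over the ~14 million lines, so the large file is never loaded whole.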