WIP: Reimplementing search_dates #945
gavishpoddar wants to merge 44 commits into scrapinghub:master
This PR will also solve many issues with search_dates:
- date has year inside brackets #924
- search_dates() wrong result for 12:xx am #894
- Errors in processing dates in Russian text #918
- search_dates cannot find complete date #521
- False positives when searching dates #582
- search_dates is... #681
- J, K, W, X, Y letters confuse the parser #774
- Mistake search_dates for ru language #706 (Correct datetime object but wrong string) (related issue 2)
- Parse whole words only #856 (Correct datetime object but wrong string) (related issue 2)
Somewhat addressed:
- Month and Month Year (e.g. April and May 2019) will return None #507 (Now returns a date object but not a range)
- Date ambiguities during search_dates #689 (Now returns datetime.datetime(2019, 2, 10, 0, 0) but the issue is not clear)
- Search dates failing to work due to lack of commas? #695
- Dates not getting recognised or wrongly recognised if there are some numbers in the text #787 (Now returns a datetime object but the issue is not clear)
Some other issues fixed:
2021-08-04T14:21:37+05:30
would previously parse as 05:30 only; after this PR it parses as
2021-08-04T14:21:37 and 05:30
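For reference, the standard library reads this kind of string as a single timestamp carrying a +05:30 UTC offset. The snippet below is only an illustration of that reading using datetime.fromisoformat; it makes no claim about how search_dates handles the string internally.

```python
from datetime import datetime, timedelta

# ISO 8601 timestamp with an explicit UTC offset of +05:30.
stamp = datetime.fromisoformat("2021-08-04T14:21:37+05:30")

# The wall-clock part and the offset are both preserved.
naive_part = stamp.replace(tzinfo=None)
offset = stamp.utcoffset()

print(naive_part.isoformat())
print(offset == timedelta(hours=5, minutes=30))
```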
NOTE: THIS LIST IS NO LONGER ACCURATE
gavishpoddar
commented
Jul 17, 2021
Hi, I need a suggestion: should I use the translated chunks or the original chunks to further parse the data objects? I have currently used the translated chunks instead of the original ones, as this increased accuracy in some basic tests.
Thanks, and please advise.
Hi @gavishpoddar, using translated chunks makes sense. 👍
Codecov Report
Base: 98.23% // Head: 98.10% // Decreases project coverage
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #945      +/-   ##
==========================================
- Coverage   98.23%   98.10%   -0.13%
==========================================
  Files         232      235       +3
  Lines        2604     2692      +88
==========================================
+ Hits         2558     2641      +83
- Misses         46       51       +5
gavishpoddar
commented
Jul 21, 2021
Hi @noviluni, I have made the changes needed for the tests to work. Can you please approve the workflows?
lopuhin
commented
Jul 22, 2021
@gavishpoddar workflows approved 👍
gavishpoddar
commented
Aug 3, 2021
I need help with one test:
A StaticTzInfo 'UTC' is expected,
but UTC is received from the new search_dates.
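One possible way to make such a test insensitive to which tzinfo class comes back is to compare UTC offset and zone name rather than the tzinfo objects themselves. The sketch below uses only the standard library; FixedTz is a hypothetical stand-in for a StaticTzInfo-like class and is not dateparser's actual implementation.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class FixedTz(tzinfo):
    """Hypothetical stand-in for a StaticTzInfo-like fixed-offset zone."""

    def __init__(self, name, offset):
        self._name = name
        self._offset = offset

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return self._name


def same_zone(a, b):
    """Compare two aware datetimes by UTC offset and zone name, not by tzinfo class."""
    return a.utcoffset() == b.utcoffset() and a.tzname() == b.tzname()


expected = datetime(2021, 8, 3, 12, 0, tzinfo=FixedTz("UTC", timedelta(0)))
received = datetime(2021, 8, 3, 12, 0, tzinfo=timezone.utc)

print(expected.tzinfo == received.tzinfo)  # compares the tzinfo objects
print(expected == received)                # compares the instants in time
print(same_zone(expected, received))       # compares offset and name only
```

Whether the test should accept either class, or the new code should return the old class, is a design decision for the PR; the helper only shows the looser comparison.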
Hi, after some changes the code is now largely compatible with the old search_dates. Some tests have been modified because the behaviour of search_dates has changed.
Currently, what remains in this PR is some docs changes (just replacing the current docs with the new ones and documenting search_first_date).
gavishpoddar
commented
Aug 24, 2021
Replaced the previous search_dates with the new implementation
lopuhin
left a comment
Thanks @gavishpoddar, I left some comments regarding tests. Please tell me if you need advice on how to implement xfail.
lopuhin
left a comment
Thanks @gavishpoddar, a few docs suggestions.
Co-authored-by: Konstantin Lopuhin <kostia.lopuhin@gmail.com>
gavishpoddar
commented
Sep 7, 2021
Hey, this PR is now updated with #932.
Eckii24
commented
Aug 6, 2022
Hi @gavishpoddar,
I found your PR to improve the behavior of search_dates.
There has been no further progress for almost a year. What is the current status of this PR?
Is there anything I can do to help get this PR merged?
The removal of DateSearchWithDetection.search is backward-incompatible.
I can create a shortcut that keeps DateSearchWithDetection.search and adds a deprecation warning, or simply rename it.
Please suggest a preferred action.
I think keeping the old objects around with their old names, logging a warning when used (i.e. by exposing them through a property whose getter logs a warning), would be ideal.
Reimplementing and simplifying search_dates
A reimplemented and simplified search_dates which more directly uses dateparser.parse, improves accuracy and fixes many bugs.
New Feature: search_first_date - searches and returns the first date from the given text.
NOTE: This PR is inspired by the previous implementation of search_dates and #931.
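To illustrate the relationship between the two functions, here is a rough standard-library sketch. It is not the PR's implementation: parse is a stand-in for dateparser.parse that only knows three illustrative formats, the word-window scan is an assumed strategy, and the (substring, datetime) return shape of search_first_date is an assumption as well.

```python
from datetime import datetime

# Illustrative formats only; month names assume an English locale.
FORMATS = ("%d %B %Y", "%B %d %Y", "%Y-%m-%d")


def parse(chunk):
    """Stand-in for dateparser.parse: return a datetime or None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(chunk, fmt)
        except ValueError:
            continue
    return None


def search_dates(text, max_words=3):
    """Yield (substring, datetime) pairs, trying longer word windows first."""
    words = [w.strip(".,;") for w in text.split()]
    i = 0
    while i < len(words):
        for size in range(min(max_words, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + size])
            parsed = parse(chunk)
            if parsed is not None:
                yield chunk, parsed
                i += size  # skip past the words that were consumed
                break
        else:
            i += 1  # nothing starting at this word parsed as a date


def search_first_date(text):
    """Return the first (substring, datetime) match, or None if there is none."""
    return next(search_dates(text), None)


text = "Released on 4 August 2021, patched on 2021-09-07."
print(search_first_date(text))
print(list(search_dates(text)))
```

Because search_dates is a generator here, search_first_date stops scanning as soon as one date is found instead of processing the whole text.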
TODO
DATE_ORDER