Open
Conversation
nabobalis commented on Apr 6, 2026
nabobalis force-pushed the sot_scrape branch from cc6f11a to c7d4baf on April 6, 2026 16:40
nabobalis commented on Apr 6, 2026
return __all__
class SOTDetector(SimpleAttr):
Member
Author
I think this is technically the two instruments on SOT but I think the VSO does SOT as an instrument (a.Instrument.sot), so I don't want to have to make someone do SOTInstrument as an attr.
nabobalis force-pushed the sot_scrape branch from c7d4baf to 3bc4288 on April 6, 2026 16:44
nabobalis commented on Apr 6, 2026
ayshih changed the title from "Scarper Fido Clients for Hindoe SOT FG and SP data" to "Scraper Fido Clients for Hinode SOT FG and SP data" on Apr 6, 2026
nabobalis force-pushed the sot_scrape branch from 9b74123 to 9ea3c92 on April 8, 2026 01:29
nabobalis commented on Apr 8, 2026
.. note::
Level 1 observations are grouped by observation start time in directories.
Member
Author
I am going to try and fix this, maybe?
Member
Author
def _search_level1(self, archive_url, timerange):
    """
    Two-step HTTP search for Level 1 SP data.

    Level 1 scan directories are named after the scan *start* time, but each
    file within the directory carries its own per-exposure timestamp. The
    standard `Scraper` pattern matching requires identical timestamps in both
    the directory name and the filename, so only the very first exposure of
    each scan would be found.

    This method avoids that limitation by:

    1. Enumerating candidate scan directories via `Scraper.range()`.
    2. Fetching each directory listing and matching individual filenames
       against their own timestamps, then filtering by the requested time
       range.
    """
    dir_pattern = _L1_SP_DIR_PATTERN.replace('{archive}', archive_url)
    dir_scraper = Scraper(format=dir_pattern)
    directories = dir_scraper.range(timerange)
    filemeta = []
    for directory in directories:
        try:
            opn = urlopen(directory)
            try:
                soup = BeautifulSoup(opn, "html.parser")
                for link in soup.find_all("a"):
                    href = link.get("href")
                    if not href:
                        continue
                    filename = href.rstrip('/').split('/')[-1]
                    meta = parse(_L1_SP_FILE_PARSE_PATTERN, filename)
                    if meta is None:
                        continue
                    exdict = meta.named
                    try:
                        file_tr = get_timerange_from_exdict(exdict)
                    except Exception:
                        continue
                    if file_tr.intersects(timerange):
                        exdict['url'] = directory + filename
                        filemeta.append(exdict)
            finally:
                opn.close()
        except HTTPError as http_err:
            if http_err.code == 404:
                continue
            raise
        except URLError:
            continue
    return filemeta
works but yikes. I think a new feature to the scraper is required to be honest.
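For anyone following the thread, the second step described in the docstring above (match each filename against its own timestamp, then filter by the requested range) can be sketched without any network access. This is only a minimal illustration and not the code in this PR: the filename layout, `FILE_RE`, `filter_listing`, and the `example.org` URL are all hypothetical, and plain `datetime` bounds stand in for a sunpy `TimeRange`.

```python
import re
from datetime import datetime

# Hypothetical filename layout, for illustration only: <YYYYMMDD>_<HHMMSS>.fits
FILE_RE = re.compile(r"^(?P<date>\d{8})_(?P<time>\d{6})\.fits$")


def filter_listing(directory_url, hrefs, start, end):
    """Keep the files whose own timestamp lies within [start, end]."""
    matches = []
    for href in hrefs:
        # Links may be relative or absolute; keep only the last path component.
        filename = href.rstrip("/").split("/")[-1]
        found = FILE_RE.match(filename)
        if found is None:
            continue  # parent-directory links and unrelated files
        try:
            stamp = datetime.strptime(found["date"] + found["time"], "%Y%m%d%H%M%S")
        except ValueError:
            continue  # digits matched but do not form a valid date
        if start <= stamp <= end:
            matches.append({"url": directory_url + filename, "time": stamp})
    return matches


# The directory is named after the scan start; the files carry later timestamps.
listing = [
    "../",
    "20240101_120000.fits",
    "20240101_120500.fits",
    "20240101_130000.fits",
    "README",
]
result = filter_listing(
    "https://example.org/sp/20240101_120000/",
    listing,
    datetime(2024, 1, 1, 12, 0),
    datetime(2024, 1, 1, 12, 30),
)
```

The point of the sketch is that the directory name is only used to locate the listing; the time filter is applied per file, so exposures after the first one in a scan are not lost.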
nabobalis
commented
Jun 15, 2026
Member
Author
PR Description
I was told that the VSO only provides Level 0 data access for SOT FG and SP data.
So I decided to add scraper clients for both data sources.
There are a lot of unit tests, but they are mostly offline and cover all the combinations for these files.
The other option for the client was to download the daily genx files, search them, and construct the URLs, and I thought that would not be accepted as a client.
TODO:
AI Assistance Disclosure
AI tools were used for: