@nabobalis nabobalis commented Apr 5, 2026 •

Member

PR Description

I was told that the VSO only provides level 0 data access for SOT FG and SP data.
So I decided to add scraper clients for both data sources.

There are a lot of unit tests, but they are mostly offline and cover all the combinations for these files.

The alternative would be for the client to download the daily genx files, search them, and construct the URLs from them, and I thought that would not be accepted as a client.

TODO:

Is this plugged into the error catching that Clients can now do?
Changelog
Whatsnew?

AI Assistance Disclosure

AI tools were used for:

Code generation (e.g., when writing an implementation or fixing a bug)
Test/benchmark generation
Documentation (including examples)
Research and understanding
No AI tools were used

Regardless of AI use, the human contributor remains fully responsible for correctness, design choices, licensing compatibility, and long-term maintainability.

nabobalis commented

sunpy/net/dataretriever/sources/hinode.py Outdated

@nabobalis nabobalis force-pushed the sot_scrape branch from cc6f11a to c7d4baf Compare

April 6, 2026 16:40

nabobalis commented

sunpy/net/dataretriever/attrs/hinode.py

return __all__

class SOTDetector(SimpleAttr):

@nabobalis nabobalis Apr 6, 2026

Member Author

I think these are technically the two instruments on SOT, but the VSO treats SOT as a single instrument (a.Instrument.sot), so I don't want to make someone use SOTInstrument as an attr.

@nabobalis nabobalis force-pushed the sot_scrape branch from c7d4baf to 3bc4288 Compare

April 6, 2026 16:44

nabobalis commented

sunpy/net/dataretriever/sources/hinode.py Outdated

@nabobalis nabobalis marked this pull request as ready for review

April 6, 2026 16:50

@ayshih ayshih changed the title from "Scarper Fido Clients for Hindoe SOT FG and SP data" to "Scraper Fido Clients for Hinode SOT FG and SP data"

nabobalis added 3 commits

April 7, 2026 18:28


 Clients for Hindoe SOT FG and SP

4a0e534


 changelog

7ae6d34


 Fix bad URLS and added provider and reduced pointless test coverage

9ea3c92

@nabobalis nabobalis force-pushed the sot_scrape branch from 9b74123 to 9ea3c92 Compare

April 8, 2026 01:29

nabobalis commented

Apr 8, 2026

sunpy/net/dataretriever/sources/hinode.py

.. note::

Level 1 observations are grouped by observation start time in directories .

@nabobalis nabobalis Apr 8, 2026 •

Member Author

I am going to try and fix this, maybe?

@nabobalis nabobalis Apr 8, 2026 •

Member Author

    def _search_level1(self, archive_url, timerange):
        """
        Two-step HTTP search for Level 1 SP data.

        Level 1 scan directories are named after the scan *start* time, but
        each file within the directory carries its own per-exposure timestamp.
        The standard `Scraper` pattern matching requires identical timestamps
        in both the directory name and the filename, so only the very first
        exposure of each scan would be found. This method avoids that
        limitation by:

        1. Enumerating candidate scan directories via `Scraper.range()`.
        2. Fetching each directory listing and matching individual filenames
           against their own timestamps, then filtering by the requested time
           range.
        """
        dir_pattern = _L1_SP_DIR_PATTERN.replace('{archive}', archive_url)
        dir_scraper = Scraper(format=dir_pattern)
        directories = dir_scraper.range(timerange)
        filemeta = []
        for directory in directories:
            try:
                opn = urlopen(directory)
                try:
                    soup = BeautifulSoup(opn, "html.parser")
                    for link in soup.find_all("a"):
                        href = link.get("href")
                        if not href:
                            continue
                        filename = href.rstrip('/').split('/')[-1]
                        meta = parse(_L1_SP_FILE_PARSE_PATTERN, filename)
                        if meta is None:
                            continue
                        exdict = meta.named
                        try:
                            file_tr = get_timerange_from_exdict(exdict)
                        except Exception:
                            continue
                        if file_tr.intersects(timerange):
                            exdict['url'] = directory + filename
                            filemeta.append(exdict)
                finally:
                    opn.close()
            except HTTPError as http_err:
                if http_err.code == 404:
                    continue
                raise
            except URLError:
                continue
        return filemeta

This works, but yikes. I think the scraper needs a new feature, to be honest.
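
For anyone reviewing the idea rather than the networking, step 2 of the docstring (match each filename against its own timestamp, then filter by the requested range) can be sketched offline with only the standard library. This is a minimal illustration, not the code in this PR: the filename layout `SP_YYYYMMDD_HHMMSS.fits`, the `FILE_RE` regex, the `filter_listing` helper, and the `example.invalid` URL are all hypothetical stand-ins, since `_L1_SP_FILE_PARSE_PATTERN` is not shown in this thread.

```python
import re
from datetime import datetime

# Hypothetical filename layout, for illustration only; the real pattern
# lives in _L1_SP_FILE_PARSE_PATTERN, which is not shown in this thread.
FILE_RE = re.compile(r"^SP_(?P<date>\d{8})_(?P<time>\d{6})\.fits$")


def filter_listing(directory_url, filenames, start, end):
    """Keep files whose own timestamp falls within [start, end]."""
    matches = []
    for filename in filenames:
        match = FILE_RE.match(filename)
        if match is None:
            # Not a data file (e.g. a parent-directory link); skip it.
            continue
        stamp = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
        if start <= stamp <= end:
            url = directory_url.rstrip("/") + "/" + filename
            matches.append({"time": stamp, "url": url})
    return matches


# The directory is named after the scan start time (12:00:00), but the
# files inside it carry later, per-exposure timestamps.
listing = [
    "../",
    "SP_20260101_120000.fits",
    "SP_20260101_120500.fits",
    "SP_20260101_121000.fits",
]
found = filter_listing(
    "https://example.invalid/level1/20260101_120000/",
    listing,
    datetime(2026, 1, 1, 12, 4),
    datetime(2026, 1, 1, 12, 11),
)
```

Here the two later exposures are kept even though neither shares a timestamp with the directory name, which is the case the single-pattern matching described in the docstring would miss.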