Scraper Fido Clients for Hinode SOT FG and SP data #8566

Open

nabobalis wants to merge 3 commits into sunpy:main from nabobalis:sot_scrape

nabobalis (Member) commented Apr 5, 2026 (edited)

PR Description

I was told that the VSO only provides Level 0 data for SOT FG and SP, so I decided to add scraper clients for both data sources.

There are a lot of unit tests, but they are mostly offline and cover every combination of these files.

The alternative would be for the client to download the daily genx files, search them, and construct the URLs from them, but I did not think that approach would be accepted as a client.
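A scraper client works by expanding a time-templated URL pattern into the concrete URLs that cover the search interval and then checking which files actually exist there. The following is a minimal, standard-library-only sketch of the expansion step. The base URL, the one-directory-per-day layout, and the helper name `candidate_directories` are hypothetical and do not describe the real Hinode SOT archive; in this PR the job is done by sunpy's `Scraper`.

```python
from datetime import datetime, timedelta

# Hypothetical archive layout: one directory per day (NOT the real SOT archive).
PATTERN = "https://archive.example.org/sot/{:%Y/%m/%d}/"


def candidate_directories(start, end, pattern=PATTERN):
    """Return one directory URL per calendar day touched by [start, end]."""
    # Floor the start to midnight so a partially covered first day is included.
    day = datetime(start.year, start.month, start.day)
    urls = []
    while day <= end:
        urls.append(pattern.format(day))
        day += timedelta(days=1)
    return urls


urls = candidate_directories(datetime(2024, 1, 1, 22, 0), datetime(2024, 1, 3, 1, 0))
for url in urls:
    print(url)
```

The query above starts late on 1 January and ends early on 3 January, so all three daily directories are candidates. A real client would then list each directory and keep only the files whose own timestamps fall inside the range.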

TODO:

  • Is this hooked into the error handling that clients now support?
  • Changelog
  • What's New entry?

AI Assistance Disclosure

AI tools were used for:

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding
  • No AI tools were used

Regardless of AI use, the human contributor remains fully responsible for correctness, design choices, licensing compatibility, and long-term maintainability.

Comment thread on sunpy/net/dataretriever/sources/hinode.py (outdated)
return __all__


class SOTDetector(SimpleAttr):

nabobalis (Member Author) commented Apr 6, 2026

I think FG and SP are technically the two instruments on SOT, but the VSO treats SOT itself as the instrument (a.Instrument.sot), so I don't want to force users to use a separate SOTInstrument attr.

Comment thread on sunpy/net/dataretriever/sources/hinode.py (outdated)
nabobalis marked this pull request as ready for review on April 6, 2026 16:50
ayshih changed the title from "Scarper Fido Clients for Hindoe SOT FG and SP data" to "Scraper Fido Clients for Hinode SOT FG and SP data" on Apr 6, 2026

.. note::

    Level 1 observations are grouped by observation start time in directories.
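Here is a small illustration of why this grouping is a problem. If a single pattern forces the filename timestamp to equal the directory timestamp, only the first exposure of a scan matches. The `YYYYmmdd_HHMMSS` names below are a hypothetical stand-in for the real Level 1 naming scheme.

```python
# Hypothetical scan: the directory is named after the scan start time,
# while each file inside carries its own per-exposure timestamp.
directory = "20240101_120000"
files = ["20240101_120000.fits", "20240101_120130.fits", "20240101_120300.fits"]

# Matching that requires the filename timestamp to equal the directory
# timestamp (as one shared pattern effectively does) keeps only one file.
naive = [f for f in files if f.split(".")[0] == directory]

print(naive)  # ['20240101_120000.fits']
print(f"{len(naive)} of {len(files)} exposures found")  # 1 of 3 exposures found
```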

nabobalis (Member Author) commented Apr 8, 2026 (edited)

I am going to try to fix this, maybe?

nabobalis (Member Author) commented Apr 8, 2026 (edited)

from urllib.error import HTTPError, URLError
from urllib.request import urlopen

from bs4 import BeautifulSoup
from parse import parse

from sunpy.net.scraper import Scraper
from sunpy.net.scraper_utils import get_timerange_from_exdict

# _L1_SP_DIR_PATTERN and _L1_SP_FILE_PARSE_PATTERN are module-level
# constants defined elsewhere in the module (not shown in this comment).


def _search_level1(self, archive_url, timerange):
    """
    Two-step HTTP search for Level 1 SP data.

    Level 1 scan directories are named after the scan *start* time, but
    each file within the directory carries its own per-exposure timestamp.
    The standard `Scraper` pattern matching requires identical timestamps
    in both the directory name and the filename, so only the very first
    exposure of each scan would be found. This method avoids that
    limitation by:

    1. Enumerating candidate scan directories via `Scraper.range()`.
    2. Fetching each directory listing and matching individual filenames
       against their own timestamps, then filtering by the requested time
       range.
    """
    # Step 1: enumerate the candidate scan directories for the time range.
    dir_pattern = _L1_SP_DIR_PATTERN.replace('{archive}', archive_url)
    dir_scraper = Scraper(format=dir_pattern)
    directories = dir_scraper.range(timerange)
    filemeta = []
    for directory in directories:
        try:
            # Step 2: fetch the directory listing; the connection is closed on exit.
            with urlopen(directory) as opn:
                soup = BeautifulSoup(opn, "html.parser")
            for link in soup.find_all("a"):
                href = link.get("href")
                if not href:
                    continue
                # Match each file against its own per-exposure timestamp.
                filename = href.rstrip('/').split('/')[-1]
                meta = parse(_L1_SP_FILE_PARSE_PATTERN, filename)
                if meta is None:
                    continue
                exdict = meta.named
                try:
                    file_tr = get_timerange_from_exdict(exdict)
                except Exception:
                    # Skip filenames whose fields cannot form a time range.
                    continue
                if file_tr.intersects(timerange):
                    exdict['url'] = directory + filename
                    filemeta.append(exdict)
        except HTTPError as http_err:
            # A missing directory (404) is skipped; other HTTP errors propagate.
            if http_err.code == 404:
                continue
            raise
        except URLError:
            # Skip directories that cannot be reached.
            continue
    return filemeta

This works, but yikes. To be honest, I think the scraper needs a new feature to handle this.
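To show roughly what the second step does (and the per-file matching that a future scraper feature would need to provide), here is an offline sketch that uses only the standard library. `html.parser.HTMLParser` stands in for BeautifulSoup, `datetime.strptime` stands in for `parse` plus `get_timerange_from_exdict`, and a hard-coded listing stands in for the HTTP response. The filename format, the URL, and the helper names `LinkCollector` and `files_in_range` are hypothetical. The sketch also simplifies the time check to "exposure timestamp inside the query range", whereas the code above uses `TimeRange.intersects`.

```python
from datetime import datetime
from html.parser import HTMLParser

# Hypothetical per-exposure filename format (not the real SOT Level 1 naming).
FILE_FMT = "%Y%m%d_%H%M%S.fits"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML directory listing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def files_in_range(directory_url, listing_html, start, end):
    """Return (url, time) for listed files whose own timestamp is in [start, end]."""
    parser = LinkCollector()
    parser.feed(listing_html)
    matches = []
    for href in parser.hrefs:
        filename = href.rstrip("/").split("/")[-1]
        try:
            obs_time = datetime.strptime(filename, FILE_FMT)
        except ValueError:
            continue  # not a data file, e.g. the "Parent Directory" link
        if start <= obs_time <= end:
            matches.append((directory_url + filename, obs_time))
    return matches


listing = """
<a href="../">Parent Directory</a>
<a href="20240101_120000.fits">20240101_120000.fits</a>
<a href="20240101_120130.fits">20240101_120130.fits</a>
<a href="20240101_120300.fits">20240101_120300.fits</a>
"""
found = files_in_range(
    "https://archive.example.org/sot/level1/20240101_120000/",
    listing,
    datetime(2024, 1, 1, 12, 1),
    datetime(2024, 1, 1, 12, 5),
)
for url, obs_time in found:
    print(url, obs_time.isoformat())
```

The first exposure shares the directory's timestamp but falls before the query start, so only the two later exposures are returned. A pattern tied to the directory name could never have found those two.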

Reviewers: No reviews
Assignees: No one assigned
Labels: None yet
Projects: None yet
Milestone: No milestone

1 participant
