Add LibreOffice security advisories importer #2210

Open
NucleiAv wants to merge 5 commits into
aboutcode-org:main from
NucleiAv:feat/libreoffice-importer-1898

NucleiAv commented Mar 14, 2026
edited by ziadhany

Adds a pipeline importer for LibreOffice security advisories.

Rather than scraping individual advisory pages with BeautifulSoup, which is brittle and breaks whenever the site layout changes, the importer fetches the advisory listing page, extracts the CVE IDs, and calls the CVE API at https://cveawg.mitre.org/api/cve/{cve_id} for each one. This works because every LibreOffice advisory page links to its CVE record at https://www.cve.org/CVERecord?id={cve_id} in its references section, and the cveawg API returns the full structured CVE 5.0 JSON with CVSS scores, CWE weaknesses, references, and publish dates.
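
As a rough illustration of the API-based approach described above, the sketch below builds the request URL for one CVE ID and pulls the fields mentioned (CVSS, CWE, references, publish date) out of a CVE 5.0 record. It is not the code from this PR: `summarize_cve_record` and `SAMPLE_RECORD` are hypothetical names, the sample values are made up, and only the general CVE 5.0 layout (`cveMetadata`, `containers.cna`) is assumed. No network request is made.

```python
from typing import Any

CVE_API_URL = "https://cveawg.mitre.org/api/cve/{cve_id}"


def summarize_cve_record(record: dict[str, Any]) -> dict[str, Any]:
    """Collect the fields the description mentions from a CVE 5.0 record."""
    metadata = record.get("cveMetadata", {})
    cna = record.get("containers", {}).get("cna", {})

    # Each metrics entry holds one CVSS object keyed by version, e.g. "cvssV3_1".
    scores = []
    for metric in cna.get("metrics", []):
        for key, value in metric.items():
            if key.startswith("cvssV") and isinstance(value, dict):
                scores.append((key, value.get("baseScore"), value.get("vectorString")))

    # CWE IDs sit inside problemTypes[].descriptions[] and are optional.
    cwes = [
        description["cweId"]
        for problem_type in cna.get("problemTypes", [])
        for description in problem_type.get("descriptions", [])
        if description.get("cweId")
    ]

    return {
        "cve_id": metadata.get("cveId"),
        "published": metadata.get("datePublished"),
        "scores": scores,
        "cwes": cwes,
        "references": [ref["url"] for ref in cna.get("references", []) if ref.get("url")],
    }


# Made-up record for illustration only; it does not describe a real CVE.
SAMPLE_RECORD = {
    "cveMetadata": {"cveId": "CVE-2099-0001", "datePublished": "2099-01-01T00:00:00.000Z"},
    "containers": {
        "cna": {
            "metrics": [
                {
                    "cvssV3_1": {
                        "baseScore": 7.8,
                        "vectorString": "CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H",
                    }
                }
            ],
            "problemTypes": [{"descriptions": [{"cweId": "CWE-20", "lang": "en"}]}],
            "references": [{"url": "https://www.cve.org/CVERecord?id=CVE-2099-0001"}],
        }
    },
}

url = CVE_API_URL.format(cve_id="CVE-2099-0001")
summary = summarize_cve_record(SAMPLE_RECORD)
```

Every lookup uses `.get` with a default because CNA containers frequently omit metrics or problem types.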

NucleiAv added 3 commits March 14, 2026 00:06

Fetches CVE IDs from the LibreOffice advisory listing page and
retrieves structured data (CVSS, CWE, references, dates) from
the CVE 5.0 JSON API at cveawg.mitre.org.
Fixes: aboutcode-org#1898
Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>

Remove advisories.html fixture in favour of inline ADVISORY_HTML
constant. Drop dead mock attributes and _make_resp helper.
Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>

Replace local re.findall CVE regex with the shared find_all_cve
utility. Normalise to uppercase before dedup to handle IGNORECASE
matches from both href and link text.
Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>

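
The uppercase-before-dedup step from the last commit message can be sketched as follows. This is only an illustration: `CVE_PATTERN` is a stand-in regex, not the project's shared `find_all_cve` utility (whose implementation is not shown here), and `unique_cve_ids` and the sample markup are hypothetical.

```python
import re

# Stand-in pattern for illustration; the PR relies on the shared find_all_cve helper.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def unique_cve_ids(text: str) -> list[str]:
    """Return CVE IDs in first-seen order, treating case variants as one ID."""
    seen = set()
    ordered = []
    for match in CVE_PATTERN.findall(text):
        cve_id = match.upper()  # normalise before the membership check
        if cve_id not in seen:
            seen.add(cve_id)
            ordered.append(cve_id)
    return ordered


# Hypothetical listing markup: lowercase ID in the href, uppercase ID in the link text.
sample = (
    '<a href="/advisories/cve-2099-0001/">CVE-2099-0001</a> '
    '<a href="/advisories/cve-2099-0002/">CVE-2099-0002</a>'
)
ids = unique_cve_ids(sample)
```

Without the `.upper()` call, the lowercase href match and the uppercase link-text match would be kept as two separate IDs.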
logger = logging.getLogger(__name__)

ADVISORIES_URL = "https://www.libreoffice.org/about-us/security/advisories/"
CVE_API_URL = "https://cveawg.mitre.org/api/cve/{cve_id}"

Collaborator

@NucleiAv This is incorrect. You are using two data sources: https://cveawg.mitre.org and https://www.libreoffice.org/about-us/security/advisories/. We should only use https://www.libreoffice.org/about-us/security/advisories/.

If https://www.libreoffice.org/about-us/security/advisories/ does not provide an API (feel free to do a deep search to confirm this), you should parse the HTML instead. Please take a look at other importers, such as the nginx importer: nginx_importer_v2.NginxImporterPipeline.

NucleiAv Mar 14, 2026
edited
Author

@ziadhany sure! I will incorporate the changes. My concern was that HTML parsing would break if the website layout changes. Moreover, the LibreOffice website does not provide details such as the CVSS score, CVSS version (2.0, 3.x, 4.0), severity, or CWEs, so I chose the API approach to populate them. I will check again whether LibreOffice provides an API and, if not, modify the code to parse the HTML.

(The screenshot below shows that the website gives no details on CVSS, CWE, etc.)
[image]

NucleiAv added 2 commits March 14, 2026 16:08
Parse advisory listing and individual advisory pages directly from
libreoffice.org instead of calling cveawg.mitre.org. Drop unused
JSON fixtures and update tests accordingly.
Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>
Signed-off-by: Anmol Vats <anmolvats2003@gmail.com>

NucleiAv commented Mar 14, 2026
edited
Author

@ziadhany I researched but could not find an API, so I have switched to HTML parsing with BeautifulSoup, following the same pattern as the nginx importer. The importer now fetches the listing page to extract advisory URLs, then fetches each individual advisory page and parses the available fields: CVE ID, title, announced date, description, and references. As noted above, some fields are not available on LibreOffice's pages and will remain empty: CVSS versions, CVSS scores, severity ratings, CWE IDs, and affected version ranges (LibreOffice lists only the fixed version). If those are needed in the future, NVD enrichment would be a separate step.
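
A minimal sketch of the first step described above, collecting advisory URLs from the listing page, might look like this. The PR itself uses BeautifulSoup; this sketch uses the standard-library `html.parser` only so that it runs without extra dependencies. `AdvisoryLinkParser`, `advisory_urls`, and the sample markup are hypothetical, and the assumption that advisory pages live beneath the listing URL has not been checked against the live site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

ADVISORIES_URL = "https://www.libreoffice.org/about-us/security/advisories/"


class AdvisoryLinkParser(HTMLParser):
    """Collect absolute URLs of links that point beneath the listing page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        # Assumption: an advisory page is any link below the listing URL.
        is_advisory = url.startswith(self.base_url) and url != self.base_url
        if is_advisory and url not in self.urls:
            self.urls.append(url)


def advisory_urls(listing_html: str, base_url: str = ADVISORIES_URL) -> list[str]:
    parser = AdvisoryLinkParser(base_url)
    parser.feed(listing_html)
    return parser.urls


# Made-up listing markup; the real page structure may differ.
SAMPLE_LISTING = """
<ul>
  <li><a href="/about-us/security/advisories/cve-2099-0001/">CVE-2099-0001</a> Example title</li>
  <li><a href="cve-2099-0002/">CVE-2099-0002</a> Another example</li>
  <li><a href="/download/">Download</a></li>
</ul>
"""

urls = advisory_urls(SAMPLE_LISTING)
```

Each URL in `urls` would then be fetched and parsed for the title, announced date, description, and references. That second step depends on the real page markup and is left out here.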

@NucleiAv NucleiAv deleted the feat/libreoffice-importer-1898 branch March 16, 2026 13:35
@NucleiAv NucleiAv restored the feat/libreoffice-importer-1898 branch March 16, 2026 13:38

Reviewers

@ziadhany ziadhany Awaiting requested review from ziadhany

At least 1 approving review is required to merge this pull request.

Development

Successfully merging this pull request may close these issues.

Collect data from https://www.libreoffice.org/about-us/security/advisories/
