feat: add bot traffic filtering and engaged sessions metric to static analytics #4837
Description
Summary
Add two improvements to the static analytics site generator to reduce bot noise in analytics reports:
- Suspicious page path filter: a regex-based filter in fetch.py that removes malformed/bot page paths from the pageviews detail table (broken markdown links, CMS probes, asset requests, etc.)
- Engaged sessions metric: queries GA4's engagedSessions metric alongside sessions and displays it in the stats card as "Engaged Sessions" instead of "User Sessions"
Details
Suspicious page path filter
Removes paths like:
- /](https://...): broken markdown links
- //checkout/: e-commerce probes
- /help@lists...: email-as-path
- /robots.txt, /favicon-32x32.png: asset requests
- /docs/, /docs-EN/: CMS probes
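To make the categories above concrete, here is a minimal sketch of how such a filter could look. The pattern, the name EXAMPLE_SUSPICIOUS_RE, the helper filter_page_rows, and the row shape (a dict with a "pagePath" key) are all assumptions for illustration; the actual SUSPICIOUS_PAGE_PATH_RE in fetch.py may be written differently.

```python
import re

# Hypothetical pattern for illustration only; the real
# SUSPICIOUS_PAGE_PATH_RE in fetch.py may differ.
EXAMPLE_SUSPICIOUS_RE = re.compile(
    r"""
    \]\(                        # broken markdown link residue
    | ^//                       # path starting with a doubled slash
    | @                         # email address used as a path
    | \.(?:txt|png|ico|xml)$    # direct asset requests
    | ^/docs(?:-[A-Za-z]+)?/$   # CMS-style probes such as /docs/ or /docs-EN/
    """,
    re.VERBOSE,
)


def is_suspicious(page_path):
    """Return True if the path matches any of the example bot patterns."""
    return EXAMPLE_SUSPICIOUS_RE.search(page_path) is not None


def filter_page_rows(rows):
    """Drop rows whose 'pagePath' looks like bot traffic.

    The row shape (dicts with a 'pagePath' key) is an assumption.
    """
    return [row for row in rows if not is_suspicious(row["pagePath"])]
```

Keeping the alternatives in one verbose regex makes each category easy to audit and extend, at the cost of some risk of false positives (for example, a legitimate page whose path contains "@").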
Engaged sessions
GA4's engagedSessions counts only sessions where the user stayed 10+ seconds, viewed 2+ pages, or triggered a conversion. This gives a more honest session count by excluding bot drive-bys.
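As a rough sketch of the query side, the snippet below builds a request body in the JSON shape used by the GA4 Data API runReport method and reads both totals out of a response. The helper names build_report_request and parse_session_totals are hypothetical, and the code does not call the API; how fetch.py actually issues the request is not shown here.

```python
def build_report_request(start_date, end_date):
    """Build a runReport-style body asking for both session metrics.

    Helper name and structure are assumptions for illustration.
    """
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "metrics": [{"name": "sessions"}, {"name": "engagedSessions"}],
    }


def parse_session_totals(response):
    """Map metric names to integer totals from the first response row.

    Returns zeros for every requested metric when the report has no rows.
    """
    names = [header["name"] for header in response.get("metricHeaders", [])]
    rows = response.get("rows", [])
    if not rows:
        return {name: 0 for name in names}
    values = rows[0]["metricValues"]
    # Metric values arrive as strings in the JSON response.
    return {name: int(value["value"]) for name, value in zip(names, values)}


def to_meta(totals):
    """Pick the value exported as engaged_sessions in meta.json."""
    return {"engaged_sessions": totals.get("engagedSessions", 0)}
```

Requesting both metrics in one report keeps the two counts on the same date range, so the stats card can switch to the engaged figure without a second query.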
Files changed
- analytics/static_site/fetch.py: add SUSPICIOUS_PAGE_PATH_RE, METRIC_ENGAGED_SESSIONS, filter logic
- analytics/static_site/export.py: export engaged_sessions in meta.json
- analytics/static_site/template/index.html: display engaged sessions in stats card
Note
These changes affect all sites using the shared analytics package (AnVIL Portal, LungMAP, HCA Explorer, etc.)