
[BREAKING] FEAT add TAP to content harms scenario#1378

Open
hannahwestra25 wants to merge 13 commits into Azure:main from
hannahwestra25:hawestra/add_tap_to_content_harms

Conversation

@hannahwestra25 (Contributor) commented Feb 19, 2026

Description

  • Add TAP to the content harms scenario
  • Align scenario naming
    • Rename the scenario strategies to use consistent casing (this is a breaking change)
  • Remove multiturn / singleturn as tags from the psychosocial strategy
    • This is a breaking change
  • Fix broken images on the website
  • Update uv.lock after recent changes to pyproject.toml

Tests and Documentation

Added / updated tests.

__all__ = [
"ContentHarms",
"ContentHarmsStrategy",
"PsychosocialScenario",
@romanlutz (Contributor) commented Feb 20, 2026

If these were released under the old names, renaming them could be a breaking change.

hannahwestra25 reacted with thumbs up emoji
Comment on lines -181 to +179
- name="Content Harms",
+ name="ContentHarms",
@romanlutz (Contributor) commented Feb 20, 2026

Is this a display name or something else? Do we have an established casing rule? Asking because I want to know, not to criticize 🙂

@hannahwestra25 (Contributor, Author) commented Feb 20, 2026

Yep, it's the display name. I changed it to one word because that's what the red team agent scenario does (the only other scenario with a multi-word name), and it also matches the scenario class name. I'm indifferent about whether it's camelcase or has spaces between words, but I want it to be consistent.

@romanlutz (Contributor) commented Feb 20, 2026

If you want it to match the class name, why don't we derive it from the class name rather than setting custom strings that could diverge?
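One way to follow this suggestion is to default the display name to the class name, so the two cannot drift apart. The sketch below is a minimal illustration under assumptions: the Scenario base class and its name keyword are hypothetical stand-ins, not PyRIT's actual Scenario API; only the class names ContentHarms and PsychosocialScenario come from this PR.

```python
from typing import Optional


class Scenario:
    """Hypothetical stand-in for a scenario base class (not PyRIT's real API)."""

    def __init__(self, *, name: Optional[str] = None) -> None:
        # Default the display name to the concrete class name so the two
        # cannot diverge; an explicit name still overrides it.
        self.name = name if name is not None else type(self).__name__


class ContentHarms(Scenario):
    pass


class PsychosocialScenario(Scenario):
    pass


print(ContentHarms().name)          # ContentHarms
print(PsychosocialScenario().name)  # PsychosocialScenario
```

Because type(self) resolves to the subclass at runtime, each scenario picks up its own class name without repeating it as a string literal.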

Args:
strategy (ScenarioCompositeStrategy): The strategy to create the attack from.
seed_groups (List[SeedAttackGroup]): The seed attack groups associated with the harm dataset.
strategy (str): The harm strategy name to create attacks for.
@romanlutz (Contributor) commented Feb 20, 2026

Having this as a list rather than typed to the strategy seems worse? Also, "harm strategy" sounds dangerous 😆

@hannahwestra25 (Contributor, Author) commented Feb 20, 2026

What do you mean by "this as a list rather than types to the strategy seems worse"? Are you talking about the strategy type or the seed_groups type? This is just a comment change, but strategy is just the name of the harm, which is then used for the atomic attack name. So maybe I should at least rename the variable to strategy_name, if that's what you're referring to 😅

@romanlutz (Contributor) commented Feb 20, 2026

Generally, having it typed is better than not (or than a plain str when you actually want one value from a specific set of literals). That's essentially the point.
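To illustrate the difference between a plain str and a restricted set of literals, here is a minimal sketch using typing.Literal. The harm names and the atomic_attack_name helper are hypothetical, chosen only for illustration; they are not the actual values or functions in the content harms scenario.

```python
from typing import Literal, get_args

# Hypothetical set of harm names; the real list lives in the scenario code.
HarmName = Literal["hate", "violence", "sexual", "self_harm"]


def atomic_attack_name(strategy_name: HarmName) -> str:
    """Build an atomic attack name from a harm name (hypothetical helper).

    A static type checker such as mypy rejects calls whose argument is not
    one of the literals above. Python does not enforce Literal annotations
    at runtime, so the function also validates the value explicitly.
    """
    if strategy_name not in get_args(HarmName):
        raise ValueError(f"unknown harm: {strategy_name!r}")
    return f"content_harms_{strategy_name}"


print(atomic_attack_name("hate"))  # content_harms_hate
```

With a bare str annotation, a typo such as "haet" would only surface at runtime (if at all); with the Literal alias, the type checker flags it at the call site.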

- FIRST_LETTER = ("first_letter", {"single_turn", "ip"}) # Good for copyright extraction
- IMAGE = ("image", {"single_turn", "ip", "sensitive_data"})
- ROLE_PLAY = ("role_play", {"single_turn", "sensitive_data"}) # Good for system prompt extraction
+ FirstLetter = ("first_letter", {"single_turn", "ip"}) # Good for copyright extraction
@romanlutz (Contributor) commented Feb 20, 2026

Does this break people who are using the all-caps names from the last release?

@hannahwestra25 (Contributor, Author) commented Feb 20, 2026

Yes, good call!
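If the old all-caps names should keep working after the rename, Python's Enum aliasing is one option: a member that repeats an existing value becomes an alias for that member rather than a new one. The sketch below assumes a plain Enum with simple string values; ExampleStrategy is hypothetical, and PyRIT's real strategy enums carry tag sets and may customize member creation, so whether aliasing applies unchanged there is an assumption to verify.

```python
from enum import Enum


class ExampleStrategy(Enum):
    """Hypothetical strategy enum; PyRIT's real strategies also carry tag sets."""

    # New PascalCase names: the canonical members.
    FirstLetter = "first_letter"
    RolePlay = "role_play"

    # Old ALL_CAPS names: an Enum member that repeats an existing value
    # becomes an alias of that member instead of a new member.
    FIRST_LETTER = "first_letter"
    ROLE_PLAY = "role_play"


# Old spellings still resolve, both as attributes and by name lookup.
assert ExampleStrategy.FIRST_LETTER is ExampleStrategy.FirstLetter
assert ExampleStrategy["ROLE_PLAY"] is ExampleStrategy.RolePlay

# Iteration skips aliases, so listings only show the new names.
print([member.name for member in ExampleStrategy])  # ['FirstLetter', 'RolePlay']
```

Aliases keep existing user code working while iteration, and therefore anything that enumerates strategies, only exposes the new spelling.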

@hannahwestra25 changed the title from "FEAT add TAP to content harms scenario" to "[BREAKING] FEAT add TAP to content harms scenario" on Feb 20, 2026

Reviewers

@romanlutz left review comments

@jbolor21 left review comments

At least 1 approving review is required to merge this pull request.
