FEAT: New Audio Converters #1375

Open
petebryan wants to merge 14 commits into Azure:main from petebryan:pebryan_audio

@petebryan (Contributor) commented Feb 18, 2026

Description

Added new audio converters that do the following:

  • Change the speed of an audio file without altering pitch (AudioSpeedConverter)
  • Add white noise over an existing audio file (AudioWhiteNoiseConverter)
  • Add an echo to an existing audio file (AudioEchoConverter)
  • Adjust the volume of an audio file by scaling the amplitude (AudioVolumeConverter)
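
As a rough illustration of three of these transformations (volume, white noise, echo), here is a minimal NumPy sketch. It is not the code from this PR: the function names, parameters, and clipping behavior are assumptions made for illustration, and the pitch-preserving speed change is left out because it needs a time-stretching algorithm rather than simple sample arithmetic.

```python
from typing import Tuple

import numpy as np


def _to_float(data: np.ndarray) -> Tuple[np.ndarray, float]:
    """Return the samples as float64 plus the full-scale value for the dtype."""
    if np.issubdtype(data.dtype, np.integer):
        max_val = float(np.iinfo(data.dtype).max)
    else:
        max_val = 1.0  # assumption: float audio is normalized to [-1.0, 1.0]
    return data.astype(np.float64), max_val


def _restore(samples: np.ndarray, max_val: float, dtype: np.dtype) -> np.ndarray:
    """Clip to the valid range and cast back to the original dtype."""
    clipped = np.clip(samples, -max_val, max_val)
    if np.issubdtype(dtype, np.integer):
        clipped = np.round(clipped)
    return clipped.astype(dtype)


def scale_volume(data: np.ndarray, factor: float) -> np.ndarray:
    """Multiply the amplitude by `factor`, clipping instead of overflowing."""
    samples, max_val = _to_float(data)
    return _restore(samples * factor, max_val, data.dtype)


def add_white_noise(data: np.ndarray, noise_scale: float, seed=None) -> np.ndarray:
    """Add Gaussian noise whose standard deviation is `noise_scale` of full scale."""
    samples, max_val = _to_float(data)
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_scale * max_val, size=samples.shape)
    return _restore(samples + noise, max_val, data.dtype)


def add_echo(data: np.ndarray, sample_rate: int, delay_seconds: float, decay: float) -> np.ndarray:
    """Mix a delayed, attenuated copy of a mono signal back into itself."""
    samples, max_val = _to_float(data)
    delay = int(round(delay_seconds * sample_rate))
    out = samples.copy()
    if 0 < delay < len(samples):
        out[delay:] += decay * samples[:-delay]
    return _restore(out, max_val, data.dtype)
```

Working in float64 and clipping before casting back is one way to keep integer PCM data from wrapping around when the amplitude grows.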

Added a new translation converter, MultiLanguageTranslationConverter, to allow mid-sentence language switching in a prompt.

  • Distinct from RandomTranslationConverter in that it focuses on segment-level granularity and deterministic translation.

Updated AzureSpeechTextToAudioConverter so that an audio file input is passed back out unchanged. This covers cases where the converters are used with conversation history that includes a mix of audio and text Messages, which would otherwise throw exceptions.

Sorry I did not raise an issue for this ahead of time; experimentation with ideas turned into code and I wanted to contribute. Happy to refactor however is deemed best.

Tests and Documentation

  • Added unit tests for all converters to test functionality and ensure audio transformations do not adversely affect audio files. (58 unit tests)
  • Updated AzureSpeechTextToAudioConverter tests to cover the case where audio_file is provided as input after the update. (1 unit test)
  • Updated the converter documentation .py files to reflect these updates, then ran jupytext --execute --to notebook to generate notebooks.

@petebryan marked this pull request as draft February 18, 2026 03:53
@petebryan marked this pull request as ready for review February 18, 2026 17:40
@petebryan changed the title from "[DRAFT] FEAT - New Audio Convertors" to "FEAT: New Audio Convertors" Feb 18, 2026
@romanlutz changed the title from "FEAT: New Audio Convertors" to "FEAT: New Audio Converters" Feb 20, 2026
Comment on lines +128 to +133
        if not self.input_supported(input_type):
            raise ValueError("Input type not supported")

        # If the input is already an audio path, pass it through unchanged.
        if input_type == "audio_path":
            return ConverterResult(output_text=prompt, output_type="audio_path")

@romanlutz (Contributor) commented Feb 20, 2026

You must be thinking of a use case I am unable to anticipate 🙂 Can you elaborate?

@petebryan (Contributor, Author) commented Feb 20, 2026

Sometimes you want to generate attacks that include previous turns and then add new turns on top. The problem arises when those previous turns were audio and the new turn is based on a text prompt: you then have a mix of audio and text, and because the converter is attached to the target, all prompts go through it, so it throws an error when it tries to convert the audio_file of the previous turns. You could account for this in the notebook at run time, but it's easier and cleaner to have the converter handle it and simply pass through anything that is already audio. I couldn't really see a downside to having this in the converter, but I'm keen to know if you can think of a problem this may cause.
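
The mixed-history scenario described above can be sketched without PyRIT installed. Everything in this block is a simplified stand-in: `ConverterResult` mirrors only the two fields visible in the diff, `TextToAudioSketch` and `_synthesize` are hypothetical names, and the real converter may expose a different (for example asynchronous) interface.

```python
from dataclasses import dataclass


@dataclass
class ConverterResult:
    """Hypothetical stand-in with the two fields used in the diff excerpt."""

    output_text: str
    output_type: str


class TextToAudioSketch:
    """Toy text-to-audio converter that passes existing audio through."""

    SUPPORTED_INPUT_TYPES = ("text", "audio_path")

    def input_supported(self, input_type: str) -> bool:
        return input_type in self.SUPPORTED_INPUT_TYPES

    def convert(self, prompt: str, input_type: str = "text") -> ConverterResult:
        if not self.input_supported(input_type):
            raise ValueError("Input type not supported")

        # Earlier turns that are already audio are returned unchanged.
        if input_type == "audio_path":
            return ConverterResult(output_text=prompt, output_type="audio_path")

        return ConverterResult(output_text=self._synthesize(prompt), output_type="audio_path")

    def _synthesize(self, text: str) -> str:
        # Placeholder for a real speech synthesis call; returns a fake path.
        return f"synthesized_{len(text)}_chars.wav"


# A history with one earlier audio turn and one new text turn.
history = [("turn_1.wav", "audio_path"), ("And what happens next?", "text")]
converter = TextToAudioSketch()
results = [converter.convert(value, input_type=kind) for value, kind in history]
```

With the pass-through in place, both turns come back as `audio_path` results and the earlier audio file path is untouched.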

logger = logging.getLogger(__name__)


class MultiLanguageTranslationConverter(PromptConverter):

@romanlutz (Contributor) commented Feb 20, 2026

This is actually doable with the selective text converter + translation converter, see https://azure.github.io/PyRIT/code/converters/6_selectively_converting.html#example-7-applying-converters-to-different-parts

I do see the appeal of having a shortcut, though. Wdyt?

@petebryan (Contributor, Author) commented Feb 20, 2026 (edited)

Yeah, so I spent some time on this. My reasoning for having a separate converter is:

  1. Without digging into the docs a bit, it's not easy to see how to do this type of splitting, so it's not easily discoverable.
  2. Implementing it in a notebook flow is a bit cumbersome, especially when you want to try a lot of different approaches. Having a converter makes it cleaner and easier to implement, but happy to change course if others disagree.
  3. Baking the capability into the RandomTranslationConverter could be doable, but it would add a level of complexity to that converter, so I felt a separate one made sense from a maintainability point of view. Very happy to take guidance on this.

Contributor

I'm wondering if we could do something like:

  1. Wrap the splitting and chaining logic in something like SequenceLevelConverter (effectively a generalized version of WordLevelConverter)
  2. Merge MultiLanguageTranslationConverter and RandomTranslationConverter, maybe inheriting this new SequenceLevelConverter, supporting both fixed/random language selection, and sequence/word splitting.
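
To make the proposal more concrete, here is one possible shape for the shared splitting-and-chaining logic. It is only a sketch: `split_into_segments`, `translate_by_segment`, and the `translate` callback are hypothetical names, and the word-count-based splitting is an assumption rather than what the PR or PyRIT actually does.

```python
import random
from typing import Callable, List, Optional, Sequence


def split_into_segments(text: str, n: int) -> List[str]:
    """Split text into at most n contiguous word groups of near-equal size."""
    words = text.split()
    if not words:
        return []
    n = max(1, min(n, len(words)))
    base, extra = divmod(len(words), n)
    segments, start = [], 0
    for i in range(n):
        # The first `extra` segments take one additional word each.
        end = start + base + (1 if i < extra else 0)
        segments.append(" ".join(words[start:end]))
        start = end
    return segments


def translate_by_segment(
    text: str,
    languages: Sequence[str],
    translate: Callable[[str, str], str],
    rng: Optional[random.Random] = None,
) -> str:
    """Translate each segment, with fixed (in order) or random language choice."""
    segments = split_into_segments(text, len(languages))
    if rng is None:
        # Fixed mode: segment i uses languages[i].
        chosen = list(languages[: len(segments)])
    else:
        # Random mode: each segment draws a language independently.
        chosen = [rng.choice(list(languages)) for _ in segments]
    return " ".join(translate(seg, lang) for seg, lang in zip(segments, chosen))


def fake_translate(segment: str, language: str) -> str:
    """Stand-in for an LLM-backed translation call."""
    return f"[{language}] {segment}"


fixed = translate_by_segment("tell me how to bake bread", ["fr", "de"], fake_translate)
```

Passing the translation step in as a callback keeps the splitting logic independent of any particular target, which is roughly what a generalized sequence-level base class would need.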

logger.info(
    "Multi-language translation complete: %d segments across languages %s",
    len(translated_segments),
    self.languages[: len(segments)],

@hannahwestra25 (Contributor) commented Feb 20, 2026

nit:

Suggested change
- self.languages[: len(segments)],
+ self.languages[: len(self.languages)],
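
For reference, the two slice expressions only differ when there are fewer segments than configured languages. The lists below are made-up values, not data from the PR.

```python
# Hypothetical values for illustration only.
languages = ["french", "german", "spanish"]
segments = ["tell me", "a story"]

# Original expression: the languages that line up with the segments.
languages_for_segments = languages[: len(segments)]

# Suggested expression: a slice up to the list's own length, i.e. a full copy.
all_languages = languages[: len(languages)]
```

Here `languages_for_segments` holds the first two entries, while `all_languages` is a copy of the whole list.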

language = self.languages[i]

system_prompt = self._prompt_template.render_template_value(languages=language)
conversation_id = str(uuid.uuid4())

@hannahwestra25 (Contributor) commented Feb 20, 2026

does it matter at all that all of these are going to be part of a different conversation?

Raises:
    ValueError: If speed_factor is not positive.
"""
if speed_factor <= 0:

@hannahwestra25 (Contributor) commented Feb 20, 2026

is there an upper bound ?
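
If an upper bound were wanted, the check could be extended along these lines. This is a sketch only: the diff shows just the positivity check, so `MAX_SPEED_FACTOR`, its value, and both function names are assumptions.

```python
MAX_SPEED_FACTOR = 4.0  # hypothetical limit, not taken from the PR


def validate_speed_factor(speed_factor: float, max_factor: float = MAX_SPEED_FACTOR) -> float:
    """Reject non-positive factors and, as an assumption, very large ones."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    if speed_factor > max_factor:
        raise ValueError(f"speed_factor must not exceed {max_factor}")
    return float(speed_factor)


def stretched_duration(duration_seconds: float, speed_factor: float) -> float:
    """Duration after a speed change: a factor of 2.0 halves the length."""
    return duration_seconds / validate_speed_factor(speed_factor)
```

Because the output length is the input length divided by the factor, very large factors shrink a clip to almost nothing, which is one reason a cap might be considered.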

    info = np.iinfo(data.dtype)
    max_val = float(info.max)
else:
    max_val = 1.0

@hannahwestra25 (Contributor) commented Feb 20, 2026

should info be assigned here ? what happens on line 80 if it's not ?
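
One way to sidestep the question is to make sure nothing outside the integer branch depends on `info`. The helper below is a sketch under that assumption; `sample_range` is a hypothetical name, and the excerpt does not show what the later line in the PR actually uses.

```python
from typing import Tuple

import numpy as np


def sample_range(data: np.ndarray) -> Tuple[float, float]:
    """Return (min, max) sample values for the array's dtype.

    Both bounds are computed inside each branch, so no variable is
    left undefined when the input is floating point.
    """
    if np.issubdtype(data.dtype, np.integer):
        info = np.iinfo(data.dtype)
        return float(info.min), float(info.max)
    # Assumption: float audio is normalized to [-1.0, 1.0].
    return -1.0, 1.0


int_bounds = sample_range(np.zeros(4, dtype=np.int16))
float_bounds = sample_range(np.zeros(4, dtype=np.float32))
```

Note that `np.iinfo` only accepts integer types, so calling it unconditionally on float data would raise a `ValueError` rather than fix the problem.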

Reviewers

@romanlutz left review comments
@hannahwestra25 left review comments
@fdubut left review comments

Reviewers whose approvals may not affect merge requirements

At least 1 approving review is required to merge this pull request.

