
Introduction

I'm building a real-time pitch detection app in Kotlin/Android using TarsosDSP. The app captures audio input, detects the fundamental frequency using the FFT_YIN algorithm, and displays the result on a vertical piano roll view that allows you to visually track pitch over time.

What works:

Pitch detection works correctly while holding or playing steady notes (from voice or keyboard). The graphical interface correctly matches pitch values to their vertical position (Y-axis) on the piano roll.

The issue:

When releasing a note abruptly (especially with voice or acoustic instruments), there's a sudden, violent drop in pitch to a very low frequency (e.g., 50–60 Hz), followed by an equally fast return to silence or the previous level.

This manifests visually as a steep cliff in the trace drawn on the piano roll. These drops happen only after release, not during sustained notes, and seem to originate within TarsosDSP’s pitch detection, not my own graphing or math.

What I've confirmed:

  • I validated my pitch-to-Y conversion: for every frame with a "violent" drop, the computed Y value matches the pitch returned by TarsosDSP. So there's no bug in my drawPitchGraph() logic.

  • I've tried multiple algorithms: YIN, FFT_YIN, FFT_PITCH, DYNAMIC_WAVELET, and all show similar behavior. The most stable so far is FFT_YIN.

  • Changing sample rate and buffer size did not eliminate the problem: sampleRate = 22050, bufferSize = 1024, overlap = 0 gave the best results, while sampleRate = 44100 caused more frequent pitch spikes on release.
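For context on how these two settings interact, here is a minimal arithmetic sketch. It assumes, as is typical for YIN-style detectors, that the lag search covers at most half of the analysis buffer. The helper names are hypothetical and not part of TarsosDSP.

```kotlin
// Hypothetical helpers for reasoning about analysis-window limits; not TarsosDSP API.

/** Duration of one analysis buffer in milliseconds. */
fun frameDurationMs(sampleRate: Int, bufferSize: Int): Double =
    bufferSize * 1000.0 / sampleRate

/**
 * Lowest frequency whose period still fits in the lag search range,
 * ASSUMING the detector searches lags up to bufferSize / 2 samples.
 */
fun lowestDetectableHz(sampleRate: Int, bufferSize: Int): Double =
    sampleRate.toDouble() / (bufferSize / 2)

fun main() {
    // The two configurations mentioned above, both with a 1024-sample buffer.
    for ((rate, size) in listOf(22050 to 1024, 44100 to 1024)) {
        val frame = frameDurationMs(rate, size)
        val floor = lowestDetectableHz(rate, size)
        println("rate=$rate size=$size frame=$frame ms floor=$floor Hz")
    }
}
```

Under that assumption, 22050 Hz with 1024 samples gives a window of roughly 46 ms and a floor near 43 Hz, so a reading in the 50–60 Hz range is still reachable by the detector. At 44100 Hz the same buffer covers only about 23 ms of audio.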

What it looks like:

[screen print for 20:63] [1] https://i.sstatic.net/AS3jMQ8J.png

[screen print for 20:70] [2] https://i.sstatic.net/oT5YK1oA.png - Erratic fall happens here

[screen print for 21:55] [3] https://i.sstatic.net/2l5BFCM6.png - Goes to next played note after a quick return to 1st position

Logs for each picture:

[1].

Time: 0:20:63

pitchInHz: 219.63284 (Hz)

y = 985.6981

[2].

Timer updated: 20:70 s

pitchInHz: 54.94872 (Hz)

y = 1599.3716

[3].

Timer updated: 0:21:55

pitchInHz: 196.43188 (Hz)

y = 1035.145
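As a quick sanity check on the logged values, the sketch below converts the three readings to semitone distances, using the same `log2(hz / 440) * 12` relation as the drawing code. The helper name `semitonesFromA4` is hypothetical.

```kotlin
import kotlin.math.log2

/** Signed distance in semitones from A4 (440 Hz); hypothetical helper name. */
fun semitonesFromA4(hz: Double): Double = 12.0 * log2(hz / 440.0)

fun main() {
    val held = 219.63284   // log [1], the sustained A3
    val drop = 54.94872    // log [2], the sudden fall
    val next = 196.43188   // log [3], the following G3

    // How far the drop sits from the held note, in semitones.
    val jump = semitonesFromA4(drop) - semitonesFromA4(held)
    println("drop relative to held note: $jump semitones")
    println("frequency ratio held/drop: ${held / drop}")
    println("next note relative to A4: ${semitonesFromA4(next)} semitones")
}
```

The ratio comes out close to 4, so the drop sits about 24 semitones (two octaves) below the held A3. That would be consistent with a subharmonic (octave) error on the decaying tail of the note rather than an arbitrary value, although these three frames alone do not prove it.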

I should mention that what I played on the piano between the first and third images is just two notes. The first, A3, shows correctly in the first image. I released that note a millisecond before the second image at 20:70. The next note, played almost instantly afterwards, was a G3 (at approx. 21:30), which also shows correctly in the third image. The fall and the return to the first Y position happen between the two notes, where there was essentially no sound because I am in a very quiet environment.

I have tested the app with only quiet background noise, and TarsosDSP doesn't even detect it, which largely rules out background noise as the cause.

Snippets of code involved

  1. Pitch detection setup. This setup is nested inside the listener of my playPauseButton, and is executed when clicking on "Play" and stopped when "Paused":

    val sampleRate = 22050
    val bufferSize = 1024   // samples per analysis buffer (also passed to AudioRecord below)
    val overlap = 0

    val audioRecord = AudioRecord(
        MediaRecorder.AudioSource.MIC,
        sampleRate,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        bufferSize
    )
    val tarsosFormat = TarsosDSPAudioFormat(
        sampleRate.toFloat(),
        16,    // bitsPerSample
        1,     // channels
        true,  // signed
        false  // bigEndian
    )
    val inputStream = AndroidAudioInputStream(audioRecord, tarsosFormat)
    dispatcher = AudioDispatcher(inputStream, bufferSize, overlap)

    // Called on the dispatcher thread once per analysis buffer.
    val pdh = PitchDetectionHandler { result, _ ->
        val pitchInHz = result.pitch
        if (pitchInHz > 0) {
            val note = NoteConverter.hzToNote(pitchInHz)
            runOnUiThread {
                findViewById<TextView>(R.id.textView1).text = note
            }
        } else {
            runOnUiThread {
                findViewById<TextView>(R.id.textView1).text = "No note detected"
            }
        }
        pianoRollView.pitchInHz2 = pitchInHz
        // postInvalidate() is the thread-safe variant; invalidate() must only run on the UI thread.
        pianoRollView.postInvalidate()
    }
    val p: AudioProcessor = PitchProcessor(
        PitchProcessor.PitchEstimationAlgorithm.FFT_YIN,
        sampleRate.toFloat(),
        bufferSize,
        pdh
    )
    dispatcher?.addAudioProcessor(p)
    Thread(dispatcher, "Audio Dispatcher").start()

  2. My drawPitchGraph() and drawTrace() methods, which draw the current and previous detected pitch (both are called from my main onDraw()) inside the PianoRollView class:

    private fun drawPitchGraph(canvas: Canvas) {
        if (baseKeyHeight == 0f || listOfNotes.isEmpty()) return
        // Y position of A4 (440 Hz), used as the reference key.
        val a4Index = listOfNotes.indexOf("A4")
        val yRef = baseKeyHeight * (totalKeys - 1 - a4Index) + baseKeyHeight / 2f
        // One key height per semitone; null means no pitch was detected.
        val y: Float? = if (pitchInHz2 > 0.0) {
            val semitoneOffset = log2(pitchInHz2 / 440.0) * 12
            yRef - baseKeyHeight * semitoneOffset.toFloat()
        } else {
            null
        }
        currentYValue = y
        if (y != null) {
            canvas.drawCircle(width * 0.7f, y, 8f, pitchPaint)
        }
    }

    fun drawTrace(canvas: Canvas) {
        // Append the newest point, or null to break the line during silence.
        currentYValue?.let { y ->
            tracePoints.add(TracePoint(width.toFloat() * 0.7f, y))
        } ?: tracePoints.add(null)
        // Scroll every stored point 2 px to the left.
        for (i in tracePoints.indices) {
            tracePoints[i]?.let {
                tracePoints[i] = it.copy(x = it.x - 2f)
            }
        }
        // Connect consecutive non-null points.
        for (i in 0 until tracePoints.size - 1) {
            val p1 = tracePoints[i]
            val p2 = tracePoints[i + 1]
            if (p1 != null && p2 != null) {
                canvas.drawLine(p1.x, p1.y, p2.x, p2.y, pastPitchPaint)
            }
        }
        tracePoints.lastOrNull()?.let {
            canvas.drawCircle(it.x, it.y, 8f, pastPitchPaint)
        }
        invalidate() // request the next frame so the trace keeps scrolling
    }

My question:

Is this expected behavior from FFT_YIN or similar algorithms? Are there more appropriate algorithms for pitch tracking, or can I reconfigure FFT_YIN for more reliable performance?

Could this be caused by TarsosDSP's internal handling of noisy transitions or silence? Is there a way to filter out false pitch spikes after release, without harming real low notes?

Would it help to analyze probability or RMS values, or is there a better approach?
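To make the probability/RMS idea concrete, here is a minimal gating sketch. It is written as plain Kotlin with no TarsosDSP dependency; `PitchFrame`, `gatePitch`, and both thresholds are hypothetical. In the handler above, the inputs could presumably come from the result's `getProbability()` and from `getRMS()` on the AudioEvent that is currently ignored as `_`. The example values in `main` are invented for illustration, not measured.

```kotlin
/** One detector reading; the fields mirror values assumed to be available per frame. */
data class PitchFrame(val pitchHz: Float, val probability: Float, val rms: Double)

/**
 * Returns the pitch if the frame looks trustworthy, or null to treat it as silence.
 * Both thresholds are illustrative guesses and need tuning on real recordings.
 */
fun gatePitch(
    frame: PitchFrame,
    minProbability: Float = 0.9f,
    minRms: Double = 0.01
): Float? = when {
    frame.pitchHz <= 0f -> null                 // detector reported no pitch
    frame.probability < minProbability -> null  // low-confidence estimate
    frame.rms < minRms -> null                  // signal too quiet to trust
    else -> frame.pitchHz
}

fun main() {
    // Invented example values: a sustained note and a quiet, uncertain release tail.
    val sustained = PitchFrame(pitchHz = 219.6f, probability = 0.97f, rms = 0.08)
    val releaseTail = PitchFrame(pitchHz = 54.9f, probability = 0.55f, rms = 0.004)
    println(gatePitch(sustained))    // kept
    println(gatePitch(releaseTail))  // rejected (null)
}
```

A gate like this keys on confidence and loudness rather than on frequency, so a real low note played at normal volume would still pass, provided the detector reports it with high probability.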

Bonus:

I'm happy to provide more of the graphing logic or exact output logs if it helps. I could even upload a video if needed. I'm also open to workarounds, even if it means preprocessing or stabilizing the pitch values before drawing.
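One possible form of such stabilization is a short sliding median over recent voiced frames, sketched below. `MedianPitchSmoother` and its window size are hypothetical choices, not something TarsosDSP provides.

```kotlin
/** Sliding median over the last `windowSize` voiced readings; hypothetical helper. */
class MedianPitchSmoother(private val windowSize: Int = 5) {
    private val window = ArrayDeque<Float>()

    /** Feed one reading (null = unvoiced). Returns the smoothed pitch, or null. */
    fun push(pitchHz: Float?): Float? {
        if (pitchHz == null) {
            window.clear()   // silence resets the history
            return null
        }
        window.addLast(pitchHz)
        if (window.size > windowSize) window.removeFirst()
        val sorted = window.sorted()
        return sorted[sorted.size / 2]
    }
}

fun main() {
    val smoother = MedianPitchSmoother(windowSize = 5)
    // A held note with a single-frame outlier in the middle.
    val readings = listOf(220f, 220f, 55f, 220f, 220f)
    for (r in readings) {
        println("in=$r out=${smoother.push(r)}")
    }
}
```

A single outlier frame never becomes the median, while a low note that persists for more than half the window still comes through. The trade-off is latency: genuine pitch changes appear a few frames late, and an outlier that arrives right after a reset is not filtered.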

Thanks in advance for your help!

  • This feels like a conceptual question. Try searching for 'fft pitch detection' on Signal Processing Stack Exchange for questions on a similar topic. Do read their FAQ regarding on-topic questions before posting there. Commented Jul 26, 2025 at 2:23
  • Hello. Thank you for your advice. The Signal Processing community on Stack Exchange does indeed seem to be the most appropriate place for this issue. Unfortunately, I posted my question yesterday and haven't got any answers so far. Could it be that the community is just not very active? Or is my question too long? I just hope I can get some help with this. Commented Jul 27, 2025 at 9:58
