I want to show an interactive audio waveform like this.
I've extracted the sample data using AVAssetReader. Using this data, I'm drawing a UIBezierPath in a scroll view's content view. Currently, when I pinch to zoom in or out on the scroll view, I downsample the sample data to determine how many samples should be shown.
import UIKit

class WaveformView: UIView {
    var amplitudes: [CGFloat] = [] {
        didSet {
            setNeedsDisplay()
        }
    }

    override func draw(_ rect: CGRect) {
        guard let context = UIGraphicsGetCurrentContext(), !amplitudes.isEmpty else { return }

        // Set up drawing parameters
        context.setStrokeColor(UIColor.black.cgColor)
        context.setLineWidth(1.0)
        context.setLineCap(.round)

        let midY = rect.height / 2
        let widthPerSample = rect.width / CGFloat(amplitudes.count)

        // Draw waveform
        let path = UIBezierPath()
        for (index, amplitude) in amplitudes.enumerated() {
            let x = CGFloat(index) * widthPerSample
            let height = amplitude * rect.height * 0.8

            // Draw vertical line for each sample
            path.move(to: CGPoint(x: x, y: midY - height))
            path.addLine(to: CGPoint(x: x, y: midY + height))
        }
        path.stroke()
    }
}
I added a pinch gesture handler:
@objc private func handlePinch(_ gesture: UIPinchGestureRecognizer) {
    switch gesture.state {
    case .began:
        initialPinchDistance = gesture.scale
    case .changed:
        let scaleFactor = gesture.scale / initialPinchDistance
        var newScale = currentScale * scaleFactor
        newScale = min(max(newScale, minScale), maxScale)

        // Update displayed samples with new scale
        updateDisplayedSamples(scale: newScale)
        print(newScale)

        // Maintain zoom center point
        let pinchCenter = gesture.location(in: scrollView)
        let offsetX = (pinchCenter.x - scrollView.bounds.origin.x) / scrollView.bounds.width
        let newOffsetX = (totalWidth * offsetX) - (pinchCenter.x - scrollView.bounds.origin.x)
        scrollView.contentOffset.x = max(0, min(newOffsetX, totalWidth - scrollView.bounds.width))
        view.layoutIfNeeded()
    case .ended, .cancelled:
        currentScale = scrollView.contentSize.width / (baseWidth * widthPerSample)
    default:
        break
    }
}
private func updateDisplayedSamples(scale: CGFloat) {
    let targetSampleCount = Int(baseWidth * scale)
    displayedSamples = downsampleWaveform(samples: rawSamples, targetCount: targetSampleCount)
    waveformView.amplitudes = displayedSamples
    totalWidth = CGFloat(displayedSamples.count) * widthPerSample
    contentWidthConstraint?.constant = totalWidth
    scrollView.contentSize = CGSize(width: totalWidth, height: 300)
}

private func downsampleWaveform(samples: [CGFloat], targetCount: Int) -> [CGFloat] {
    guard samples.count > 0, targetCount > 0 else { return [] }
    if samples.count <= targetCount {
        return samples
    }

    var downsampled: [CGFloat] = []
    let sampleSize = samples.count / targetCount

    for i in 0..<targetCount {
        let startIndex = i * sampleSize
        let endIndex = min(startIndex + sampleSize, samples.count)
        let slice = samples[startIndex..<endIndex]

        // For each window, take the maximum value to preserve peaks
        if let maxValue = slice.max() {
            downsampled.append(maxValue)
        }
    }
    return downsampled
}
This approach is very inefficient: every time gesture.state changes, I recalculate the downsampled data and perform UI updates based on it. How can I implement this functionality more efficiently for smooth interaction?
Comment: Use a sparse table. – ielyamani (Feb 6 at 2:23)
1 Answer
How can I implement this functionality more efficiently for smooth interaction?
Pre-compute at different resolutions.
maxValue = slice.max()
Side note: it's not clear that .max() is ideal for this. Maybe use the median of each window? Or the 80th or 90th percentile value of a window?
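As a rough sketch of what that reducer could look like (the name windowValue and the 0.9 default are just illustrative assumptions, not anything from the OP's code):

import CoreGraphics

/// Reduce one window of samples to a single display value.
/// A high percentile (e.g. 0.9) keeps genuine peaks visible while
/// ignoring single-sample spikes; percentile 1.0 is equivalent to max().
func windowValue(_ window: ArraySlice<CGFloat>, percentile: Double = 0.9) -> CGFloat {
    guard !window.isEmpty else { return 0 }
    let sorted = window.sorted()
    let rank = Int(Double(sorted.count - 1) * percentile)
    return sorted[rank]
}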
Upon initial loading of the waveform we're going to be displaying everything, so slice it into windows, compute each window value as max or median or whatever, hang onto those values, and display them.
Now pretend the user asked to see half of the timespan. There's been no gesture, no user interaction, so we do not yet know the starting point, but that's OK. We'll just compute window values for everything at that resolution, and hang onto the values.
Repeat for quarter, eighth, and so on. At some point we bottom out -- RAM to store the values becomes annoyingly large, and time to recompute exact values on the fly for a "small" timeslice is conveniently small.
Now we start accepting gestures. As the user pinches and pinches, we will dive down into using the "half timespan" or the "quarter timespan" data. Of course the user's requested {start, stop} timestamps won't match the precomputed data exactly. But we can go to the slightly higher resolution data, generate appropriate indexes, and display a subset of the stored data, skipping values occasionally.
Why is this effective? Because the number of pre-computed values approximately matches the display size, exceeding it at most by a factor of two.
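As a concrete illustration, here is a hedged sketch of such a multi-resolution cache in Swift. WaveformPyramid and its method names are assumptions layered on top of the OP's code, not a drop-in API: level 0 holds roughly one value per on-screen point at full zoom-out, and each deeper level doubles the number of windows until further levels stop paying off.

import CoreGraphics

/// Hypothetical multi-resolution cache built once per loaded file.
struct WaveformPyramid {
    private(set) var levels: [[CGFloat]] = []

    init(rawSamples: [CGFloat], baseCount: Int, maxLevels: Int = 8) {
        var targetCount = max(1, baseCount)
        for _ in 0..<maxLevels {
            guard targetCount < rawSamples.count else { break }
            levels.append(Self.reduce(rawSamples, to: targetCount))
            targetCount *= 2
        }
        if levels.isEmpty { levels = [rawSamples] }   // short file: raw samples already fit
    }

    /// Pick the coarsest level that still has at least `neededCount` values,
    /// then stride through it so the result roughly matches the display width.
    func amplitudes(for neededCount: Int) -> [CGFloat] {
        guard neededCount > 0,
              let level = levels.first(where: { $0.count >= neededCount }) ?? levels.last
        else { return [] }
        let step = max(1, level.count / neededCount)
        return stride(from: 0, to: level.count, by: step).map { level[$0] }
    }

    /// Same peak-preserving reduction as the OP's downsampleWaveform.
    private static func reduce(_ samples: [CGFloat], to targetCount: Int) -> [CGFloat] {
        let windowSize = max(1, samples.count / targetCount)
        return (0..<targetCount).map { i in
            let start = i * windowSize
            let end = min(start + windowSize, samples.count)
            return samples[start..<end].max() ?? 0
        }
    }
}

With something along these lines, the pinch handler only calls pyramid.amplitudes(for: targetSampleCount) and assigns the result to waveformView.amplitudes; the raw samples are never re-scanned during the gesture.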
If you're a stickler for accuracy, have a background thread do the unchanged OP calculation, and use double buffering to replace the "approximate view" with the "exact view" if it turns out the user went idle for a moment. OTOH if gesture events keep arriving, the background computational effort is wasted and is discarded, while the foreground thread keeps quickly displaying pre-computed values.
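A minimal sketch of that idea, assuming it lives in the same view controller as the code above (scheduleExactRender, renderGeneration, and renderQueue are made-up names): each gesture event bumps a generation counter, and any background result that arrives late is simply dropped.

// Sketch only: these would be properties of the OP's view controller.
private var renderGeneration = 0
private let renderQueue = DispatchQueue(label: "waveform.exact-render", qos: .userInitiated)

private func scheduleExactRender(targetCount: Int) {
    renderGeneration += 1                 // called on the main thread from the gesture handler
    let generation = renderGeneration
    let samples = rawSamples              // value-type copy; safe to read off the main thread

    renderQueue.async { [weak self] in
        guard let self = self else { return }
        let exact = self.downsampleWaveform(samples: samples, targetCount: targetCount)
        DispatchQueue.main.async {
            // If the user kept pinching while we worked, this result is stale: discard it.
            guard generation == self.renderGeneration else { return }
            self.waveformView.amplitudes = exact
        }
    }
}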
A background thread can also help with the "time to become interactive" startup latency upon loading a new waveform.