[Performance Regression] funasr 1.3.1 offline (VAD+ASR) latency is ~10x slower than 1.3.0 on RTX 4080/4090 #2809

## [Possible Regression] `funasr==1.3.1` is ~10x slower than `1.3.0` in offline VAD+ASR (RTX 4080/4090)

### Summary
After upgrading from `funasr==1.3.0` to `funasr==1.3.1`, we see a large latency increase in **offline mode (VAD + ASR)**.

With the same code, audio, models, and hardware:
- `1.3.0`: ASR latency is typically **70–80 ms** for clips under 10 s
- `1.3.1`: ASR latency is typically around **800 ms** for the same clips

From our side, this looks like a performance regression in `1.3.1`.

### Environment
- GPU: RTX 4080 / RTX 4090
- OS: [please fill]
- Python: [please fill]
- PyTorch: [please fill]
- CUDA + Driver: [please fill]
- FunASR versions compared: `1.3.0` vs `1.3.1`
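
The unfilled fields above can be collected with a small script. This is a minimal sketch using only the standard library; the package list is an assumption about what is relevant here, and the CUDA and driver versions still have to come from `nvidia-smi`.

```python
import platform
from importlib import metadata


def collect_versions(packages=("funasr", "torch", "torchaudio", "modelscope")):
    """Return OS, Python, and installed package versions as a dict.

    The default package list is a guess at what matters for this issue;
    pass a different tuple to report other dependencies.
    """
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages explicitly instead of failing.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Running it once in the `1.3.0` environment and once in the `1.3.1` environment gives two lists that can be pasted side by side.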

### Pipeline
- Mode: `offline`
- VAD + ASR (+ punctuation)
- Same model IDs and same model revisions across both versions
- Timestamp output disabled for speed (`output_timestamp=False`)

### Reproduction Steps
1. Keep the exact same server code and same short audio set (<10s).
2. Run with `funasr==1.3.0` and collect latency logs.
3. Upgrade only FunASR to `funasr==1.3.1`.
4. Re-run with identical parameters and compare `inference_ms` / `total_ms`.
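
For reference, the steps above can be reduced to a standalone timing harness. The sketch below is not our server code: `benchmark`, `infer`, and `sync` are hypothetical names, and `infer` stands in for whatever callable wraps the FunASR inference call.

```python
import statistics
import time


def benchmark(infer, inputs, warmup=3, repeats=10, sync=None):
    """Time `infer(x)` over `inputs` and return latency statistics in ms.

    `sync` is an optional zero-argument callable (for example
    torch.cuda.synchronize) run before each clock read, so that
    pending GPU work is included in the measurement.
    """
    if not inputs:
        raise ValueError("inputs must not be empty")

    # Warm-up calls are not timed (model load, CUDA init, caches).
    for i in range(warmup):
        infer(inputs[i % len(inputs)])

    samples = []
    for _ in range(repeats):
        for x in inputs:
            if sync is not None:
                sync()
            start = time.perf_counter()
            infer(x)
            if sync is not None:
                sync()
            samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "n": len(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }
```

Assuming the model is built with `funasr.AutoModel` and called through `model.generate(input=path)`, a call such as `benchmark(lambda p: model.generate(input=p), wav_paths, sync=torch.cuda.synchronize)` would produce comparable numbers for both versions; `wav_paths` is a placeholder for the short audio set.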

### Observed
- `1.3.0`: low latency (about 70–80 ms for short clips)
- `1.3.1`: much higher latency (about 800 ms on the same clips)
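
The "~10x" figure is the ratio of typical latencies. A small sketch of that arithmetic follows; the sample lists are placeholders picked from inside the ranges reported above, not entries from our logs.

```python
import statistics


def slowdown(baseline_ms, candidate_ms):
    """Return median(candidate) / median(baseline)."""
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    if base <= 0:
        raise ValueError("baseline median must be positive")
    return cand / base


# Placeholder values inside the reported ranges, NOT real measurements.
baseline = [70.0, 75.0, 80.0]       # stands in for 1.3.0
candidate = [800.0, 800.0, 800.0]   # stands in for 1.3.1

print(f"{slowdown(baseline, candidate):.1f}x")  # prints "10.7x"
```

Using medians keeps a single slow outlier request from distorting the ratio.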

### Expected
`1.3.1` should have similar latency to `1.3.0` in this offline VAD+ASR setup, or there should be a clear migration note / config change needed to avoid this slowdown.

### What we already checked
- Same hardware and runtime setup
- Same code path and same input audio
- Same model IDs/revisions
- Same inference options (including timestamp-disabled path)
- Downgrading to `1.3.0` restores expected latency
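
One further check that may be worth running is whether upgrading FunASR pulled in other dependency changes. The sketch below compares two saved `pip freeze` outputs; `parse_freeze` and `diff_freeze` are hypothetical helper names, and only `name==version` lines are considered.

```python
def parse_freeze(text):
    """Map package name -> version for `name==version` lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and editable/URL installs
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def diff_freeze(old_text, new_text):
    """Return {name: (old_version, new_version)} for every changed pin.

    A version of None means the package is absent on that side.
    """
    old = parse_freeze(old_text)
    new = parse_freeze(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes
```

If the result contains anything besides `funasr`, the slowdown could come from that dependency rather than from FunASR itself.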

### Question
Is there any known performance change in `1.3.1` for offline VAD+ASR?  
If needed, we can provide:
- a minimal reproducible script
- full timing logs
- exact dependency versions (`pip freeze`)

