<small>**_Whisper Fine-Tuning + PyTorch: Teaching AI to understand the world's most mispronounced condiment._**</small>
![[../assets/Attachments/Lora/Pasted image 20260218134656.png]]
If you’re not a native English speaker, "Worcestershire" is basically the final boss of pronunciation.
I wanted to see if I could fix [**Whisper**](https://github.com/openai/whisper) (OpenAI's open-source speech recognition model that transcribes audio to text) so it would stop panicking every time I attempted the sauce.
Here's what I built:
**1. A custom audio dataset**: I recorded my own pronunciation attempts and paired each audio file with its correct transcription.
**2. A fine-tuning pipeline**: I converted each recording to a log-mel spectrogram, tokenised the transcripts using Whisper's multilingual tokeniser, and retrained the model.
**3. A Gradio app**: a live demo where you record yourself saying "Worcestershire" and see if the fine-tuned model gets it right.
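To make step 2 concrete, here is a minimal, hand-rolled sketch of the log-mel transform in plain numpy. It assumes Whisper's published front-end parameters (16 kHz audio, a 400-sample FFT window, hop length 160, 80 mel bins) and uses an HTK-style mel scale for simplicity; the actual pipeline uses Whisper's own feature extractor, so treat this as an illustration of the transform rather than a drop-in replacement.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (a simplification; Whisper uses slightly different filters)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(audio, sr=16000, n_fft=400, hop=160, n_mels=80):
    # Frame the signal and apply a Hann window
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Build a triangular mel filterbank over the FFT bins
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    # Project onto the mel bands and take the log
    return np.log10(np.maximum(power @ fb.T, 1e-10))
```

For one second of 16 kHz audio this yields 98 frames of 80 mel bins, which is the kind of 2-D "image" of the audio that Whisper's encoder consumes.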
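On the transcript side of step 2, the fiddly detail is label padding: transcripts in a batch have different lengths, and PyTorch's cross-entropy loss ignores positions set to -100, so padded token positions must be masked out or they pollute the gradient. A sketch of that collation step, assuming the tokenised transcripts arrive as plain integer lists (the surrounding training loop itself is not shown):

```python
import numpy as np

def collate_labels(label_ids_batch):
    """Pad a batch of tokenised transcripts for seq2seq fine-tuning.

    Positions beyond each transcript's length are filled with -100,
    the value PyTorch's cross-entropy loss treats as "ignore", so
    padding never contributes to the loss.
    """
    max_len = max(len(ids) for ids in label_ids_batch)
    labels = np.full((len(label_ids_batch), max_len), -100, dtype=np.int64)
    for i, ids in enumerate(label_ids_batch):
        labels[i, :len(ids)] = ids
    return labels

# Example: two transcripts of different lengths
collate_labels([[1, 2, 3], [4, 5]])
# array([[   1,    2,    3],
#        [   4,    5, -100]])
```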
After fine-tuning, the model transcribes "Worcestershire" correctly every time: Whisper finally understands me.
***Check out my code on*** [**GitHub**](https://github.com/cocoritzy/week5/blob/main/Whisper/fine_tune.py)