WHISPER

Whisper, by OpenAI, is a powerful model for automatic speech recognition (ASR) and speech translation. It was trained on over 5 million hours of audio: 1 million hours with human-provided labels and 4 million hours with machine-generated labels. This large-scale, diverse training allows Whisper to transcribe and translate speech accurately across many languages and accents without fine-tuning, so it adapts well to new datasets and audio conditions. These zero-shot capabilities make it well suited to transcription, live translation, and other real-world applications.

[Diagram: speech-to-text pipeline — an audio file is accepted as speech input, a transcription query is processed by a large language model, and transcribed text is generated.]
OUR IN-HOUSE SETUP

At Vistacan, we’ve integrated this capability into our custom OpenEMR system. When a doctor dictates their notes, the audio is sent to an in-house server located at one of our clinics, where the Whisper model transcribes the dictation into structured text. This setup offers higher accuracy than comparable tools, ensuring precise documentation of medical notes, as demonstrated in the resource comparisons below. The seamless workflow lets our healthcare professionals focus more on patient care and less on administrative tasks. Our setup uses the large-scale Whisper architecture that is also available to the AI research community on platforms such as Hugging Face.
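The dictation workflow can be sketched in code. The payload field names and response shape below are illustrative assumptions, not Vistacan's actual API: a client packages the dictated audio as a JSON payload for an in-house Whisper transcription endpoint, and a second helper extracts the transcript from the server's JSON response.

```python
import base64
import json

def build_transcription_request(audio_bytes: bytes, language: str = "en") -> str:
    """Package a dictated audio clip as a JSON request body.

    The field names ("audio_b64", "language", "task") are hypothetical
    examples of what an in-house Whisper endpoint might accept.
    """
    payload = {
        # Raw audio is base64-encoded so it can travel inside JSON.
        "audio_b64": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
        "task": "transcribe",  # Whisper also supports "translate"
    }
    return json.dumps(payload)

def parse_transcription_response(body: str) -> str:
    """Extract the transcript, assuming the server returns {"text": "..."}."""
    return json.loads(body)["text"]
```

On the server side, the decoded audio would be passed to the Whisper model and the resulting text returned in the `"text"` field; keeping the transport layer this thin makes it easy to swap in a different model checkpoint later.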

We also presented at the BC Rural Health Research Exchange, organized by the Rural Coordination Center of British Columbia (presentation attached). The BC Rural Health Research Exchange (BCRHRx) is a virtual, half-day event featuring rapid-style presentations aimed at informing, engaging, and sharing the latest rural health research in British Columbia.

RESOURCE LINKS