AI Voice Notes

Generates meeting summaries from live speech by combining real-time speech recognition with AI-generated insights. The application was built from first principles as a desktop voice note-taking tool.

The Problem of Inefficient Meeting Notes

Traditional note-taking in meetings is often tedious, inaccurate, and time-consuming. The STT-internship project addresses this by combining live speech recognition with AI-driven summarization to produce accurate, structured meeting notes.

Because Google's Gemini API handles the summarization, users can focus on the conversation rather than on taking notes, and each meeting ends with an organized record of what was discussed and decided.

STACK: Python 3.10+, Tkinter, ttkbootstrap, sounddevice, numpy, SpeechRecognition, Google Gemini API, python-dotenv

System Architecture

Real-time Speech Recognition Layer

Uses the Google Web Speech API (through the SpeechRecognition library) for continuous speech-to-text transcription, so the notes stay current as the conversation unfolds.
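A minimal sketch of how this layer could regroup raw 16-bit mono PCM into fixed-length segments and send each one to the Google Web Speech backend of the SpeechRecognition package. The 16 kHz sample rate, the segment length, and the helper names `segment_pcm` and `transcribe_segment` are assumptions for illustration, not the project's actual code. `speech_recognition` is imported inside the function, so the segmenting logic runs without it.

```python
from typing import Iterable, Iterator

SAMPLE_RATE = 16_000  # assumed capture rate in Hz
SAMPLE_WIDTH = 2      # bytes per sample for 16-bit PCM


def segment_pcm(blocks: Iterable[bytes], seconds: float,
                sample_rate: int = SAMPLE_RATE,
                sample_width: int = SAMPLE_WIDTH) -> Iterator[bytes]:
    """Regroup arbitrary-size PCM blocks into segments of `seconds` each.

    Hypothetical helper. A trailing partial segment is yielded at the end
    so no speech is dropped.
    """
    target = int(seconds * sample_rate) * sample_width  # whole samples only
    if target <= 0:
        raise ValueError("segment length must be at least one sample")
    buffer = bytearray()
    for block in blocks:
        buffer.extend(block)
        while len(buffer) >= target:
            yield bytes(buffer[:target])
            del buffer[:target]
    if buffer:
        yield bytes(buffer)


def transcribe_segment(pcm: bytes, sample_rate: int = SAMPLE_RATE,
                       language: str = "en-US") -> str:
    """Send one PCM segment to the Google Web Speech API (needs network access)."""
    import speech_recognition as sr  # imported lazily: only needed for the request

    recognizer = sr.Recognizer()
    audio = sr.AudioData(pcm, sample_rate, SAMPLE_WIDTH)
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""  # unintelligible audio; network errors (sr.RequestError) propagate
```

Fixed-length segments keep each request small. The trade-off is that a word can be cut at a segment boundary, which a silence-based splitter would avoid.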

AI-Powered Summarization Engine

Uses the Google Gemini API to turn the transcript into a structured summary covering key discussion points, action items, and decisions.
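One way to get structured output is to ask the model for fixed section headings and parse the reply locally. The sketch below assumes the `google-generativeai` SDK, an API key stored as `GEMINI_API_KEY` in a `.env` file loaded with python-dotenv, and the `gemini-1.5-flash` model name. All three are assumptions, as are the prompt wording and the helper names. Only `parse_sections` runs without network access.

```python
import os
import re

SECTIONS = ("Summary", "Key Points", "Action Items", "Decisions")

PROMPT_TEMPLATE = (
    "Summarize the meeting transcript below. Reply using exactly these "
    "headings, each on its own line followed by a colon: "
    + ", ".join(SECTIONS)
    + ". Use '- ' bullets under each heading.\n\nTranscript:\n{transcript}"
)

# Matches heading lines such as "Summary:", "## Key Points" or "**Decisions:**".
HEADING_RE = re.compile(
    r"^[#*\s]*(" + "|".join(map(re.escape, SECTIONS)) + r")[*\s]*:?[*\s]*$",
    re.IGNORECASE,
)


def parse_sections(reply: str) -> dict[str, list[str]]:
    """Split a model reply into {heading: [line, ...]} (hypothetical helper)."""
    result: dict[str, list[str]] = {name: [] for name in SECTIONS}
    current = None
    for line in reply.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # Map whatever casing the model used back to the canonical heading.
            current = next(n for n in SECTIONS if n.lower() == match.group(1).lower())
            continue
        text = line.strip().lstrip("-*• ").strip()  # drop bullet markers
        if current and text:
            result[current].append(text)
    return result


def summarize(transcript: str, model_name: str = "gemini-1.5-flash") -> dict[str, list[str]]:
    """Call Gemini and parse the reply (needs network access and an API key)."""
    from dotenv import load_dotenv       # python-dotenv
    import google.generativeai as genai  # assumed SDK; the project may use another

    load_dotenv()  # loads GEMINI_API_KEY (assumed variable name) from .env
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(PROMPT_TEMPLATE.format(transcript=transcript))
    return parse_sections(response.text)
```

Parsing locally means a reply with an unexpected layout degrades to empty sections instead of crashing the GUI.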

Thread-Safe Event Pipelines

Keeps the GUI stable and responsive by moving network requests and other long-running work onto background daemon threads, so the window does not freeze during audio capture or API requests.
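This pattern can be sketched with the standard library alone. A daemon thread runs the blocking call, such as an API request, and posts its outcome to a `queue.Queue`, which is safe to share between threads. The function name `run_in_background` and the `("ok" | "error", payload)` message shape are illustrative assumptions.

```python
import queue
import threading
from typing import Any, Callable

Message = tuple[str, Any]  # ("ok", result) or ("error", exception)


def run_in_background(task: Callable[[], Any],
                      results: "queue.Queue[Message]") -> threading.Thread:
    """Run `task` on a daemon thread and report its outcome through `results`.

    Daemon threads do not keep the interpreter alive, so closing the window
    can exit the app even while a request is still in flight.
    """
    def worker() -> None:
        try:
            results.put(("ok", task()))
        except Exception as exc:  # forward failures instead of losing them
            results.put(("error", exc))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The GUI thread never calls `task` itself and only reads from `results`. A slow request therefore delays the summary, not the window.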

Engineering Deep Dive

Challenge

Legacy PyAudio Compatibility Issues

Resolution

Replaced PyAudio with sounddevice and soundfile, which install on Windows without C++ compilation tools, giving reliable native audio capture.
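A sketch of PyAudio-free capture. A sounddevice `RawInputStream` delivers 16-bit PCM blocks to a callback that copies them into a queue, and soundfile writes the accumulated bytes to a WAV file. The `AudioCapture` class, `save_wav`, the sample rate, and the block size are assumptions for illustration. sounddevice and soundfile are imported only inside the methods that need a device or file, so the callback logic can run on its own.

```python
import queue
import sys

SAMPLE_RATE = 16_000  # assumed; 16 kHz mono is a common choice for speech
BLOCK_SIZE = 1_600    # assumed frames per callback (100 ms at 16 kHz)


class AudioCapture:
    """Collects raw int16 PCM blocks from a sounddevice callback (hypothetical)."""

    def __init__(self) -> None:
        self.blocks: "queue.Queue[bytes]" = queue.Queue()

    def callback(self, indata, frames, time_info, status) -> None:
        # Runs on the audio thread: do the minimum and return quickly.
        if status:
            print(status, file=sys.stderr)  # e.g. input overflow warnings
        self.blocks.put(bytes(indata))  # copy, because the buffer is reused

    def record(self, seconds: float) -> bytes:
        import sounddevice as sd  # Windows wheels bundle PortAudio, no compiler needed

        with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=BLOCK_SIZE,
                               channels=1, dtype="int16", callback=self.callback):
            sd.sleep(int(seconds * 1000))
        return self.drain()

    def drain(self) -> bytes:
        """Return every queued block joined together, without blocking."""
        chunks = []
        while True:
            try:
                chunks.append(self.blocks.get_nowait())
            except queue.Empty:
                return b"".join(chunks)


def save_wav(path: str, pcm: bytes) -> None:
    """Write 16-bit mono PCM to `path` using soundfile (format taken from the extension)."""
    import soundfile as sf

    with sf.SoundFile(path, mode="w", samplerate=SAMPLE_RATE,
                      channels=1, subtype="PCM_16") as f:
        f.buffer_write(pcm, dtype="int16")
```

Keeping the stream raw avoids a NumPy conversion. The bytes are already in the format `speech_recognition.AudioData` expects with a sample width of 2.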

Challenge

Maintaining GUI Responsiveness

Resolution

Implemented thread-safe event pipelines and message passing between worker threads and the GUI, so background work no longer freezes the interface.
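Tkinter widgets should only be touched from the thread running the main loop. A common message-passing approach has workers put messages on a `queue.Queue` while the GUI polls that queue with `after()`, which schedules a callback on the main loop without blocking it. The sketch below assumes this design. `poll_messages`, the 100 ms interval, and the `handle` callback are hypothetical names, not taken from the project.

```python
import queue
from typing import Any, Callable


def poll_messages(widget: Any, messages: "queue.Queue[Any]",
                  handle: Callable[[Any], None], interval_ms: int = 100) -> int:
    """Drain pending messages on the Tk thread, then reschedule this poll.

    `widget` is any Tk widget (for example the root window), and `handle`
    applies one message to the UI. Returns how many messages were handled.
    """
    handled = 0
    while True:
        try:
            message = messages.get_nowait()  # never blocks the event loop
        except queue.Empty:
            break
        handle(message)
        handled += 1
    # Tk's after(ms, func, *args) runs this function again on the main loop.
    widget.after(interval_ms, poll_messages, widget, messages, handle, interval_ms)
    return handled
```

In an app like this one, the poll would be started once before `root.mainloop()`, for example `poll_messages(root, messages, append_to_transcript)`, where `append_to_transcript` is a hypothetical handler that inserts text into a widget.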