// ARCHITECTURE OVERVIEW: STT-internship
STT-internship is a voice note-taking application that combines real-time speech recognition with AI-generated meeting summaries.
Manual meeting notes are tedious, error-prone, and time-consuming. STT-internship addresses this by transcribing speech as it is spoken and summarizing the transcript into structured meeting notes.
Summaries are generated through Google's Gemini API, so users can focus on the conversation rather than on note-taking and leave the meeting with organized notes.
Uses the Google Web Speech API for continuous speech-to-text transcription, keeping the meeting notes up to date as the conversation proceeds.
Uses the Google Gemini API to generate structured summaries, key discussion points, action items, and decisions from the transcript.
Keeps the GUI responsive by delegating networking and CPU-bound tasks to background daemon threads, so audio stream capture and API requests do not freeze the interface.
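The daemon-thread delegation described above can be sketched roughly as follows. This is a minimal illustration that uses only the standard library, not the project's actual code: `run_in_background` and `slow_request` are hypothetical names, and the `time.sleep` call stands in for a blocking API request.

```python
import threading
import time


def run_in_background(task, on_done):
    """Run task() on a daemon thread and pass its result to on_done."""
    def worker():
        on_done(task())

    # daemon=True means the thread will not keep the process alive
    # after the main (GUI) thread exits.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def slow_request():
    # Hypothetical stand-in for a blocking network call.
    time.sleep(0.2)
    return "summary ready"


results = []
thread = run_in_background(slow_request, results.append)
# The caller gets control back immediately; a GUI event loop would keep running here.
thread.join()  # only for this demo, so the result can be inspected
```

Note that `on_done` runs on the worker thread. In a GUI program the callback should hand the result back to the GUI thread rather than touch widgets directly.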
Legacy PyAudio Compatibility Issues
Replaced PyAudio with sounddevice and soundfile, which install on Windows without C++ compilation tools, so audio capture works natively there.
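A capture routine built on these two libraries might look like the sketch below. The function names, the 16 kHz mono format, and the fixed-length recording are assumptions made for illustration; the project's real capture code is likely to stream continuously instead. The third-party imports sit inside the function so the frame-count helper can be used on a machine without audio hardware.

```python
def frame_count(seconds, samplerate):
    """Number of audio frames needed to cover `seconds` at `samplerate` Hz."""
    return int(round(seconds * samplerate))


def record_to_wav(path, seconds, samplerate=16000, channels=1):
    """Record from the default input device and save a 16-bit PCM WAV file.

    Hypothetical helper: the parameters and defaults are illustrative.
    """
    # Third-party packages: pip install sounddevice soundfile
    import sounddevice as sd
    import soundfile as sf

    frames = frame_count(seconds, samplerate)
    # sd.rec starts recording into a NumPy array and returns immediately.
    audio = sd.rec(frames, samplerate=samplerate, channels=channels, dtype="int16")
    sd.wait()  # block until the recording has finished
    sf.write(path, audio, samplerate, subtype="PCM_16")
    return path
```

Calling `record_to_wav("meeting.wav", 5)` would block for about five seconds, so in a GUI it belongs on a background thread.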
Maintaining GUI Responsiveness
Implemented thread-safe event pipelines and message passing between the worker threads and the GUI to prevent freezes and keep the interface smooth.
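One common way to build such a pipeline is a `queue.Queue` that workers write to and the GUI thread drains without blocking. The sketch below assumes that design; the event names, `transcribe_worker`, and `drain` are hypothetical, and the source does not say which GUI toolkit or queue type the project uses.

```python
import queue
import threading

# Thread-safe channel from worker threads to the GUI thread.
events = queue.Queue()


def transcribe_worker(chunks, out):
    """Hypothetical worker: posts one event per transcript chunk, then 'done'."""
    for text in chunks:
        out.put(("transcript", text))
    out.put(("done", None))


def drain(out, handle):
    """Handle every pending event without blocking; return how many were handled."""
    handled = 0
    while True:
        try:
            kind, payload = out.get_nowait()
        except queue.Empty:
            return handled
        handle(kind, payload)
        handled += 1


seen = []
worker = threading.Thread(
    target=transcribe_worker, args=(["hello", "world"], events), daemon=True
)
worker.start()
worker.join()  # only for this demo; a GUI would poll instead of joining
count = drain(events, lambda kind, payload: seen.append((kind, payload)))
```

In a real GUI, `drain` would be scheduled periodically on the GUI thread (for example with a timer callback), so that only that thread updates widgets.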