Introduction
This project implements a chat interface with voice input and output, running on a microcontroller with a display. It uses dedicated modules for Wi-Fi connectivity, audio processing, and UI rendering. The script interacts with an AI model (likely Claude, via Anthropic's API) to generate responses and can execute various tools based on the AI's instructions.
Main Components
Imports and Initializations
- Imports necessary modules and initializes hardware and UI components.
- Sets up Wi-Fi connection using provided credentials.
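For illustration, a minimal sketch of the connection step, assuming credentials live in a `secrets` module and using MicroPython's built-in `network` API (the project's own `wifi` module presumably wraps something similar):

```python
import time
import network
import secrets  # assumed to define WIFI_SSID and WIFI_PASSWORD

def connect_wifi(timeout_s=15):
    # Bring up the station interface and join the configured network.
    sta = network.WLAN(network.STA_IF)
    sta.active(True)
    if not sta.isconnected():
        sta.connect(secrets.WIFI_SSID, secrets.WIFI_PASSWORD)
        deadline = time.time() + timeout_s
        while not sta.isconnected():
            if time.time() > deadline:
                raise RuntimeError("Wi-Fi connection timed out")
            time.sleep(0.2)
    return sta.ifconfig()  # (ip, netmask, gateway, dns)
```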
User Interface
- Utilizes the LVGL library for creating a graphical user interface.
- Includes a chat container, status label, and a record button.
Audio Processing
- Implements recording and playback functionalities.
- Uses OpenAI's API for speech-to-text conversion.
AI Interaction
- Communicates with an AI model (likely Claude) using Anthropic's API.
- Supports tool execution based on AI responses.
Asynchronous Operations
- Uses `asyncio` for managing concurrent tasks.
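As a sketch of the pattern (the task names here are hypothetical), long-running jobs such as input polling and status updates run as cooperative tasks:

```python
import asyncio

async def poll_inputs():
    # Hypothetical poller: checks buttons/joystick without blocking.
    while True:
        await asyncio.sleep(0.02)

async def update_status():
    # Hypothetical status refresher running alongside the poller.
    while True:
        await asyncio.sleep(1)

async def main():
    await asyncio.gather(poll_inputs(), update_status())

asyncio.run(main())
```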
Key Functionalities
1. UI Initialization (`init_ui()`)
- Creates the main screen with a chat container, status label, and record button.
- Sets up joystick controls for scrolling.
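A rough sketch of what `init_ui()` might build, using LVGL's MicroPython bindings (widget names and layout are illustrative, and the exact API varies by LVGL version):

```python
import lvgl as lv

def init_ui():
    # Root screen with a scrollable chat area, a status line, and a record button.
    scr = lv.obj()

    chat = lv.obj(scr)                       # chat container
    chat.set_size(lv.pct(100), lv.pct(80))
    chat.set_flex_flow(lv.FLEX_FLOW.COLUMN)  # stack messages vertically

    status = lv.label(scr)                   # status label
    status.set_text("Ready")
    status.align(lv.ALIGN.BOTTOM_LEFT, 4, -4)

    btn = lv.btn(scr)                        # record button
    btn.align(lv.ALIGN.BOTTOM_RIGHT, -4, -4)
    lv.label(btn).set_text("REC")

    lv.scr_load(scr)
    return scr, chat, status, btn
```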
2. Message Handling
- `add_message()`: Adds messages to the chat container.
- `remove_accents()`: Removes accents from text for display purposes.
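One plausible implementation of `remove_accents()` (the actual mapping may differ): since MicroPython has no `unicodedata`, a small lookup table is the usual approach when the display font lacks accented glyphs:

```python
# Minimal accent-stripping table; extend as needed for the target language.
_ACCENTS = {
    "á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u",
    "à": "a", "è": "e", "ä": "a", "ö": "o", "ü": "u",
    "ñ": "n", "ç": "c",
}

def remove_accents(text):
    return "".join(_ACCENTS.get(ch, ch) for ch in text)
```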
3. Input Processing
- Monitors button presses for recording and UI interaction.
- Converts speech to text using OpenAI's API.
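Recording itself likely goes through the `i2s_utils` module; a bare-bones equivalent with MicroPython's `machine.I2S` looks like this (pin numbers are board-specific placeholders):

```python
from machine import I2S, Pin

audio_in = I2S(
    0,
    sck=Pin(32), ws=Pin(25), sd=Pin(33),  # placeholder pins
    mode=I2S.RX,
    bits=16,
    format=I2S.MONO,
    rate=16000,
    ibuf=8192,
)

def record_chunk(n_bytes=4096):
    # Blocks until the buffer is filled with 16-bit PCM samples.
    buf = bytearray(n_bytes)
    n = audio_in.readinto(buf)
    return buf[:n]
```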
4. AI Interaction
- Sends user messages to the AI model.
- Processes AI responses, including tool use requests.
5. Tool Execution
- Supports file reading, writing, and execution tools.
- Handles tool results and feeds them back to the AI.
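A sketch of the dispatch step, assuming the Anthropic Messages API's tool-use format (the tool names here are illustrative stand-ins for the file tools mentioned above):

```python
def run_tools(response, messages):
    # Echo the assistant turn, execute each tool_use block, and append
    # the results as the next user turn so the model can continue.
    messages.append({"role": "assistant", "content": response["content"]})
    results = []
    for block in response["content"]:
        if block.get("type") != "tool_use":
            continue
        if block["name"] == "read_file":            # illustrative tool
            with open(block["input"]["path"]) as f:
                output = f.read()
        elif block["name"] == "write_file":         # illustrative tool
            with open(block["input"]["path"], "w") as f:
                f.write(block["input"]["content"])
            output = "ok"
        else:
            output = "unknown tool"
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": output,
        })
    messages.append({"role": "user", "content": results})
```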
6. Audio Playback
- Converts AI responses to speech and plays them back.
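Playback mirrors the recording path; a sketch with an I2S output channel (again with placeholder pins, and assuming the audio has already been decoded to raw PCM):

```python
from machine import I2S, Pin

audio_out = I2S(
    1,
    sck=Pin(26), ws=Pin(27), sd=Pin(14),  # placeholder pins
    mode=I2S.TX,
    bits=16,
    format=I2S.MONO,
    rate=24000,
    ibuf=8192,
)

def play(pcm_bytes):
    # Stream PCM data to the speaker in small chunks.
    mv = memoryview(pcm_bytes)
    for i in range(0, len(mv), 1024):
        audio_out.write(mv[i:i + 1024])
```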
7. Main Loop (`main()`)
- Orchestrates the entire conversation flow:
  - Records user input
  - Converts speech to text
  - Sends text to AI
  - Processes AI response (including tool use)
  - Converts AI response to speech
  - Plays back the response
  - Repeats the process
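Put together, the loop might look roughly like this; apart from the `api_utils` functions, the helper names (`record_until_release`, `run_tools`, `add_message`, `play`) and the key constants are assumptions:

```python
import asyncio
from api_utils import llm, speech_to_text, text_to_speech

async def main():
    messages = []
    while True:
        audio = await record_until_release()      # hypothetical recording helper
        user_text = await speech_to_text(OPENAI_KEY, audio)
        messages.append({"role": "user", "content": user_text})
        add_message(user_text)

        reply = await llm(ANTHROPIC_KEY, messages)
        # Tool-use replies loop back to the model until it answers in text.
        while reply and reply.get("stop_reason") == "tool_use":
            run_tools(reply, messages)
            reply = await llm(ANTHROPIC_KEY, messages)

        text = reply["content"][0]["text"]
        add_message(text)
        play(await text_to_speech(OPENAI_KEY, text))

asyncio.run(main())
```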
Usage
The script is designed to run on a microcontroller with the necessary hardware components (display, buttons, microphone, speaker). It creates an interactive chat interface where users can have voice conversations with an AI assistant.
Dependencies
- `wifi`, `secrets`, `asyncio`
- Custom modules: `hal_utils`, `api_utils`, `i2s_utils`, `prompt_utils`, `lvgl_utils`
- LVGL library for UI
- Anthropic API for the language model; OpenAI API for speech processing (speech-to-text and text-to-speech)
LLM and audio APIs
Overview
The `api_utils.py` file contains utility functions for interacting with various APIs, primarily focused on AI language models and speech processing. It provides asynchronous functions for tasks such as communicating with AI models, text-to-speech conversion, and speech-to-text transcription.
Functions
1. `llm(api_key, messages, max_tokens=8192, temp=0, system_prompt=None, tools=None)`
Interacts with an AI language model (likely Claude by Anthropic) to generate responses.
Parameters:
- `api_key` (str): The API key for authentication.
- `messages` (list): The conversation history.
- `max_tokens` (int, optional): Maximum number of tokens in the response. Default is 8192.
- `temp` (float, optional): Temperature for response generation. Default is 0.
- `system_prompt` (str, optional): A system prompt to guide the AI's behavior.
- `tools` (list, optional): A list of tools the AI can use.
Returns: JSON response from the API or None if an error occurs.
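A minimal call, following the documented signature (the message shape follows the Anthropic Messages API; `API_KEY` is a placeholder):

```python
import asyncio
from api_utils import llm

async def demo():
    messages = [{"role": "user", "content": "Summarize yourself in one line."}]
    response = await llm(API_KEY, messages, system_prompt="Be brief.")
    if response:
        # Anthropic-style responses carry a list of content blocks.
        print(response["content"][0]["text"])

asyncio.run(demo())
```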
2. `text_to_speech(api_key, text, acallback=None)`
Converts text to speech using OpenAI's API.
Parameters:
- `api_key` (str): The API key for authentication.
- `text` (str): The text to convert to speech.
- `acallback` (function, optional): An asynchronous callback function to handle the response.
Returns: Audio data as bytes or None if an error occurs. If a callback is provided, it returns None.
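Example usage per the documented signature (saving to a file here; in the actual script the bytes presumably go to the audio playback path, and `API_KEY` is a placeholder):

```python
import asyncio
from api_utils import text_to_speech

async def demo():
    audio = await text_to_speech(API_KEY, "Hello from the microcontroller!")
    if audio:
        with open("reply.audio", "wb") as f:  # format depends on the API settings
            f.write(audio)

asyncio.run(demo())
```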
3. `speech_to_text(api_key, bytes_io)`
Transcribes speech to text using OpenAI's API.
Parameters:
- `api_key` (str): The...
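Based on the signature, usage might look like this (assuming the function returns the transcribed text; `API_KEY` is a placeholder):

```python
import asyncio
import io
from api_utils import speech_to_text

async def demo():
    with open("clip.wav", "rb") as f:  # a previously recorded clip
        text = await speech_to_text(API_KEY, io.BytesIO(f.read()))
    print("Transcript:", text)

asyncio.run(demo())
```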