Published: Tue 6 Aug 2024, 2:24 PM

OpenAI has unveiled its latest version of the ChatGPT bot, marking a significant advancement in the field of conversational artificial intelligence.

On Tuesday, OpenAI rolled out an advanced voice mode for ChatGPT, offering users their first experience with GPT-4o’s hyperrealistic audio capabilities. Initially, the feature will be available to a limited group of subscribers to ChatGPT Plus, which costs $20 (approximately Dh74) per month.

The company plans to extend the feature gradually to all premium users between September and November.

The new release promises enhanced capabilities, increased accuracy, and a more human-like interaction experience. Real-time, voice-driven conversations are set to transform the way users interact with AI.

OpenAI's use of hyperrealistic voice synthesis means that ChatGPT can produce speech that closely mimics human intonation, rhythm, and emotion. Users will find the AI's voice interactions to be engaging and intuitive, with responses that sound remarkably human. This development marks a significant step forward in making AI more accessible and user-friendly.

Advanced Voice Mode

You might already be familiar with the Voice Mode currently available in ChatGPT, but OpenAI's new Advanced Voice Mode offers a notable upgrade.

A significant focus of this release is on making interactions with the ChatGPT bot feel more natural and human-like. OpenAI has worked on refining the conversational tone of the bot, making it capable of understanding and replicating various styles of communication. Whether the user prefers a formal tone for business interactions or a casual, friendly chat, the new voice mode will be able to adapt accordingly.

Previously, ChatGPT relied on three separate models for its voice feature: one to transcribe your voice to text, GPT-4 to process the input, and another to convert the text back into speech. In contrast, GPT-4o is a multimodal model that handles all three tasks itself, which significantly lowers latency during conversations. Responses arrive much faster, bringing the exchange closer to real-life human interaction.
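
To see why folding three models into one cuts delay, consider a rough sketch in Python. It is only an illustration, not OpenAI's implementation: the stage names follow the description above, while the `Stage` class, the `total_latency_ms` helper, and every latency figure are made-up assumptions. The point is simply that models run one after another add their delays together.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """One model in a voice pipeline (hypothetical, for illustration)."""
    name: str
    latency_ms: float  # made-up figure, not a measured value


def total_latency_ms(stages: List[Stage]) -> float:
    """Stages run one after another, so their latencies add up."""
    return sum(stage.latency_ms for stage in stages)


# Older design: three separate models chained in sequence.
cascaded = [
    Stage("speech-to-text", 300.0),
    Stage("text model (GPT-4)", 1500.0),
    Stage("text-to-speech", 400.0),
]

# Newer design: one multimodal model takes audio in and gives audio out.
end_to_end = [Stage("multimodal model (GPT-4o)", 500.0)]

if __name__ == "__main__":
    print(f"cascaded:   {total_latency_ms(cascaded):.0f} ms")
    print(f"end-to-end: {total_latency_ms(end_to_end):.0f} ms")
```

With these placeholder numbers the chained pipeline totals 2,200 ms against 500 ms for the single model. Real figures will differ, but the chained design always pays for every stage in turn.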

OpenAI also asserts that GPT-4o can detect emotional intonations in your voice, such as sadness or excitement, and can even tell when you are singing.

Security and ethical considerations

Initially announced in May, the new voice feature has launched a month later than planned. OpenAI delayed the release to enhance safety measures, ensuring the model can effectively detect and reject inappropriate content.

As with any AI advancement, the introduction of voice capabilities brings ethical considerations and security challenges. OpenAI says it has implemented safeguards to prevent misuse of the voice feature, which include measures to detect and mitigate inappropriate content, as well as systems to ensure that voice data is handled securely and privately.