bayesianbot 10 hours ago

Seeing this title, I really hoped they'd have a parameter to force the language for voice recognition. I'm Finnish with a heavy rally-Finnish accent, and the real-time modes quite often end up transcribing my English speech as Finnish text. Strangely, the Finnish is a correct translation of what I said in English. It might not happen during the first sentence, but after a few queries and replies it usually does.

According to the OpenAI forums this is a common problem. I see they've addressed this in the post by prompting the model to stick to one language, but previously this didn’t work consistently, and in their Playground the newest `User transcript model` is still the same as before (`gpt-4o-transcribe`), so I don’t have high hopes. Must be hard to implement.

edit: Tried it again (with a prompt requesting English like always). By my 6th message it suddenly started transcribing to Finnish, and after that it became more common. Better than it used to be, but in many ways still useless. Though I'm sure it works better for people with lighter accents.

Cu3PO42 10 hours ago

I think it would be neat to hook up a (realtime) speech-to-speech model to something like Home Assistant for smart home controls + generic questions. HA has this feature, but it currently uses an STT + text model + TTS pipeline, which works fine but has higher delays and feels less... natural.

zebomon 10 hours ago

I've been using the voice chat in ChatGPT more and more frequently. I'm curious now to see how the costs associated with this would work through the API on some user-facing features. It's a cool update at a glance.

Sean-Der 10 hours ago

I worked on the SIP stuff. If anyone has questions/problems, reach out anytime. I'd love to help :)

daft_pink 10 hours ago

Uh oh, with SIP support we’re going to start getting AI scammers all the time!

  • OutOfHere 10 hours ago

    Yes, and they will go straight to voicemail. I don't know of anyone who picks up calls from random numbers anymore, at least in this country.

    • hectormalot 8 hours ago

      Just happen to have some stats on that (non-US context): 60% pick up a local number, about 40% pick up a foreign number (specifically, the stat I have is for a US number calling someone in a non-US geography).

      More than I expected.