Voice AI Hackathon: World’s first Hackathon for Voice-enabled Applications
https://alan.app/blog/voice-ai-hackathon/
Thu, 02 Jul 2020

Have you ever wanted to develop voice-enabled mobile or web applications?
Well, you’re in the right place!

Alan AI is hosting its first virtual hackathon about Voice AI and inviting developers worldwide to take part! With the Voice AI Hackathon, we are challenging developers (as individuals or teams of 3) to integrate a voice assistant into their new or existing applications, or other open-source apps, through our conversational voice platform. All you need is basic JavaScript knowledge and the determination to win the TOP PRIZE of $500!

Participants can choose to voice-embed any application — from gaming, social networking, food delivery, or any other industry — the opportunities are endless.

Project submissions are due July 15, 2020 at 11:59 PM (PST). Projects can be submitted as a website link, an App Store/Play Store link, or photo proof of an app in the process of being published to the App Store/Play Store. Alan developers and mentors will be ready to assist with any questions or concerns throughout your progress. The top three submissions will win cash prizes and be featured on our Alan platforms.

Ready to sign up? Fill out our sign up form ✨

Learn more about the hackathon on our official Hackathon Website 🚀

We are excited to see what you build with the Alan Platform!

Top 10 Hands-Free Apps for Android 2020
https://alan.app/blog/top-10-hands-free-apps-for-android-2020/
Mon, 27 Apr 2020

Forward-looking businesses are starting to explore the possibilities of introducing voice control into their applications. Therefore, we are seeing a noticeable increase in Android apps with voice-operated software that provide a hands-free experience.

We’ve gathered some of the most popular and useful hands-free apps for Android to see what they can offer and why other businesses should be heading in that direction as well.


The term “hands-free” refers to equipment or software that requires limited or no use of hands. One of the most popular ways to access controls for hands-free apps is through voice. The main goal is to make sure all users can use features within the app – regardless of their ability to physically operate the device.

Voice is being integrated into all kinds of devices, and it’s reshaping the usual state of things. Here are a few reasons why making your application hands-free is a good idea, business-wise and in general:

  • Convenience – Hands-free apps can be used anywhere: while driving, doing chores around the house, carrying things, or when you’re simply far away from the device.
  • Accessibility – These apps can be operated by people with limited hand mobility, those who are visually impaired, and other groups in need of assistive technology.
  • Time efficiency – In many situations, making a quick call takes less time than typing a lengthy message and waiting for a response. The same principle applies to voice control: no clicks, no typing, and no other time-consuming actions.
  • Simplicity – Users don’t have to be familiar with the interface to handle it. Unlike traditional apps, you hardly need any computer literacy or technical skills.
  • Multi-use – Voice control isn’t strictly tied to one function. This kind of software is incredibly versatile in terms of potential applications.

Hands-free technology is particularly useful in countries where it’s illegal to use a handheld mobile phone when you drive. These laws have been adopted in many jurisdictions around the world, which gave developers another incentive to develop the technology.


The market of hands-free applications is an interesting space right now. Let’s look at the best offerings available in the Play Store for Android users.

1. Google Assistant

Google Assistant is considered an undisputed champion of personal assistant apps developed for Android. Although it may not work on every device, the coverage is extensive. In addition to running the app on your phone, you can also integrate with smart devices such as Philips Hue lights.

The assistant can run basic functions like making calls, sending texts, emails, setting alarms and reminders, etc. On top of that, you can look up weather reports and news updates, send web searches, and play music. The range of features is constantly getting updated and expanded.

The company states the app was originally designed for people with disabilities and conditions like Parkinson’s and multiple sclerosis. However, it should come in useful for anyone who’s multitasking or has their hands full. To activate Google Assistant, users need to say “OK Google,” and it will be all ears.

2. Amazon Alexa

Amazon Alexa has pushed the trend of endless integration with many emerging smart home devices to the forefront. Contrary to popular belief, this service runs not only on Amazon Echo but also on mobile devices. 

Alexa for Android is mostly used to control integrated devices. But the functionality also supports web searches, playing music, and even ordering deliveries. If you want to launch the hands-free app, say “Alexa” and it will be ready to hear commands whether the screen is on or off. 

The device restrictions are by far the biggest downside of Amazon Alexa. So far, there is a limited number of mobile phones supporting this system. However, in terms of its abilities and intelligence, it rightly occupies the top of the list.

3. Bixby

Bixby is a relatively new addition, but it is already among the best. It’s important to mention that it’s only compatible with Samsung devices. The company may be looking into other platforms, but at this point, it only runs on devices and appliances connected to Samsung’s proprietary hub.

The app can accomplish a variety of tasks – from sending text messages and responding to basic questions to activating other applications in the device (dialer, settings menus, camera app, contacts list, and gallery). 

One of the greatest benefits of Bixby is that it adapts to the user’s voice and manner of speaking. From the get-go, it can understand different request variations like “Show me today’s weather,” “What’s the weather like?” or “What’s the forecast for today?” and it only gets smarter with time.

4. Dragon

Powered by Nuance, which is the technology behind Siri, Dragon Mobile has been in operation for many years. Essential functionality includes dictating emails, checking traffic and weather, sharing your location, and a lot more. 

There are also many customizable features aimed at simplifying how you live, work, and spend leisure time – all while minimizing touch-based interactions. Users can add their unique and personalized Nuance Voiceprint. Then, voice biometrics will only let a designated user talk and ask questions.

You can also set your own wake-up word. Unlike other services, this one gives you options to launch it with “Hi, Dragon”, “What’s up,” or anything else you like. The company is working on adding languages other than English, as well as support for the international market. 

5. Hound

While the apps described above cover the most widely used basic functionalities, Hound takes a step further. Along with doing simple searches, it can accomplish advanced tasks such as hotel booking, a sing/hum music search, looking up stocks, or even calculating a mortgage. On a lighter side, you can play interactive games like Hangman. 

The company launched partnerships with Yelp and Uber to make features like getting restaurant information and hailing a ride more precise. Another interesting feature is that it can translate whole sentences practically in real-time. 

This speech-based app is only available for United States residents. However, the process of getting the app out of beta and ready for public consumption was pretty quick, so we may see some international development. Also, there are still occasional bugs within the app. 

6. Robin

Robin has been around for a while as one of the original “Siri alternatives”. Like its counterparts, the app supports calling, sending messages, and providing the latest information on the weather, news, and more. However, the functionality still needs some work.

Intentionally or not, a lot of Robin’s features relate to car use. For example, it offers GPS navigation, gives live traffic updates, and shows gas prices directly on the map. You can even specify what kind of gas you need, and it will guide you towards the closest station.

To call the app into action, you can tap on the microphone button, say “Robin,” or just wave hello twice in front of your phone (which is quite a unique innovation).

7. AIVC

AIVC stands for Artificial Intelligent Voice Control. It comes in two versions: free, which contains a number of ads, and Pro. The former option covers basic functionality, whereas the Pro one provides some appealing features like TV-Receiver control, wake up mode, and others. You can control devices that are accessible over a web interface with your own preset commands.

As far as voice commands go, the app gives you the option to define specific phrases to invoke a certain action. This is done to minimize the risk of the app not understanding what you want.

AIVC performs actions on other websites and services so you can compose emails, make Facebook posts, or move over to a navigation app.

8. DataBot

DataBot is one of the simpler Android personal assistants. You can play around with it, ask for jokes and riddles, or do other goofy stuff, but it can actually be pretty useful for various tasks. You can ask the bot to search online, schedule events, and make calls using just your voice.

It is a cross-platform application so you can sync it across all your devices: smartphones, tablets, and laptops. That way, you get a coherent, all-around hands-free experience. Also, DataBot gains experience while you’re using it. 

A slight inconvenience that DataBot has is that it comes with ads and in-app purchases. If you aren’t bothered by that, it should be a good addition to your daily routine.

9. Car Dashdroid

Car Dashdroid includes everything you could possibly need while driving – navigation, music, contacts, messages, voice commands, and more. It is also integrated with popular messaging apps like WhatsApp, Telegram, and Facebook Messenger.

What makes this app stand out as a specifically car-oriented solution is that it comes with a compass, speedometer, and plenty of other features. 

There are also customization blocks that help you arrange all tasks based on their priority. For example, if you mostly use the app for navigation, you can put it at the top. Then, you can place music control below navigation, and the list of frequently contacted people at the bottom. 

10. Drivemode

Drivemode is a simple app meant to assist users while they’re driving. Users can select from their preferred navigation app (for example, Google Maps, Waze, and HERE Maps). You can also input favorite destinations (such as home, work, and so on), play music from multiple supported apps, and access messages in a low-distraction “driving mode” overlay with audio prompts. 

Even though it’s not entirely hands-free, there is a function that presents shortcuts that you can access through tapping or swiping. Drivemode can also be integrated with Google Assistant, so the functionality can potentially be extended way beyond driving assistance.


Integrating a Hands-Free Experience with Alan

Voice AI offers immense benefits for businesses – from completing tasks more quickly to offering better user experience with verbal communication. You can add unique voice conversations, no matter the industry you’re in. The Alan platform allows you to implement hands-free, interactive functionality in your existing application with ease. 

What is a voice assistant?
https://alan.app/blog/voiceassistant-2/
Fri, 25 Oct 2019

A voice assistant is a digital assistant that uses voice recognition, language processing algorithms, and voice synthesis to listen to specific voice commands and return relevant information or perform specific functions as requested by the user.

Based on specific commands, sometimes called intents, spoken by the user, voice assistants can return relevant information by listening for specific keywords and filtering out the ambient noise.

While voice assistants can be completely software based and able to integrate into most devices, some assistants are designed specifically for single device applications, such as the Amazon Alexa Wall Clock. 

Today, voice assistants are integrated into many of the devices we use daily, such as cell phones, computers, and smart speakers. Some voice assistants offer a very specific feature set, while others are open-ended to help with almost any situation at hand.

History of voice assistants

Voice assistants have a long history that goes back nearly 100 years, which might seem surprising given that apps such as Siri have only been released within the past ten years.

The very first voice-activated product, Radio Rex, was released in 1922. This toy was very simple: a toy dog stayed inside its dog house until the user exclaimed its name, “Rex,” at which point it jumped out. This was all done with an electromagnet tuned to a frequency similar to that of the vowel in “Rex,” and it predated modern computers by over 20 years.

In 1952, Bell Labs announced Audrey, the Automatic Digit Recognizer. It was not a small, simple device, however: its casing stood six feet tall just to house all the hardware required to recognize ten digits!

IBM began its long history with voice assistants in 1962 at the World’s Fair in Seattle, where the IBM Shoebox was announced. The device could recognize the digits 0-9 and six simple commands such as “plus” and “minus,” so it could be used as a simple calculator. Its name referred to its size, similar to an average shoebox; it contained a microphone connected to three audio filters that matched the electrical frequency of what was said against preassigned values for each digit.

DARPA then funded five years of speech recognition R&D starting in 1971, known as the Speech Understanding Research (SUR) program. One of the biggest innovations to come out of this was Carnegie Mellon’s Harpy, which was capable of understanding over 1,000 words.

The next decade brought remarkable progress in speech recognition research, taking most voice recognition devices from understanding a few hundred words to understanding thousands, and slowly bringing them into consumers’ homes.

Then, in 1990, Dragon Dictate was introduced to consumers’ homes for the shocking price of $9,000! It was the first consumer-oriented speech recognition program designed for home PCs. The user could dictate to the computer one word at a time, pausing between each word for the computer to process before moving on. Seven years later, Dragon NaturallySpeaking was released; it brought more natural conversation, understanding continuous speech at up to 100 words per minute, at a much lower price tag of $695.

In 1994, IBM’s Simon became the first smart voice assistant. Simon was a PDA, and really the first smartphone in history, predating the first Android phones by nearly 15 years!

When Android was first released in 2008, Google slowly started rolling out voice search for its Google mobile apps on various platforms, with a dedicated Google Voice Search application released in 2011. This led to more and more advanced features, eventually culminating in Google Now and Google Assistant.

This was followed by Siri in 2010. Developed by SRI International, with speech recognition provided by Nuance Communications, the original app was released in 2010 on the iOS App Store and was acquired by Apple two months later. With the release of the iPhone 4s, Siri officially became an integrated voice assistant within iOS. Since then, Siri has made its way onto every Apple device available and has linked all those devices together in a single ecosystem.

Shortly after Siri was first developed, IBM Watson was announced publicly in 2011. Watson was named after IBM’s first CEO, Thomas J. Watson, and was originally conceived in 2006 to beat humans at the game of Jeopardy!. Today, Watson is one of the most intelligent, natural-sounding computer systems available.

Amazon Alexa was announced in 2014. Its name was inspired by the Library of Alexandria, as well as the hard consonant “X,” which helps with more accurate voice recognition. Alongside Alexa, the Echo line of smart devices was announced, offering consumers an inexpensive route to bringing smart integration into their homes.

Alan was publicly announced in 2017, ready to take the enterprise application world by storm. First born as “Synqq,” Alan was created by the minds behind “Qik,” the very first video messaging and conferencing mobile app. Alan is the first voice AI platform aimed at enterprise applications: while it can be found in many consumer applications, it is designed so enterprises can develop and integrate voice quickly and efficiently!

At the bottom of the post we’ve included a Timeline to summarize the history of voice assistants!

Technology behind Voice Assistants

Voice assistants use Artificial Intelligence and Voice recognition to accurately and efficiently deliver the result that the user is looking for. While it may seem simple to ask a computer to set a timer, the technology behind it is fascinating.

Voice Recognition

Voice recognition works by taking the analog signal of a user’s voice and turning it into a digital signal. The computer then attempts to match that digital signal against words and phrases to recognize the user’s intent. To do this, it needs a database of pre-existing words and syllables in a given language to compare the signal against. Checking the input signal against this database is known as pattern recognition, and it is the primary force behind voice recognition.
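The matching step can be sketched as a nearest-template search: the digitized signal is reduced to a feature vector and compared against a stored database of word templates. This is a toy illustration, not a production recognizer; the three-dimensional “features” and the tiny vocabulary below are invented for the example.

```javascript
// Toy pattern recognition: match an incoming feature vector against
// a database of stored word templates by Euclidean distance.
const templates = {
  // Invented "acoustic features" for illustration only.
  yes:  [0.9, 0.1, 0.3],
  no:   [0.2, 0.8, 0.5],
  stop: [0.4, 0.4, 0.9],
};

function distance(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

// Return the template word whose features are closest to the input signal.
function recognize(signal) {
  let best = null;
  let bestDist = Infinity;
  for (const [word, features] of Object.entries(templates)) {
    const d = distance(signal, features);
    if (d < bestDist) { bestDist = d; best = word; }
  }
  return best;
}

console.log(recognize([0.85, 0.15, 0.25])); // closest to the "yes" template
```

Real systems compare sequences of spectral features rather than single vectors, but the core idea of scoring the input against every stored pattern is the same.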

Artificial Intelligence

Artificial intelligence is using machines to simulate and replicate human intelligence.

In 1950, Alan Turing (the namesake of our company) published his paper “Computing Machinery and Intelligence,” which first asked the question: can machines think? Turing went on to develop the Turing Test, a method of evaluating whether a computer is capable of thinking like a human. Four approaches that define AI were later developed: thinking humanly, thinking rationally, acting humanly, and acting rationally. While the first two deal with reasoning, the latter two deal with actual behavior. Modern AI is typically seen as a computer system designed to accomplish tasks that would normally require human intelligence. These systems can improve themselves using a process known as machine learning.

Machine Learning

Machine learning is the subset of artificial intelligence in which programs improve without human coders manually writing every rule. Instead of writing out the complete program, programmers give the AI “patterns” to recognize and learn from, along with large amounts of data to sift through and study. Rather than following specific hand-written rules, the AI searches for patterns within this data and uses them to improve its existing functions. One way machine learning helps voice AI is by feeding the algorithm hours of speech from various accents and dialects.

While a traditional program requires an input and rules to produce an output, machine learning tools are given inputs and outputs and use them to create the program itself. There are two broad approaches: supervised learning and unsupervised learning. In supervised learning, the model is given data that is at least partly labeled, meaning some of it is already tagged with the correct answer; this guides the model in categorizing the rest of the data and developing a correct algorithm. In unsupervised learning, none of the data is labeled, so it is up to the model to find the patterns on its own. This is useful because the model can find patterns its creators might never have found themselves, though the results are much less predictable.
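The supervised case can be illustrated with a minimal nearest-neighbor classifier: labeled examples guide how a new, unlabeled point gets categorized. The two-dimensional points and the “quiet”/“loud” labels below are made up purely for the sketch.

```javascript
// Minimal supervised learning: classify a new point by the label of
// its nearest labeled neighbor (1-NN). The labeled examples stand in
// for the "already tagged with the correct answer" training data.
const labeled = [
  { point: [1, 1], label: "quiet" },
  { point: [1, 2], label: "quiet" },
  { point: [8, 8], label: "loud" },
  { point: [9, 7], label: "loud" },
];

function classify(point) {
  let best = null;
  let bestDist = Infinity;
  for (const example of labeled) {
    const d = Math.hypot(point[0] - example.point[0], point[1] - example.point[1]);
    if (d < bestDist) { bestDist = d; best = example.label; }
  }
  return best;
}

console.log(classify([2, 1])); // nearest labeled neighbors are "quiet"
```

An unsupervised variant would receive the same points without labels and have to discover the two clusters itself, for example with k-means.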

Different Voice Assistant approaches

Many conversational assistants today combine both a task-oriented and knowledge-oriented workflow to carry out almost any task that a user can throw at it. A task-oriented workflow might include filling out a form, while a knowledge-oriented workflow includes answering what the capital of a state might be or specifying the technical specifications of a product.

Task-oriented approach

A task-oriented approach uses goal-driven tasks to achieve what the user needs, often integrating with other apps to complete them. For example, if you ask a voice assistant to set an alarm for 3PM, it understands this as a task request and communicates with your default clock application to open and set an alarm for 3PM. It then checks whether anything else is needed, such as a name for the alarm, and communicates that back to you. This approach does not require an extensive online database, as it mainly relies on the knowledge and existing skills of other installed applications.
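The alarm example above can be sketched as simple intent routing: a pattern identifies the task, extracts its parameters, and hands them off to the relevant “app.” The clock handler and the pattern here are hypothetical stand-ins, not any real assistant’s API.

```javascript
// Toy task-oriented routing: map an utterance to a task handler
// belonging to another "app" (here, a stand-in clock application).
const clockApp = {
  setAlarm(time) { return `Alarm set for ${time}`; },
};

const intents = [
  {
    pattern: /set an alarm for (\S+)/i,          // captures the time
    handle: (match) => clockApp.setAlarm(match[1]),
  },
];

function handleUtterance(utterance) {
  for (const intent of intents) {
    const match = utterance.match(intent.pattern);
    if (match) return intent.handle(match);
  }
  return "Sorry, I didn't understand that.";
}

console.log(handleUtterance("Set an alarm for 3PM")); // "Alarm set for 3PM"
```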

Knowledge-oriented approach

A knowledge-oriented approach uses analytical data to help users with their tasks, focusing on online databases and already-recorded knowledge. For example, whenever a user asks for an internet search, the assistant uses the available online databases to return relevant results and recommend the top one. Answering a trivia question is knowledge-oriented, since the assistant is searching for data instead of working with other apps to complete a task.
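A knowledge-oriented flow, by contrast, searches stored data rather than delegating to other apps. A toy version: score each knowledge-base entry by keyword overlap with the question and answer with the best match. The facts below are sample data invented for the sketch.

```javascript
// Toy knowledge-oriented lookup: rank knowledge-base entries by how
// many query words they share, then answer with the top result.
const knowledgeBase = [
  { keywords: ["capital", "france"], answer: "Paris is the capital of France." },
  { keywords: ["capital", "texas"], answer: "Austin is the capital of Texas." },
];

function answer(question) {
  const words = question.toLowerCase().split(/\W+/);
  let best = null;
  let bestScore = 0;
  for (const entry of knowledgeBase) {
    const score = entry.keywords.filter((k) => words.includes(k)).length;
    if (score > bestScore) { bestScore = score; best = entry; }
  }
  return best ? best.answer : "No answer found.";
}

console.log(answer("What is the capital of Texas?")); // "Austin is the capital of Texas."
```

Real assistants use far richer retrieval and ranking, but the shape is the same: the answer comes from the database, not from another application’s skills.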

Benefits of Voice Assistants

Some examples of what a Voice Assistant can do include:

  • Check the weather
  • Turn on/off connected smart devices
  • Search databases

One of the main reasons for the growing popularity of Voice User Interfaces (VUIs) is the growing complexity of mobile software without a matching increase in screen size, which puts the GUI (Graphical User Interface) at a serious disadvantage. As new phone models come out, screen sizes stay relatively the same, leading to cramped interfaces and frustrating user experiences. That is why more and more developers are turning to voice user interfaces.

Efficiency and Safety

While typing has become much faster as people have grown used to standard keyboards, using your voice will always be quicker, more natural, and less prone to spelling errors. This leads to a much more efficient and natural workflow.

Quick learning curve

One of the greatest benefits of voice assistants is the quick learning curve. Instead of having to master specific physical devices like mice and touch screens, you can simply rely on your natural conversational habits and use your voice.

Wider Device Integration

Since a screen or keyboard isn’t necessary, it’s easy to place voice integration into a much wider array of devices. In the future, smart glasses, furniture, and appliances will all come with voice assistants already integrated.

Why and When to use Voice Assistants

There are many use cases for voice assistants in today’s world: for example, when your hands are full and you can’t use a touch screen or keyboard, or while you are driving. Say you are driving and need to change your music; you could simply ask a voice assistant to “play my driving playlist.” This makes for a safer driving experience and helps avoid the risks of distracted driving.

User Interfaces

To further understand voice assistants, it is important to look at the overall user experience: what a user interface is, and how a VUI differs from the more traditional graphical user interface that modern apps currently use.

Graphical User Interface (GUI)

A Graphical User Interface is what is most commonly used today. For example, the internet browser you’re using to read this article is a graphical user interface. Using graphical icons and visual indicators, the user is able to interact with machines quicker and easier than before.

A graphical user interface can also be used in something like a chatbot, where the user communicates with the device over text and the machine responds with natural conversational text. The big downside is that, since everything happens in text, it can feel cumbersome and inefficient, and can take longer than voice in certain situations.

Voice User Interface (VUI)

An example of a VUI is something like Siri, where there is an audio cue that the device is listening, followed by a verbal response.

Most apps today combine a sense of both Graphical and Voice User Interfaces. For example, when using a maps application, you can use voice to search for destinations and the application will show you the most relevant results, placing the most important information at the top of the screen.

Some examples of popular smart assistants today are Alan, Amazon Alexa, Siri by Apple, and Google Voice Assistant.

Popular Voice Assistants

Voice Assistant adoption by platform, from Voicebot.ai

Siri

Siri is the most popular voice assistant today. Created in 2010 by SRI International and purchased by Apple that same year, Siri has quickly become an integral part of the Apple ecosystem, bringing all the Apple devices and applications together to work in tandem with one another.

Alexa

Created by Amazon in 2014, Alexa was named for its similarity to the Library of Alexandria and was originally inspired by the conversational computer system found aboard the U.S.S. Enterprise in Star Trek. Alexa was released alongside the Amazon Echo, a smart speaker intended to let consumers dive into the world of home automation, interact with the Amazon ecosystem, and connect a plethora of smart devices.

Google Assistant

Originally unveiled in 2016, Google Assistant is the spiritual successor to Google Now, with the main improvement being the addition of two-way conversations. Where Google Now would return answers in the form of a Google search results page, Google Assistant gives answers in natural sentences and returns recommendations in the form of feature cards.

Cortana

Microsoft’s Cortana, whose development began in 2009, represents one of the longest-running visions of bringing voice assistants into people’s daily lives. Microsoft began shipping Cortana with all Windows 10 and Xbox devices, leading to a huge increase in registered users; in 2018, it was reported that Cortana had over 800 million users.

Alan

In 2017 Alan set out to take voice assistants to the next level, by enabling voice AI for all applications. Using domain specific language models and contextual understanding, Alan is focused on creating a new generation of Enterprise Voice AI applications. By using the Alan Platform, developers are able to take control of voice, and create an effective workflow that best fits their users with the help of vocal commands.

Future of Voice Assistants

As AI becomes more advanced and voice technology becomes more widely accepted, voice-controlled digital assistants will not only become more natural but also more integrated into everyday devices. Conversations will more closely emulate human conversation, which will open the door to more complex task flows. Adoption is growing too: in early 2019, it was estimated that 111.8 million people in the US would use a voice assistant at least monthly, up 9.5% from the previous year.

Further Integration

In the future, devices will be more integrated with voice, and it will become easier and easier to search by voice. For example, Amazon has already released a wall clock with Alexa built in, so you can ask it to set a timer or tell you the time. While these devices aren’t full-blown voice-activated personal assistants, they show a lot of promise for the coming years. Using vocal commands, we will be able to work with our devices just by talking.

Natural Conversations

Currently, as users are getting more used to using voice to communicate with their digital devices, conversations can seem very broken and awkward. But in the future, as digital processing becomes quicker and people become more accustomed to using voice assistants in their everyday devices, we will see a shift where users won’t have to pause and wait for the voice assistant to catch up, and instead we will be able to have natural conversations with our voice assistants, creating a more soothing and natural experience.

More complex task flows

As conversations with voice assistants become more natural and voice recognition and digital processing becomes quicker, it won’t be uncommon to see users begin to adopt more advanced tasks in their daily routines with voice assistants. For example, instead of asking a voice assistant how long a commute is, and then asking about different options, you might be more inclined to say, “If Uber is quicker than taking the bus to work, can you reserve an Uber ride from home to work, and how long will it take?”

How to make your own voice assistant

As the number of publicly available voice assistants grows, tools are appearing that make it as easy as possible to build a voice assistant that fits your needs!

For example, if you just wanted to create a specific skill, or command for a voice assistant, it might be more efficient to look into integrating a skill into an already existing voice assistant, such as Alexa. 

Amazon has made it remarkably simple to add your own command to the rapidly growing set of publicly available Alexa Skills. You can log in to AWS with the same account your Echo is linked to and use the tools to create a free Alexa Skill!

Using Alan Studio, a completely browser-based Voice AI IDE, you can develop, test, and deploy voice integrations straight from your browser.
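To give a feel for what such a voice script looks like, here is a minimal sketch. In Alan Studio, `intent()` and `p.play()` are provided by the platform; this standalone version stubs a toy `intent()` registry and dispatcher so the shape of the script can be run anywhere, and the phrases and replies are invented for the example.

```javascript
// Sketch of a dialog script in the Alan Studio style. The platform normally
// supplies intent() and the speech pipeline; here we stub minimal versions
// so the script shape is runnable standalone.
const intents = [];

// Minimal stand-in for the platform's intent(): register one or more
// spoken patterns with a handler.
function intent(patterns, handler) {
  for (const pattern of [].concat(patterns)) {
    intents.push({ pattern: pattern.toLowerCase(), handler });
  }
}

// Dialog script body -- this part mirrors what you would write in the IDE.
intent(['hello', 'hi there'], (p) => {
  p.play('Hi! How can I help you?');
});

intent('what can you do', (p) => {
  p.play('I can answer questions about this app.');
});

// Tiny dispatcher standing in for the platform's speech-understanding
// backend: match an utterance to a registered intent and collect the reply.
function handleUtterance(utterance) {
  let response = null;
  const p = { play: (text) => { response = text; } };
  const match = intents.find((i) => i.pattern === utterance.toLowerCase());
  if (match) match.handler(p);
  return response;
}
```

In the real platform the matching is done by speech models rather than exact string comparison, but the programming model — patterns mapped to handlers that play responses — is the same idea.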

Why Alan?

Alan is a highly customizable Voice AI platform designed to work with any pre-existing application. Built with enterprise use in mind, it treats security and business functionality as top priorities. You can leverage visual and voice context to support any workflow and improve efficiency today, and since Alan Studio is a completely browser-based IDE, you can edit your scripts on the go whenever the need arises. Long gone are the days of maintaining multiple script versions for each platform: with Alan, a single script can be embedded into any app, whether iOS, Android, or Web. Sign up for Alan Studio today and see how an AI voice assistant can improve your quality of life!


Voice Assistant Timeline

  • 1922 – First Voice activated consumer product hits store shelves as “Radio Rex”
  • 1952 – Audrey, or the Automatic Digit Recognition Machine, is announced
  • 1962 – IBM Shoebox is shown for the first time at the Seattle World's Fair
  • 1971 – DARPA funds five years of speech recognition research and development
  • 1976 – Harpy is shown at Carnegie Mellon
  • 1984 – IBM releases Tangora, the first voice-activated typewriter
  • 1990 – Dragon Dictate is released
  • 1994 – Simon by IBM is the first modern voice assistant released
  • 2010 – Siri is released as an app on the iOS app store
  • 2011 – IBM Watson is released
  • 2012 – Google Now is released
  • 2014 – Amazon Alexa and Echo are released
  • 2015 – Microsoft Cortana is released
  • 2017 – Alan is developed and released with the Alan Platform
From Voicebot.ai

Resources

https://whatis.techtarget.com/definition/voice-assistant

https://www.smartsheet.com/voice-assistants-artificial-intelligence

https://www.ibm.com/ibm/history/ibm100/us/en/icons/speechreco

http://www.bbc.com/future/story/20170214-the-machines-that-learned-to-listen

https://towardsdatascience.com/build-your-first-voice-assistant-85a5a49f6cc1

This article was reposted at dev.to here:
https://dev.to/alanvoiceai/what-is-a-voice-assistant-492p

What is a Voice User Interface (VUI)?
https://alan.app/blog/voiceuserinterface/ — Wed, 25 Sep 2019

A Voice User Interface (VUI) enables users to interact with a device or application using spoken commands. VUIs give users hands-free control of technology, often without even having to look at the device. A combination of Artificial Intelligence (AI) technologies is used to build VUIs, including Automatic Speech Recognition, Named Entity Recognition, and Speech Synthesis, among others. VUIs can live either in dedicated devices or inside applications. The backend infrastructure, including the AI components that power the VUI's speech capabilities, is typically hosted in a public or private cloud, where the user's speech is processed. There, the AI components determine the intent of the user and return a response to the device or application where the user is interacting with the VUI.

Well-known VUIs include Amazon Alexa, Apple Siri, Google Assistant, Samsung Bixby, Yandex Alisa, and Microsoft Cortana. For the best user experience, VUIs are paired with visuals created by a Graphical User Interface and with accompanying sound effects. Each VUI today uses its own set of sound effects to signal when it is active, listening, processing speech, or responding to the user. The benefits of VUIs include hands-free accessibility, productivity, and a better customer experience that will change how the world interacts with artificial intelligence.

The Creation of VUI 

Audrey

The first traces of VUI appeared in 1952 with the first speech recognition system, a device called Audrey. Invented by K.H. Davis, R. Biddulph, and S. Balashek, Audrey was known as the "automatic digit recognizer" for its ability to recognize the digits 0 through 9. Although Audrey's skill was limited to numbers, it was seen as a technological breakthrough. Audrey was also nothing like the small devices common today: it stood 6 feet tall, with a large and rather complicated analog circuit system.

Audrey's input and output procedure resembled that of modern VUI devices. First, a speaker recited a digit or digits into a telephone, making sure to pause 350 milliseconds between words. Audrey then listened to the speaker's input and, using speech processing, sorted the speech sounds and patterns to understand it. Finally, Audrey responded visibly by flashing a light, much like modern VUI devices.

Although Audrey could distinguish the digits, it could not universally understand every voice and speaking style, and it responded reliably only to a familiar speaker. Unlike modern VUI devices, Audrey was simply not advanced enough: it needed a familiar speaker to maintain 97 percent digit recognition accuracy. With a few other designated speakers its accuracy fell to 70-80 percent, and it was far lower with unfamiliar speakers. Why was Audrey created at all if manual push-button dialing was cheaper and easier to work with? Recognized speech requires less bandwidth (fewer frequencies for transmitting a signal) than the original sound waves in a telephone, which made it attractive for reducing the data traveling through wires and for future technology.

Tangora

The next major voice technology advancement came in 1971, when the U.S. Department of Defense funded five years of a Speech Understanding Research program. The goal was to reach a minimum vocabulary of 1,000 words, with the help of companies such as IBM. In the 1980s, IBM built a voice-activated typewriter called Tangora, capable of understanding and handling a 20,000-word vocabulary. Today, voice-activated typing systems have evolved to the point where a smartphone can take a dictated text or a research paper in a matter of moments.

Over time, computer technology advanced to the point where VUI, Graphical User Interface (GUI), and User Experience (UX) design all fit into a device that rests in the palm of a hand. GUI and UX alone are even becoming old news, thanks to the quick adoption of voice-only devices that no longer rely on them. Speech recognition technology went from understanding the digits 0 through 9 to millions of words and phrases from any voice. This advancement was made possible by new speech recognition processes such as Automatic Speech Recognition, Named Entity Recognition, and Speech Synthesis.

Technology used to create a VUI

A range of Artificial Intelligence technologies is used to create VUIs, including Automatic Speech Recognition, Named Entity Recognition, and Speech Synthesis.

Automatic Speech Recognition

Automatic Speech Recognition (ASR) is the technology used to analyze and convert human speech into text. For a given audio input, ASR must filter out distracting acoustic noise and isolate the human speech. Distortions in the audio and streaming connectivity can make this a challenge. Several underlying technologies have been tested and used to build ASR, including Gaussian mixture models (a probabilistic model) and deep learning with neural networks. Often, the words recognized by ASR are not an exact match for the entities within a user intent. In these cases, augmented entity matching is used: similar words, or similar-sounding words, are matched to a predefined entity in the VUI.
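As a rough illustration (not any particular product's implementation), augmented entity matching can be sketched as picking the predefined entity with the smallest edit distance to the word the ASR produced:

```javascript
// Toy augmented entity matching: map a possibly misrecognized ASR word to
// the closest predefined entity by Levenshtein (edit) distance.
function editDistance(a, b) {
  // d[i][j] = edit distance between a[0..i) and b[0..j)
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1,      // deletion
                         d[i][j - 1] + 1,      // insertion
                         d[i - 1][j - 1] + cost); // substitution
    }
  }
  return d[a.length][b.length];
}

// Return the nearest known entity, or null if nothing is close enough.
function matchEntity(word, entities, maxDistance = 2) {
  let best = null;
  let bestDist = Infinity;
  for (const e of entities) {
    const dist = editDistance(word.toLowerCase(), e.toLowerCase());
    if (dist < bestDist) { best = e; bestDist = dist; }
  }
  return bestDist <= maxDistance ? best : null;
}
```

With this, a garbled recognition like "Bostan" still resolves to the entity "Boston", while unrelated words are rejected. Production systems typically use phonetic similarity as well, not just spelling.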

Named Entity Recognition

Named Entity Recognition (NER) is used to classify words by their underlying entity. For example, in the command "Get directions to New York City", 'New York City' is recognized as a location. Beyond locations, NER locates entities in text or semi-structured text that can be a person, a subject, or something as specific as a scientific term. NER often uses the surrounding words to determine the value of an entity. In the "Get directions to New York City" example, pre-trained probabilistic models assume that whatever word(s) come after "Get directions to" can safely be classified as a location. Commands like "Get directions to the nearest gas station" work for the same reason, with 'the nearest' being a defined qualifier that precedes a location.

NER assists ASR in resolving words into entities. From voice input alone, "New York City" is recognized as the separate words "new", "york", "city". NER then identifies these as a single location and adjusts them to "New York City". NER is highly contextual and needs additional input to determine entities confidently. Sometimes, being dependent on its previous training, NER will not be able to confidently determine an input's entity.
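A deliberately simplified sketch of this kind of contextual resolution: a trigger phrase marks everything after it as a location, the loose tokens are merged into one entity, and the entity is re-capitalized. Real NER uses trained probabilistic models rather than hand-written triggers; the trigger list here is invented for the example.

```javascript
// Toy rule-based NER: phrases after a known location trigger are merged
// into a single location entity and re-capitalized, turning the raw ASR
// tokens "new" "york" "city" into the entity "New York City".
const LOCATION_TRIGGERS = ['get directions to', 'take me to'];

function extractLocation(utterance) {
  const text = utterance.toLowerCase();
  for (const trigger of LOCATION_TRIGGERS) {
    const idx = text.indexOf(trigger + ' ');
    if (idx !== -1) {
      // Everything after the trigger is treated as the location;
      // strip a known qualifier like "the nearest" first.
      const raw = text.slice(idx + trigger.length + 1).replace(/^the nearest /, '');
      // Re-capitalize each token and join into one entity.
      return raw.split(/\s+/).map((w) => w[0].toUpperCase() + w.slice(1)).join(' ');
    }
  }
  return null; // no location trigger found
}
```

This mirrors the reasoning in the text: the model's confidence comes from context (the trigger), not from the tokens themselves.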

Speech Synthesis

Speech Synthesis produces artificial human speech from input text. A VUI does this job in three stages: input, processing, and output. Speech Synthesis is essentially a text-to-speech (TTS) output stage, where the device reads the input aloud with a simulated voice through a loudspeaker.

These AI technologies analyze, learn, and mimic human speech patterns, and can adjust the speech's intonation, pitch, and cadence. Intonation is the way a person's voice rises or falls as they speak; it is affected by emotion, accent, and diction. Pitch is the tone of voice, unaffected by emotion; pitch is high or low and is best described as a squeaky or deep voice. Cadence is the flow of the voice as it fluctuates in pitch while someone speaks or reads. For example, a public speaker will change their cadence by lowering their voice at the end of a declarative sentence to make an impact on the audience.

Once all of this information is stored and analyzed, these technologies use it to improve themselves and the VUI through machine learning. The cloud backend then determines the intent of the user and returns a response through the application or device.

Intents & Entities

Voice commands consist of intents and entities. The intent is the objective of the voice interaction, and it comes in two forms: local intents and global intents. A local intent handles a narrow exchange, such as a question the user answers with "Yes" or "No"; a global intent handles a more complex request. When designing VUIs, the different ways a command can be phrased need to be taken into consideration so that the intent is recognized and answered correctly. Here are two phrasings of the same intent for getting directions: "Get directions to 1600 Pennsylvania Avenue" and "Take me to 1600 Pennsylvania Avenue". Entities are variables within intents. Think of them as the blanks to fill in a Mad Libs booklet, such as "Book a hotel in {location} on {date}" or "Play {song}."
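The Mad Libs analogy maps directly onto code. As a minimal sketch (any real platform does this with trained models, not a regex), a template with `{slot}` placeholders can be compiled into a pattern whose capture groups fill in the entities:

```javascript
// Compile an intent template like "Book a hotel in {location} on {date}"
// into a regex, remembering the slot names in order of appearance.
function compileTemplate(template) {
  const names = [];
  const pattern = template.replace(/\{(\w+)\}/g, (_, name) => {
    names.push(name);
    return '(.+)'; // each slot becomes a capture group
  });
  return { regex: new RegExp('^' + pattern + '$', 'i'), names };
}

// Match an utterance against a template; return the filled slots or null.
function matchIntent(template, utterance) {
  const { regex, names } = compileTemplate(template);
  const m = utterance.match(regex);
  if (!m) return null;
  const slots = {};
  names.forEach((name, i) => { slots[name] = m[i + 1]; });
  return slots;
}
```

So `matchIntent('Book a hotel in {location} on {date}', 'Book a hotel in Paris on Friday')` yields `{ location: 'Paris', date: 'Friday' }` — the entities extracted from the intent.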


VUI vs GUI

User Experience (UX) is the overall experience of an interface product, such as a website or application, in terms of how aesthetically pleasing it is and how easy it is for users to navigate. Together, VUI and GUI play a large role in UX design because they assemble the product consumers interact with.

Voice User Interface

As explained earlier, a Voice User Interface (VUI) enables users to interact with a device or application using spoken commands. VUIs give users hands-free control of technology, often without even having to look at the device.

Graphical User Interface (GUI)

A Graphical User Interface (GUI) is the graphical layout and design of a device. For example, the screen display and apps on a smartphone or computer are a graphical user interface. A GUI can display visuals for a VUI, such as a graphic of sound waves when a voice assistant on a smartphone responds to its user. Google Assistant and Apple Siri are real-life examples of VUI and GUI working together.

Apple Siri VUI & GUI

Apple Siri responds to "Hey Siri" through its VUI, or to a press of the home button on an Apple device. Users know that Siri is active when Siri says "What can I help you with?" through the speaker or shows it on screen using the GUI. While a user speaks to Siri, colorful wavelengths move with the sound of speech, showing users that Siri is actively listening and processing the question. When a user is quiet, Siri prompts "Go ahead, I'm listening…" If the user still does not respond, the screen displays "Some things you can ask me:" with a few examples of what Siri can do, such as calling, FaceTiming, and emailing.

This GUI feature specifically caters to people who are new to Siri and unsure what to do. The Apple device also displays what the user asked and Siri's response on the screen, showing what was understood from the interaction. Other Apple Siri features include customization of Siri's gender, accent, and language.

Google Assistant VUI & GUI

Google Assistant responds when it hears "OK Google" or "Hey Google." Colorful dots at the bottom of the screen let the user know that Google Assistant has been activated and is ready to listen. While it waits for the user to ask a question, the dots move in a wave formation until it detects speech. Once the user starts speaking, the dots transform into bars that move with the sound of speech, letting users know it is processing information. Like Apple Siri, Google Assistant also displays what the user asked and its responses, showing users what was understood from the interaction. Google Assistant is likewise customizable in language and accent.

VUI vs Voice AI

The term voice Artificial Intelligence (AI) is often used alongside VUI, and because the two are closely connected, they are frequently confused as meaning the same thing. VUI is about the voice user experience on a device. Voice AI is the umbrella term for the underlying speech recognition technologies: Automatic Speech Recognition, Named Entity Recognition, and Speech Synthesis.

Different VUI approaches

Voice command devices, also known as voice assistants, use VUIs and can be auditory, tactile, or visual. They range from a small speaker to a blue light that blinks in a car's stereo when it hears a command. More familiar examples of voice command devices are iPhone's Siri, Alexa, and Google Home. These voice assistants are made to help people with daily tasks. There are also device genres describing what a VUI is used for, which influence how the interaction between the user and the device is set up.

VUI Device Genres

  • Smartphones
  • Wearables
    • Smart wrist watches
  • Stationary Connected Devices 
    • Desktop computers
    • Sound System
    • Smart TV
  • Non-Stationary Computing Devices
    • Laptops
    • Speakers
  • Internet of Things (IoT)
    • Thermostats
    • Locks 
    • Lights 

Each voice-enabled device has different functionality: a smart TV will respond to changing the channel, but not to sending a text message as a smartphone would. With the power of VUI, users can ask for news and weather information or simply send a voice text. Beyond devices, there are also VUI-integrated, voice-controlled apps that serve the same purpose. A VUI interacts with an app through a task-oriented workflow and/or a knowledge-oriented workflow. Task-oriented workflows carry out actions the user asks for, such as setting an alarm or making a phone call. Knowledge-oriented workflows respond by consulting secondary sources such as the internet, for example to answer a question about Mt. Everest's height.
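The two workflow types can be sketched as a simple router: task requests perform an action directly, while knowledge requests consult a secondary source. Here the "secondary source" is a stubbed lookup table standing in for a web search, and the phrasings and the Everest figure are illustrative stand-ins, not real product behavior.

```javascript
// Toy router between a task-oriented workflow (perform an action) and a
// knowledge-oriented workflow (consult a secondary source -- stubbed here
// as a lookup table standing in for a web search).
const KNOWLEDGE = { 'height of mt. everest': '8,849 meters' };

function handleRequest(utterance) {
  const text = utterance.toLowerCase();
  if (text.startsWith('set an alarm for ')) {
    // Task-oriented: carry out the action on the device itself.
    return { workflow: 'task', result: 'Alarm set for ' + utterance.slice(17) };
  }
  // Knowledge-oriented: normalize the question and look it up.
  const q = text.replace(/^what is the /, '').replace(/\?$/, '');
  if (q in KNOWLEDGE) {
    return { workflow: 'knowledge', result: KNOWLEDGE[q] };
  }
  return { workflow: 'none', result: "Sorry, I can't help with that." };
}
```

The distinction matters for design: task workflows need access to device capabilities, while knowledge workflows need access to external data sources.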

The Benefits of VUIs

The primary benefit of VUIs is that they allow a hands-free experience users can interact with while focusing on something else. They can save time in daily routines and improve people's lives, whether checking the weather or setting an alarm clock the night before work.

VUI in Workflows & Lifestyles

VUI aids multitasking and productivity in work settings ranging from the office to outdoor labor. A Voice User Interface can actively contribute to worker safety by assisting users in hazardous workflows, such as on construction sites, in oil refineries, or while driving. Traditional devices like phones and computers aren't the only ones connected to the internet or to a VUI: smart light fixtures, thermostats, smart locks, and other Internet of Things (IoT) devices are connected as well. These VUI devices are useful in households with travelers and/or busy families, controllable from home or a smartphone.

Improving Lives

With individualized experiences, VUI can lead society toward a more accessible world and a better quality of life. VUI benefits users with disabilities, such as the visually impaired and others who cannot adapt to visual UIs or keyboards. VUI is also becoming popular with seniors who are new to technology. Aging affects sensory, movement, and memory abilities, which makes VUI a welcome alternative to hands-on assistance. With a VUI's help, elders can communicate with loved ones and use devices without confusion and frustration.

VUI in Education

Educational strategies are constantly being updated for all ages. VUI can be a learning tool where classrooms interact with a voice assistant to create a new experience and cater to all learning styles. Because VUI is so accessible and requires no training to use, it works for virtually any audience.

Technology Innovation

As VUI grows, it will change the way products are designed and create demand for new jobs. VUI design will become a key skill for designers as the user experience evolves. User Experience (UX) designers are trained to build experiences around physical input and graphical output; VUI design differs from traditional UX because its guidelines and principles are different, which will push designers to focus more on VUI design. In 2019, it was estimated that 111.8 million people in the US would use a voice assistant at least monthly, up 9.5% from the previous year. As users rely on voice assistants more than ever, voice will eventually become a habit and a standard feature of the devices everyone owns.

Once that habit has formed, it will be easier for users to speak to a device than to operate it physically. This will create high demand for designers knowledgeable in VUI and contribute to changing how devices are designed.

Lastly, another benefit of voice command devices is that they aren't limited to what they were originally programmed to do. Over time, the interaction between the user and the voice user interface improves through machine learning, as discussed earlier. The user learns how to better utilize the voice command device, and the device in turn learns how to work with its user.

Solutions With Alan

With the Alan Platform, it is simple to create your own voice interface designed for natural communication and conversation. Signing up for an Alan Studio account gives you access to the complete Alan IDE to create a VUI you can integrate with any pre-existing app. The Alan Platform lets you build a Voice User Interface entirely within your browser and embed the code into any app, so you only have to write it once and never worry about compatibility issues.

Final Thoughts 

Voice User Interface went from recognizing only the digits 0-9 to understanding more than a million vocabulary words in different styles of speaking. VUI has never stopped progressing, and it is creating new job demand and becoming an important focus in User Experience design. As VUI progresses, more voice assistants and solutions are being created to benefit society, and companies and consumers are switching to the practical trend of VUI, or combining a Graphical User Interface with a VUI.

Voice assistants come in many shapes, forms, and genres. Each device has its own purpose for its VUI, such as assisting the productivity of workflows, lifestyles, and education. What they all have in common is a mission to help users in their everyday lives with a hands-free experience, built on a range of Artificial Intelligence technologies including Automatic Speech Recognition, Named Entity Recognition, and Speech Synthesis.

Another reason VUI never stops growing and improving is that it is not limited to what it is programmed to do. Over time, the interaction between the user and the voice user interface improves through machine learning: the user learns how to better utilize the voice command device, and the device in turn learns how to work with its user. Together they are working toward a more advanced artificial intelligence and voice user interface.

This article was reposted at dev.to here:
https://dev.to/alanvoiceai/what-is-voice-ui-2ga7
