Voice Interfaces for Apps: Guarding Your Privacy
Alan AI Blog, Tue, 31 May 2022
https://alan.app/blog/voice-interfaces-for-apps-guarding-user-privacy/

Decades ago, talking to a computer was only possible in advanced scientific labs or in science fiction stories. Today, voice assistants have become a reality of everyday life. People talk to their phone, smart speaker, doorbell, and even microwave oven. Voice is gradually becoming one of the main ways to interact with consumer applications and devices, and the use of natural language as a mode of interaction is extremely appealing. Technologies such as text-to-speech (TTS), automatic speech recognition (ASR), and spoken language understanding (SLU) are used to recognize and process human language.

But while we’ve seen a lot of progress in the application of voice interfaces like Google Assistant and Siri in consumer applications, the business sector still lags behind, even though enterprises stand to be the main beneficiaries of advances in speech recognition and interactive voice technology overall. Where workers are engaged in hands-on activities and can’t interact with graphical user interfaces, voice user interfaces can make a huge difference in user engagement, productivity, and safety. However, the enterprise voice sector must overcome several challenges, one of them being privacy and security concerns. Today’s consumer voice assistants are not renowned for being privacy-friendly. There have been several documented incidents of smart speakers and voice assistants mistakenly recording conversations and replaying them elsewhere. And the massive amount of user data that these assistants collect gets sucked into the black hole of the data-hungry tech giants that run them.

The expansion of the voice interface to your living room, car, office, pocket, and wrist has created fierce competition between tech giants. Manufacturers of smartphones, smart speakers, wearables, and other mobile devices aim to create the ultimate voice experience that can respond to every possible query, whether it’s asking the weather, turning on the lights, responding to emails, or setting timers. Currently, the only way speech API vendors can get ahead of competitors is to improve their AI models by expanding their repertoire of actionable voice commands. This gives them a vested interest in collecting more user data and assembling larger training datasets for their AI models.

What’s also worth noting is that all major consumer voice assistants are owned by companies that have built their business on collecting user information and creating digital profiles to serve ads, provide content and product recommendations, and keep users locked in their apps. In this regard, voice interfaces become another window for these companies to collect more data and learn more about their users.

This brings us to an important takeaway: Tech giants will do anything they can to own your data because that is their key differentiating factor.

From a security and privacy standpoint, this causes several key concerns:

– These intermediaries will get to hear private conversations of enterprises’ users. For instance, if you allow a consumer voice assistant to check on your bank balance, you’re giving them access to this sensitive information.

– You don’t know what kind of data is being collected and where it is stored.

– Data is stored centrally in the servers of the voice AI provider. And as numerous security incidents have shown, centralized stores of data are attractive targets for malicious actors.

– As an enterprise, you have no ownership or control of your data and can’t use it to improve your products or gain insights about how users interact with your applications.

– In case you’re handling sensitive health, financial, or business data, you’re at the mercy of the Voice AI vendor to keep your data safe and not share it with third parties.

On the other hand, the Alan Platform is designed to ensure security and privacy for the users of enterprises and organizations. The key privacy tenet of the Alan Platform is that each enterprise is the sole owner of its user conversation data. The enterprise decides where that data is stored and who has access to it. And regardless of a customer’s choice of where to store their data, Alan AI secures it, making sure it’s encrypted in transit and at rest. Not only does this model create more value for businesses in comparison to the classic voice AI platform, it also addresses the key privacy and security pain points that organizations face when considering voice interfaces for their applications.

The Alan platform is based on solving specific problems for each enterprise, not answering every possible query in the world. Each deployment of our AI system will be tuned for one or more applications of a single enterprise.

The value of the Alan Platform does not come from creating digital profiles and selling ads and products to users, so there is no incentive to collect, hoard, and monetize user data. Instead, Alan AI seeks success by creating value: helping businesses reduce costs and improve operational efficiency and safety with employee-facing deployments, and accelerate revenue with customer-facing deployments.

The goal is to increase ROI for businesses by deploying voice interfaces for apps being used by their customers and employees. This is why Alan AI believes every company should have full control and ownership of their data and AI models to provide the required privacy for their users. An added benefit is that the AI of each customer will improve as it continues to interact with the users of its application, and the business will have a chance to glean actionable insights from its data and develop new features and products.

Having access to the right quality and amount of data can give an enterprise the edge in providing a higher-quality voice interface. Therefore, every enterprise should put data ownership and security at the center of its product innovation strategy. Would you prefer to use the technology of a company that works behind a black box, taking control and ownership of your data and not providing clear safeguards, or would you prefer to be in control of your data and work in a secure environment where you can continuously innovate and improve the voice interface of your products? If you’re in the latter camp, the Alan Platform is for you. At Alan, we believe the future is a human voice interface to apps.

Reach out to sales@alan.app to set up a free private demo of the platform or answer any questions that you may have about the technology.

Intelligent Voice Interfaces: Higher Productivity in MRO
Alan AI Blog, Tue, 26 Apr 2022
https://alan.app/blog/voice-assistants-increase-productivity-for-mro-workers/

Smart technology is changing the way work gets done across industries. It enables simple requests and provides efficient services in many fields, including maintenance, repair, and operations (MRO).

The equipment in the MRO industry needs regular servicing, and industries such as aviation have particularly complex maintenance procedures. Servicing this equipment requires organized knowledge of the user guides and manuals. The procedures may differ for each unit, and the work demands patience, thoroughness, and the right set of skills, all in plenty.

For example, finding the correct manual or the right procedure might not be easy, especially when you are strapped for time. These processes require the complete attention and focus of the technician or engineer.

Now, how does an intelligent voice interface sound? What if you could simply use your voice to request information? With advances in technology, voice interfaces are ripe to go mainstream. A technician can say “Walk me through the inspection of X machine” and get a guided workflow from the voice assistant. They can get the work done in peace without wondering whether they are following the right steps.

Industry stats indicate that deploying voice interfaces in MRO apps results in a 2X increase in productivity, a 50% reduction in unplanned downtime, and a significant 20% increase in revenue.

How Voice AI helps the maintenance, repair and operations industry: 

1. Increases productivity:

When maintenance workers engage with hands-free apps, they can accomplish tasks faster and have an opportunity to multitask, increasing overall business productivity by leaps and bounds. Moreover, voice enables smoother and faster onboarding, helping new employees become productive in a shorter span of time.

2. Allows a wide range of MRO activities:

Voice interfaces are implemented on devices: laptops, smartphones, tablets, and other smart hardware that can install and run a mobile application. This allows workers to be at a distance and still collect data or listen to guided workflows.

The ability to have a voice interface on these devices, regardless of connectivity, allows voice-enabled applications to fit a wide range of MRO deployments in the field.

3. Provides detailed troubleshooting:

Another critical advantage of using voice interfaces in the MRO industry is detailed troubleshooting feedback. The voice assistant warns the worker when an input value falls outside the acceptable range. It can even pre-load information collected on previous screens and provide detailed instructions for new ones.
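As a hedged sketch of what such range validation might look like: the field names and limits below are purely illustrative assumptions, not taken from any real MRO system.

```python
# Hypothetical range check for voice-entered maintenance readings.
# Field names and limits are illustrative only.
ACCEPTABLE_RANGES = {
    "oil_pressure_psi": (20, 80),
    "hydraulic_pressure_psi": (180, 220),
}

def validate_reading(field, value):
    """Return a spoken warning if the value falls outside the acceptable range."""
    low, high = ACCEPTABLE_RANGES[field]
    if value < low or value > high:
        return f"Warning: {field} reading {value} is outside the range {low} to {high}."
    return None  # value is within range, no warning needed

print(validate_reading("oil_pressure_psi", 95))  # out of range: warning spoken
print(validate_reading("oil_pressure_psi", 50))  # in range: no warning
```

A real deployment would read the limits from the equipment’s maintenance manual and speak the warning back through text-to-speech.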

4. Allows for smoother operations:

Voice assistants seamlessly integrate responses within a maintenance or inspection procedure while following the latest guidelines. The technical operator gets additional information during a complicated repair, and since the voice assistant delivers that information as audio, there is no interruption to the work.

5. Eradicates language barriers:

Some technicians might not be fully versed in the language the maintenance procedure handbook is written in, which can be a barrier to getting the work done properly. Doing maintenance work without following the procedures exactly can result in problems. Listening to the instructions by voice eases the strain of reading in an unfamiliar language and allows for better comprehension.

6. Provides immediate solutions:

When the operator uses an intelligent voice interface, they can simply ask for any information already fed into the voice assistant, and it will provide exactly the content that was asked for. This eliminates the need for manual search and thereby reduces the time the procedure takes.

7. Offers better training opportunities:

Apart from assisting service personnel, voice assistants can also act as a great training tool for new operators. Newly hired operators can learn to operate a machine while listening to audio synchronized with visual instructions from the voice assistant.

Wrapping up:

The advantages of using voice assistants in the MRO industry are numerous. The flexibility and capability that voice assistants offer enable greater attention to the work, help workers focus on the job, and reduce the time usually wasted on switching between applications and on errors. Give your workers an error-free, more productive, and safer environment with intelligent voice assistants.

If your industrial enterprise is looking for a voice-based solution that will make operations safer and more effective, the Alan Platform is the right solution for you. Check out the Ramco Systems testimonial on their partnership with Alan AI for enterprise MRO software apps.

The team at Alan AI will be more than happy to assist you with any questions or provide a personalized demo of our intelligent voice assistant platform. Just email us at sales@alan.app

What is a Voice User Interface (VUI)?
Alan AI Blog, Wed, 25 Sep 2019
https://alan.app/blog/voiceuserinterface/

A Voice User Interface (VUI) enables users to interact with a device or application using spoken voice commands. VUIs give users complete hands-free control of technology, often without even having to look at the device. A combination of Artificial Intelligence (AI) technologies is used to build VUIs, including Automatic Speech Recognition, Named Entity Recognition, and Speech Synthesis, among others. VUIs can be contained either in devices or inside applications. The backend infrastructure, including the AI technologies behind the VUI’s speech components, is often hosted in a public or private cloud where the user’s speech is processed. In the cloud, AI components determine the intent of the user and return a response to the device or application where the user is interacting with the VUI.

Well-known VUIs include Amazon Alexa, Apple Siri, Google Assistant, Samsung Bixby, Yandex Alisa, and Microsoft Cortana. For the best user experience, VUIs are accompanied by visuals created by a Graphical User Interface and by additional sound effects. Each VUI today has its own set of sound effects that let users know when the VUI is active, listening, processing speech, or responding. The benefits of VUIs include hands-free accessibility, productivity, and a better customer experience that will change how the world interacts with artificial intelligence.

The Creation of VUI 

Audrey

The first traces of VUI date back to 1952 and the first speech recognition system, a device called Audrey. Invented by K.H. Davis, R. Biddulph, and S. Balashek, Audrey was known as the “automatic digit recognizer” for its ability to recognize the numbers 0 through 9. Although Audrey’s skill was limited to numbers, it was seen as a technological breakthrough. Audrey was also not a small device like those seen today: it stood 6 feet tall with a large and rather complicated analog circuit system.

Audrey used an input and output procedure much like modern VUI devices. First, a speaker recited a digit or digits into a telephone, making sure to pause 350 milliseconds between words. Next, Audrey listened to the speaker’s input and sorted the speech sounds and patterns to understand it. Audrey would then respond visibly by flashing a light, as modern VUI devices do.

Although Audrey could distinguish the numbers, it could not universally understand every voice or language style and could only respond reliably to a familiar speaker. Audrey was simply not advanced enough for the speaker independence of modern VUIs: it needed a familiar speaker to maintain 97 percent digit recognition accuracy. With a few other designated speakers, Audrey’s accuracy was 70 to 80 percent, and far less with speakers it was unfamiliar with. Why was Audrey created in the first place if manual push-button dialing was cheaper and easier to work with? Recognized speech requires less bandwidth (fewer frequencies for transmitting a signal) than the original sound waves in a telephone call, making it practical for reducing the data traveling through wires and for future technology.

Tangora

The next significant voice technology advancement came in 1971, when the U.S. Department of Defense funded a five-year Speech Understanding Research program. The goal was to reach a minimum 1,000-word vocabulary, with the help of companies such as IBM. In the 1980s, IBM built a voice-activated typewriter called Tangora, capable of understanding and handling a 20,000-word vocabulary. Today, voice-activated typing systems have evolved to let smartphone users send a text or draft a research paper in a matter of moments.

Over time, computer technology advanced until VUI, Graphical User Interface (GUI), and User Experience (UX) design could be placed into a small device that fits in the palm of a hand. Even GUI and UX are becoming old news due to the quick adoption of voice-only devices that no longer use these features. Speech recognition technology went from understanding ten digits to millions of phrases and words from any voice. This advancement was made possible by new speech recognition processes such as Automatic Speech Recognition, Named Entity Recognition, and Speech Synthesis.

Technology used to create a VUI

A range of Artificial Intelligence technologies are used to create VUIs, including Automatic Speech Recognition, Named Entity Recognition, and Speech Synthesis.

Automatic Speech Recognition

Automatic Speech Recognition (ASR) is a technology used to analyze human speech and transcribe it into text. For a given audio input, ASR must filter out distracting acoustic noise and identify the human speech within it. Distortions in the audio and streaming connectivity issues can make this a challenge. Several underlying technologies have been tested and used to build ASR, including Gaussian mixture models (a probabilistic model) and deep learning with neural networks. Oftentimes, the words recognized by ASR are not an exact match for entities within a user intent. In these cases, augmented entity matching is used, which takes similar words or similar-sounding words and matches them to a predefined entity in the VUI.
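The augmented entity matching described above can be sketched with simple fuzzy string matching; the `difflib` comparison below is a minimal stand-in for whatever matcher a production VUI would use, and the entity list is purely illustrative.

```python
import difflib

# Predefined entities the VUI knows about (illustrative list).
KNOWN_ENTITIES = ["inspection", "lubrication", "calibration"]

def match_entity(recognized_word, cutoff=0.6):
    """Map a possibly misrecognized ASR word to the closest predefined entity."""
    matches = difflib.get_close_matches(recognized_word, KNOWN_ENTITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# ASR heard "inspektion"; fuzzy matching recovers the intended entity.
print(match_entity("inspektion"))  # → inspection
```

A real system would also weigh phonetic similarity (how the words sound), not just character overlap.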

Named Entity Recognition

Named Entity Recognition (NER) is used to classify words by their underlying entity. For example, in the command “Get directions to New York City”, ‘New York City’ is recognized as a location. In addition to locations, NER locates entities in unstructured or semi-structured text that can be a person, a subject, or something as specific as a scientific term. NER often uses the surrounding text or words to determine the value of an entity. In the “Get directions to New York City” example, pre-trained probabilistic models assume that whatever word(s) come after “Get directions to” can be safely classified as a location. Examples like “Get directions to the nearest gas station” also work for the same reason, with ‘the nearest’ being a defined qualifier that precedes a location.

NER assists ASR in resolving words into their entities. On the basis of voice input alone, “New York City” is recognized as “new”, “York”, “city”. NER then identifies this as a single location and adjusts it to “New York City”. NER is highly contextual and needs additional input to confidently determine entities. Sometimes NER is limited by its previous training and will not be able to confidently determine an input’s entity.
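The contextual heuristic described above, treating whatever follows “Get directions to” as a location, can be sketched as a toy pattern rule. Real NER relies on trained probabilistic models rather than this hand-written regex, which is an assumption for illustration only.

```python
import re

def extract_location(command):
    """Toy NER rule: classify the text after a directions phrase as a location."""
    match = re.match(
        r"(?:get directions to|take me to)\s+(?:the nearest\s+)?(.+)",
        command,
        re.IGNORECASE,
    )
    return match.group(1) if match else None

print(extract_location("Get directions to New York City"))        # → New York City
print(extract_location("Take me to the nearest gas station"))     # → gas station
```

Note how the qualifier “the nearest” is stripped before the location, mirroring the gas-station example in the text.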

Speech Synthesis

Speech Synthesis produces artificial human voice and speech from input text. A VUI does this job in three stages: input, processing, and output. Speech Synthesis is essentially a text-to-speech (TTS) output stage, where a device reads the input aloud with a simulated voice through a loudspeaker.

These AI technologies analyze, learn, and mimic human speech patterns, and can also adjust the speech’s intonation, pitch, and cadence. Intonation is the way a person’s voice rises or falls as they speak; factors that affect intonation include emotion, accent, and diction. Pitch is the tone of voice, high or low, best described as a squeaky or deep voice, and it is not affected by emotion. Cadence is the flow of voice that fluctuates in pitch as someone speaks or reads. For example, a public speaker will change their cadence by lowering their voice at the end of a declarative sentence to make an impact on their audience.

Once all of this information is stored and analyzed, these technologies use it to improve themselves and the VUI through what is called machine learning. The cloud-hosted technologies determine the intent of the user and return a response through the application or device.
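The three-stage shape described in this section (input, processing, output) can be sketched minimally. The digit expansion below is a simplified stand-in for a real TTS front end’s text normalization, and the "[audio]" string stands in for actual waveform synthesis.

```python
# Toy three-stage TTS pipeline: input text -> normalized text -> "audio" output.
NUMBER_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three"}

def normalize(text):
    """Processing stage: expand digits so the synthesizer can pronounce them."""
    return " ".join(NUMBER_WORDS.get(token, token) for token in text.split())

def synthesize(text):
    """Output stage: a real system would render a waveform; we return the spoken form."""
    return f"[audio] {normalize(text)}"

print(synthesize("dial 3 1 0"))  # → [audio] dial three one zero
```

A production front end would also expand abbreviations, currency, and dates before handing the text to the acoustic model.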

Intents & Entities

Voice commands consist of intents and entities. The intent is the objective of the voice interaction, and there are two kinds: local intents and global intents. A local intent is when the user is asked a question to which they respond “Yes” or “No”. A global intent is when the user gives a more complex answer. When designing VUIs, the different ways a command can be phrased need to be taken into account so that the intent is recognized and answered correctly. Here are two ways of asking for directions to a location: “Get directions to 1600 Pennsylvania Avenue” and “Take me to 1600 Pennsylvania Avenue”. Entities are variables within intents. Think of them as the blanks to fill into a Mad Libs booklet, such as “Book a hotel in {location} on {date}” or “Play {song}.”
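The Mad Libs analogy maps directly onto slot-filling patterns. The sketch below uses hypothetical intent names and a hand-written regex format, not any particular platform’s grammar syntax.

```python
import re

# Intent patterns with named slots, in the "fill in the blanks" style described above.
PATTERNS = {
    "get_directions": [r"get directions to (?P<address>.+)", r"take me to (?P<address>.+)"],
    "play_music": [r"play (?P<song>.+)"],
}

def parse(utterance):
    """Return (intent, entities) for the first matching pattern, or (None, {})."""
    for intent, patterns in PATTERNS.items():
        for pattern in patterns:
            match = re.fullmatch(pattern, utterance, re.IGNORECASE)
            if match:
                return intent, match.groupdict()
    return None, {}

print(parse("Take me to 1600 Pennsylvania Avenue"))
# → ('get_directions', {'address': '1600 Pennsylvania Avenue'})
```

Listing several patterns per intent is what lets both phrasings of the directions command resolve to the same objective.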


VUI vs GUI

User Experience (UX) is the overall experience of an interface product such as a website or application, in terms of how aesthetically pleasing it is and how easy it is for users to navigate. Together, VUI and GUI play a large role in UX design because they assemble the product for consumers.

Voice User Interface

As explained earlier, a Voice User Interface (VUI) enables users to interact with a device or application using spoken voice commands. VUIs give users complete hands-free control of technology, often without even having to look at the device.

Graphical User Interface (GUI)

A Graphical User Interface (GUI) is the graphical layout and design of a device. For example, the screen display and apps on a smartphone or computer are a graphical user interface. A GUI can display visuals for a VUI, such as a graphic of sound waves when a voice assistant on a smartphone responds to its user. Real-life examples include how Apple Siri and Google Assistant use VUI and GUI together.

Apple Siri VUI & GUI

Apple Siri responds to “Hey Siri” using VUI, or to pressing down on the home button of an Apple device. Users know that Siri is active when it says “What can I help you with?” through the speaker or shows it on the screen using GUI. While a user speaks to Siri, colorful wavelengths move to the sound of speech, showing that Siri is actively listening and processing the question. When a user is quiet, Siri will prompt “Go ahead, I’m listening…” If a user still does not respond, the screen displays “Some things you can ask me:” with a few examples of what it can do, such as calling, FaceTiming, emailing, and more.

This GUI feature is specifically catered to people who are new to Siri and unsure what to do. The Apple device also displays what the user asked and Siri’s response on the screen, showing what was understood from the interaction. Siri additionally allows customization of its gender, accent, and language.

Google Assistant VUI & GUI

Google Assistant responds to users when it hears “OK Google” or “Hey Google.” At the bottom of the screen, colorful dots display to let the user know that Google Assistant has been activated and is ready to listen. While it waits for the user to ask a question, the dots move in a wave formation to represent wavelengths until speech arrives. Once a user starts speaking, the dots transform into bars that move in a wave formation to the sound of speech, letting users know it is processing information. Another GUI feature of Google Assistant is that it displays what the user asked along with Google’s responses. As with Apple Siri, this display is another way of showing users what was understood from the interaction. Google Assistant is also customizable in language and accent.

VUI vs Voice AI

The term Voice Artificial Intelligence (AI) is commonly used alongside VUI, and the two are often confused because they are closely connected. VUI is all about the voice user experience on a device. Voice AI is the umbrella term for the speech recognition technologies underneath: Automatic Speech Recognition, Named Entity Recognition, and Speech Synthesis.

Different VUI approaches

Voice command devices, also known as voice assistants, use VUI and can be auditory, tactile, or visual. They range from a small speaker to a blue light that blinks in a car’s stereo when it hears a command. More common examples of voice command devices are iPhone’s Siri, Alexa, and Google Home. These voice assistants are made to help people with daily tasks. There are also device genres describing what the VUI is used for, which influence how the interaction between the user and the device is set up.

VUI Device Genres

  • Smartphones
  • Wearables
    • Smart wrist watches
  • Stationary Connected Devices 
    • Desktop computers
    • Sound System
    • Smart TV
  • Non-Stationary Computing Devices
    • Laptops
    • Speakers
  • Internet of Things (IoT)
    • Thermostats
    • Locks 
    • Lights 

Each voice-enabled device has a different functionality. A smart TV will respond to changing the channel, but not to sending a text message as a smartphone would. Users can ask for information from the news and weather channels or simply send a voice text with the power of VUI. Beyond devices, there are also VUI-integrated, voice-controlled apps that serve the same purpose. The VUI interacts with an app in a task-oriented workflow and/or a knowledge-oriented workflow. Task-oriented workflows complete almost anything a user asks directly of the device, such as setting an alarm or making a phone call. Knowledge-oriented workflows respond by consulting secondary sources like the internet to complete a task, such as answering a question about Mt. Everest’s height.
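The split between the two workflow types can be sketched as a simple dispatcher. The intent names and categories below are hypothetical placeholders, not real device or search APIs.

```python
# Hypothetical routing of intents to task- or knowledge-oriented workflows.
TASK_INTENTS = {"set_alarm", "make_call"}        # act on the device itself
KNOWLEDGE_INTENTS = {"ask_fact", "web_search"}   # consult a secondary source

def route(intent):
    """Decide which workflow should handle a recognized intent."""
    if intent in TASK_INTENTS:
        return "task-oriented"
    if intent in KNOWLEDGE_INTENTS:
        return "knowledge-oriented"
    return "unknown"

print(route("set_alarm"))  # → task-oriented
print(route("ask_fact"))   # → knowledge-oriented
```

In practice the knowledge-oriented branch would call out to a search or knowledge-graph backend before answering.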

The Benefits of VUIs

The primary benefit of VUIs is that they allow a hands-free experience that users can engage with while focusing on something else. They can save time in daily routines and improve people’s lives, for example by checking the weather or setting an alarm clock the night before work.

VUI in Workflows & Lifestyles

VUI is beneficial for multitasking productivity in work spaces that range from offices to outdoor labor. A Voice User Interface can actively contribute to worker safety by assisting users in hazardous workflows, such as on construction sites, in oil refineries, while driving, and more. Traditional devices like phones and computers aren’t the only ones connected to the internet or to a VUI: smart light fixtures, thermostats, smart locks, and other Internet of Things (IoT) devices are connected as well. These VUI devices are useful in the households of travelers and busy families, controllable from home or a smartphone.

Improving Lives

With individualized experiences, VUI can lead society to a more accessible world and help provide a better quality of life. VUI benefits users with disabilities, such as the visually impaired and others who cannot adapt to visual UIs or keyboards. VUI is also becoming popular with seniors who are new to technology. Aging affects sensory, movement, and memory abilities, which makes VUI an alternative to hands-on assistance. With the assistance of VUI, elders can communicate with loved ones and use devices without confusion and frustration.

VUI in Education

Educational strategies are constantly being updated in educational systems for all ages. VUI can be a learning tool where classrooms interact with a voice assistant to create a new experience and cater to all learning styles. Since VUI is very accessible and requires no training to use, it is easy to adopt with any audience.

Technology Innovation

As VUI grows, it will change the way products are designed and create new job demand. VUI design will become a key skill for designers as the user experience evolves. User Experience (UX) designers are trained to provide experiences for physical input and graphical output; VUI design is different because its guidelines and principles are different. This will encourage designers to focus more on VUI design. In 2019, an estimated 111.8 million people in the US were expected to use a voice assistant at least monthly, up 9.5% from the previous year. As people use voice assistants more than ever, voice will eventually become a habit and a device feature that everyone owns.

Once that habit has formed, it will be easier for users to speak to a device than to operate it physically. This will create high demand for VUI-knowledgeable designers and contribute to the change in how devices are designed.

Lastly, another benefit of voice command devices is that they don’t stay stagnant at what they were originally programmed to do. Over time, the interaction between the user and the voice user interface improves through machine learning, as discussed earlier. The user learns how to better utilize the voice command device, and the device in turn learns how to work with its user.

Solutions With Alan

With the Alan Platform, it is simple to create your own voice interface designed for natural communication and conversation. Signing up for an Alan Studio account gives you access to the complete Alan IDE, where you can create a VUI and integrate it with any pre-existing app. The Alan Platform lets you build a Voice User Interface entirely within your browser and embed the code into any app, so you write it once and don’t have to worry about compatibility issues.

Final Thoughts 

Voice User Interfaces went from recognizing only the numbers 0-9 to vocabularies of more than a million words in different styles of speaking. VUI has never stopped progressing; it is creating new job demand and an important focus in User Experience design. As VUI progresses, more voice assistants and solutions are being created to benefit society. Companies and consumers are switching to the new and practical trend of VUI, or combining a Graphical User Interface with VUI.

Voice assistants come in many shapes, forms, and genres. Each device has its own purpose for using VUI, such as assisting the productivity of workflows, lifestyles, and education. What they all have in common is that their purpose is to help users in their everyday lives with a hands-free user experience. This is done using a range of Artificial Intelligence technologies, including Automatic Speech Recognition, Named Entity Recognition, and Speech Synthesis.

Another reason why VUI never stops growing and improving is that it does not stay stagnant at what it is programmed to do. Over time, the interaction between the user and the voice user interface improves through machine learning: the user learns how to better utilize the voice command device, and the device in turn learns how to work with its user. Together they are working towards more advanced artificial intelligence and voice user interfaces.

This article was reposted at dev.to here:
https://dev.to/alanvoiceai/what-is-voice-ui-2ga7
