About me

My name is Marko Markovic and I am a software developer, a musician and a researcher. I am intrigued by algorithms and new techniques and I want to improve my software development skills.

While working on different projects I like to document what I have learned: a new technique, a new algorithm, a solution to a problem or a tool. I write this blog to help other people in their programming endeavors, and also so I won't forget.

If you are interested in finding out more about me, visit markomarks.com

Friday, May 13, 2016

HelloTextToSpeech

Introduction

In today's session I will try to teach you how to add text-to-speech support to your apps. What are we creating today? A portable and lightweight TextToSpeechManager class that will be easy to add to your projects. Ever since I gave the public talk on this topic at the April Android Developer Meetup, I have wanted to do a follow-up in the form of a blog post. The full source code can be found on my github; feel free to fork it and play around with it. The slides from the presentation can be found here.

Text-To-Speech (TTS), also known as "speech synthesis", enables your Android device to "speak" the text that you give it. It supports a wide range of languages, but not all Android devices have them installed.

Setting up

Well, what are the requirements, you might wonder? We don't need any special libraries or installations. Your typical setup for Android development will work just fine.

If you want to change the quality and language of the voice, go to the language and input settings on your Android device. There you should find the Text-to-speech options, which present a list of the TTS engines that are on your device.

Additionally, you have settings for the engine itself. You want a male voice? Well, if you have Google's version of the engine that might be an option now (some of you may remember my old post related to my comments on the gender). You will have to download all the sound packs separately.

Today, there are many different TTS engines. I noticed that there are 2 Google TTS engines: Pico and the Google Text-to-speech engine. Pico is simplified and supports a limited number of languages, while Google Text-to-speech has many more options to choose from. Supporting a lot of different devices with different engines can be a daunting task. You will probably be surprised that some devices don't support TTS at all. When you are working with the TTS engine, you have to take into account that the user may not have the needed voice packs, languages or the specific engine. Additionally, you have the option to send commands for a specific TTS engine in the speak method. Try not to limit the user because of it.

Let's get coding

This will be as easy as 1,2,3,4, because our class has 4 methods, you know, like easy as 1,2,3?...get it...

...wink, wink...

...Bah! Everyone is a critic these days.


import android.content.Context;
import android.speech.tts.TextToSpeech;

public class TextToSpeechManager implements TextToSpeech.OnInitListener {

    private Context ctx;
    private TextToSpeech myTTS;
    private TextToSpeechUtteranceListener textToSpeechUtteranceListener;
    // Request code for the voice data check that onInit() may start
    public static final int MY_TTS_CHECK_CODE = 0;

    public TextToSpeechManager(Context context){}

    @Override
    public void onInit(int initStatus){}

    public void onDestroy(){}

    public void speak(String text, String utteranceId){}

    public void stopSpeaking(){}
}

Our class has to implement TextToSpeech.OnInitListener. The framework calls the onInit method once a TextToSpeech instance created with this listener has finished initializing, and the status argument tells you whether the engine is ready to use. In this class, onInit also creates the TextToSpeech instance itself if one does not exist yet.

Besides the constructor we have 4 important methods:
  1. onInit() - creates the TTS instance and checks if the Android device supports TTS
  2. speak() - performs the speech. We are sending it the text we would like to hear and the utteranceId.
  3. stopSpeaking() - we are telling TTS to stop speaking
  4. onDestroy() - once we are done using the TTS it would be nice to clean up, so our app won't blow up or blow up some other app.

Constructor

    public TextToSpeechManager(Context context){
        ctx = context;
        textToSpeechUtteranceListener = new TextToSpeechUtteranceListener(ctx);
    }

As in most classes, we need to set some things up in the constructor. We pass the context as a parameter because our TTS engine will need it, and we also need it to create a new instance of the utterance listener. I will explain the utterance listener in a few moments.

onInit() method

    @Override
    public void onInit(int initStatus){
        if(myTTS == null)
        {
            myTTS = new TextToSpeech(ctx, this);
            myTTS.setSpeechRate(1.0f);
            myTTS.setPitch(2.0f);
            myTTS.setOnUtteranceProgressListener(textToSpeechUtteranceListener);
        }
        if (initStatus == TextToSpeech.SUCCESS) {
            Locale currentLocale = ctx.getResources().getConfiguration().locale;
            // Compare locales with equals(); any result >= LANG_AVAILABLE means the language can be used
            if(myTTS.isLanguageAvailable(Locale.US) >= TextToSpeech.LANG_AVAILABLE && Locale.US.equals(currentLocale)){
                myTTS.setLanguage(Locale.US);
            }
            else if(myTTS.isLanguageAvailable(Locale.UK) >= TextToSpeech.LANG_AVAILABLE && Locale.UK.equals(currentLocale)){
                myTTS.setLanguage(Locale.UK);
            }
            else if(myTTS.isLanguageAvailable(Locale.ENGLISH) >= TextToSpeech.LANG_AVAILABLE && Locale.ENGLISH.equals(currentLocale)){
                myTTS.setLanguage(Locale.ENGLISH);
            }
            else{
                // No matching language found: ask the engine to check its voice data
                Intent checkTTSIntent = new Intent();
                checkTTSIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
                ((Activity)ctx).startActivityForResult(checkTTSIntent, MY_TTS_CHECK_CODE);
            }
        }
        else if (initStatus == TextToSpeech.ERROR) {
            Toast.makeText(ctx, "Text To Speech init failed...", Toast.LENGTH_LONG).show();
        }
    }

Here we initialize the TextToSpeech instance and apply different options.


setSpeechRate sets the speech speed. 1.0 is the normal rate; lower values slow the speech down (0.5 is half the normal speed) and higher values speed it up (2.0 is twice the normal speed).

setPitch sets the overall tone of the voice. 1.0 is the normal pitch; lower values make the voice sound deeper, while higher values such as 2.0 make it sound really high. One example of using the setPitch method would be to create a male and a female voice, one with a lower pitch and the other with a higher pitch. Personally I wouldn't rely on this, because it sounds a bit weird, but it is good enough to do the job.

setOnUtteranceProgressListener registers our custom UtteranceProgressListener class. We will talk about UtteranceProgressListener soon. I know I said in a few moments, and I promise we will get to it.

Sometimes I get carried away, I know.

The rest of the code in the onInit method checks whether a language is available to the TTS (isLanguageAvailable), compares it with the current default language of the device and sets it accordingly (setLanguage). Yes, it seems like a bit of an overkill, but I wanted to show you some of the options you have available. You can play around here and create specific language implementations. In case we haven't found the required language, we launch an intent that checks the installed TTS voice data. The result comes back to the activity, which can then ask the user to install the missing language packs.
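
The language check can also be tried out away from a device. The following is only a rough sketch in plain Java: the LanguagePicker class and the availability function are hypothetical stand-ins for myTTS.isLanguageAvailable(), and the integer constants mirror the values of the TextToSpeech.LANG_* constants. It treats any non-negative result as usable, on the assumption that an engine may answer LANG_COUNTRY_AVAILABLE rather than LANG_AVAILABLE for a locale such as Locale.US.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.function.ToIntFunction;

/** Hypothetical helper for illustration; not part of the Android SDK. */
final class LanguagePicker {
    // Same values as the TextToSpeech.LANG_* constants.
    static final int LANG_COUNTRY_AVAILABLE = 1;
    static final int LANG_AVAILABLE = 0;
    static final int LANG_MISSING_DATA = -1;
    static final int LANG_NOT_SUPPORTED = -2;

    /**
     * Returns the candidate that equals the device locale and is usable by
     * the engine, or empty when the caller should fall back to the voice
     * data check.
     */
    static Optional<Locale> pick(Locale current, List<Locale> candidates,
                                 ToIntFunction<Locale> availability) {
        for (Locale candidate : candidates) {
            boolean usable = availability.applyAsInt(candidate) >= LANG_AVAILABLE;
            // Locales are objects, so compare them with equals(), not ==
            if (usable && candidate.equals(current)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

On a device you would pass something like myTTS::isLanguageAvailable as the availability function and call setLanguage with the result when one is present.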

    protected void onActivityResult(int requestCode, int resultCode, Intent data)
    {
        if (requestCode == TextToSpeechManager.MY_TTS_CHECK_CODE) {
            if (resultCode == TextToSpeech.Engine.CHECK_VOICE_DATA_PASS) {
                // success, create the TTS instance
                ttsManager.initializeTTS();
            } else {
                // missing data, install it
                Intent installIntent = new Intent();
                installIntent.setAction(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
                startActivity(installIntent);
            }
        }
    }

Add the above method to your activity. Once the user has finished installing the necessary data, they are returned to the main activity, not to our manager class. This gives you a chance to initialize TextToSpeech once again. The ttsManager.initializeTTS() method is the same in implementation as the onInit method in our TextToSpeechManager class. It just creates a new instance of TextToSpeech; you can check for the languages again if you want to be absolutely sure.

speak() method

There are 2 ways for the TTS to speak:
  1. speak() method
  2. synthesizeToFile() method

The speak() method is asynchronous: the call only queues the request and returns, the device speaks in the background, and your main thread stays free for other work.

synthesizeToFile() records the output of the TTS to an audio file. If you are interested in a more permanent solution, this would be your choice. To play the file back you would have to use a media player, or call the media player from your app.

    public void speak(String text, String utteranceId)
    {
        myTTS.speak(text, TextToSpeech.QUEUE_ADD, null, utteranceId);
    }

The speak() method has 4 parameters: final CharSequence text, final int queueMode, final Bundle params, final String utteranceId
  • text - CharSequence: The string of text to be spoken. It should be no longer than getMaxSpeechInputLength() characters.
  • queueMode - int: The queuing strategy to use, QUEUE_ADD or QUEUE_FLUSH.
  • params - Bundle: Represents parameters for this specific request. It can be null. Supported parameter names: KEY_PARAM_STREAM, KEY_PARAM_VOLUME, KEY_PARAM_PAN. Engine specific parameters may be passed in but the parameter keys must be prefixed by the name of the engine they are intended for. For example the keys "com.svox.pico_foo" and "com.svox.pico:bar" will be passed to the engine named "com.svox.pico" if it is being used.
  • utteranceId - String: A unique identifier for this request.
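
Since the text should stay under getMaxSpeechInputLength() characters, longer texts need to be split before they are spoken. Here is one possible sketch in plain Java, not something taken from the project: TextChunker is a hypothetical helper that cuts at sentence boundaries using java.text.BreakIterator, and maxLength is a parameter so it can run without a device. A single sentence longer than the limit is simply cut hard at the limit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the Android SDK. */
final class TextChunker {

    /** Splits text into pieces of at most maxLength characters. */
    static List<String> split(String text, int maxLength) {
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        List<String> chunks = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.US);
        sentences.setText(text);

        StringBuilder current = new StringBuilder();
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE;
             start = end, end = sentences.next()) {
            String sentence = text.substring(start, end);
            // Close the running chunk if this sentence would overflow it
            if (current.length() > 0
                    && current.length() + sentence.length() > maxLength) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            // A single oversized sentence is cut hard at maxLength
            while (sentence.length() > maxLength) {
                chunks.add(sentence.substring(0, maxLength));
                sentence = sentence.substring(maxLength);
            }
            current.append(sentence);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

On a device you might call TextChunker.split(text, TextToSpeech.getMaxSpeechInputLength()) and pass each piece to speak() with QUEUE_ADD and its own utteranceId, so the pieces are spoken in order.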

The QueueMode

  • QUEUE_ADD - Queue mode where the new entry is added at the end of the playback queue.
  • QUEUE_FLUSH - Queue mode where all entries in the playback queue (media to be played and text to be synthesized) are dropped and replaced by the new entry.
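
To see the difference between the two modes, here is a toy model of the playback queue in plain Java. It is only an illustration: PlaybackQueueModel is a made-up class that tracks which entries are waiting and plays no audio, and the two constants use the same values as TextToSpeech.QUEUE_FLUSH and TextToSpeech.QUEUE_ADD.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of the TTS playback queue; it plays no audio. */
final class PlaybackQueueModel {
    // Same values as TextToSpeech.QUEUE_FLUSH and TextToSpeech.QUEUE_ADD.
    static final int QUEUE_FLUSH = 0;
    static final int QUEUE_ADD = 1;

    private final Deque<String> pending = new ArrayDeque<>();

    /** Mimics how speak() treats the queue for the given mode. */
    void speak(String text, int queueMode) {
        if (queueMode == QUEUE_FLUSH) {
            // Drop everything that is still waiting
            pending.clear();
        }
        // In both modes the new entry ends up at the end of the queue
        pending.addLast(text);
    }

    /** Returns the waiting entries in playback order. */
    List<String> snapshot() {
        return new ArrayList<>(pending);
    }
}
```

Queuing "one" and "two" with QUEUE_ADD leaves both entries waiting in that order; a following "three" with QUEUE_FLUSH leaves only "three".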

UtteranceProgressListener

Yes, I know, finally. Each utterance from the TTS engine is associated with a call to speak() or synthesizeToFile(). Simply put, every time a phrase has started or stopped speaking the utterance progress listener is called. Remember the utteranceId from earlier? It is sent as a parameter to its methods.

What can you do after that? You might use the LocalBroadcastManager to send a signal to one of your Activities. What is the purpose of this? Well, you can use it to update your view and show the user the current utterance, move up and down in the list of phrases that the TTS is currently speaking, or save the current phrase so the user can continue from it when they return to your app.
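
One way to connect an utteranceId back to a phrase is to encode the position of the phrase in the ID. The sketch below is plain Java and purely illustrative: PhraseTracker, its "phrase-" prefix and its method names are assumptions of mine, not part of the project or the Android SDK. The idea is that the listener forwards the ID it receives in onStart, and the tracker remembers which phrase is being spoken.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that maps utterance IDs back to phrases. */
final class PhraseTracker {
    private static final String PREFIX = "phrase-";
    private final List<String> phrases;
    private int currentIndex = -1; // -1 means nothing has started yet

    PhraseTracker(List<String> phrases) {
        this.phrases = new ArrayList<>(phrases);
    }

    /** The ID to pass to speak() for the phrase at the given index. */
    String utteranceIdFor(int index) {
        return PREFIX + index;
    }

    /** Call with the ID received in onStart(); unknown IDs are ignored. */
    void onStart(String utteranceId) {
        if (utteranceId == null || !utteranceId.startsWith(PREFIX)) {
            return;
        }
        try {
            int index = Integer.parseInt(utteranceId.substring(PREFIX.length()));
            if (index >= 0 && index < phrases.size()) {
                currentIndex = index;
            }
        } catch (NumberFormatException ignored) {
            // Not one of our IDs, keep the previous position
        }
    }

    /** The phrase being spoken, or null if nothing has started. */
    String currentPhrase() {
        return currentIndex < 0 ? null : phrases.get(currentIndex);
    }

    /** Index to continue from when the user returns to the app. */
    int resumeIndex() {
        return Math.max(currentIndex, 0);
    }
}
```

The value of resumeIndex() could be saved when the app goes to the background, so speaking can pick up from the same phrase later.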

import android.content.Context;
import android.speech.tts.UtteranceProgressListener;
import android.util.Log;

// Each utterance is associated with a call to speak(CharSequence, int, Bundle, String) or synthesizeToFile(CharSequence, Bundle, File, String)
public class TextToSpeechUtteranceListener extends UtteranceProgressListener {
    private static final String TAG = "HelloTextToSpeech";
    Context context;
    public TextToSpeechUtteranceListener(Context ctx)
    {
        context = ctx;
    }

    @Override
    public void onStart(String utteranceId) {
        // Called when the utterance starts being spoken
        Log.d(TAG, "Text To Speech Started -> " + utteranceId);
    }

    @Override
    public void onDone(String utteranceId) {
        // Called when the utterance has finished successfully
        Log.d(TAG, "Text To Speech Done -> " + utteranceId);
    }

    @Override
    public void onError(String utteranceId) {
        // Called when an error occurred while processing the utterance
        Log.e(TAG, "Text To Speech ERROR -> " + utteranceId);
    }

}

I think the implementation is self-explanatory. There are three methods, called at the start, at the end and on an error of the utterance.

stopSpeaking() method

    public void stopSpeaking()
    {
        // Guard against calls made before the TTS instance exists
        if(myTTS != null)
        {
            myTTS.stop();
        }
    }

onDestroy() method

    public void onDestroy()
    {
        if(myTTS != null)
        {
            myTTS.stop();
            myTTS.shutdown();
        }
    }

Conclusion

You might be thinking that text-to-speech has a limited number of uses, but believe me, when you think about it, it might take your app to new levels. My favorite TTS feature in an app, which I use quite often, is in an app called Sports Tracker. When I am riding my bicycle I love it when I get a voice update of my current mileage, speed and time.

If you made it this far, I would like to thank you for reading this and I hope this article is useful to you. It really means a lot to me. If you have any questions, comments or complaints, feel free to use the comment section.

And as a final thought, don't mix water and plugged in appliances, it will shock you.
Until next time, Take care!
