@ztiknl/sara

Sentient Artificial Responsive Agent

<input> Hi Sara, how are you?
(prompt / response) All systems operational!
<input> How is the weather in (where are we)?
(prompt / response) The current weather in Amsterdam, Netherlands is:
(weatherdetails)
<input> _

ToC:

  1. What is Sara
  2. Requirements
  3. NPM modules
  4. How to use
  5. Internal Commands
    5.1 Colors
    5.2 Verbose
    5.3 Help
    5.4 Hearing
    5.5 Voice
    5.6 Vision
  6. Regular Expression matches
  7. Layered commands
  8. Plugins
  9. Provided plugins
    9.1 Math
    9.2 Conversation
    9.3 Location
    9.4 Weather
    9.5 XMBC remote
  10. Audio in/out issues
  11. Other issues
    11.1 Sonus/Google Cloud Speech API
    11.2 Known
  12. Todo
  13. Long term goals
  14. Credits
  15. Apologies

Attention: this package is currently a work in progress
Do not install via npm install @ztik.nl/sara
Clone or download from Sara @ Github instead

The Github repository always holds the current/latest testing build
NPM is pushed occasionally, when there shouldn't be any app-breaking bugs
Many changes are to be expected; do not expect backwards compatibility

Current version: 0.2.2
When the core program is more complete, I will start semantic versioning at 1.0.0

What is Sara:

Sara is a command prompt that listens for keyboard input or voice commands
Sara has a voice and is able to respond to commands through text as well as audio

Sara is my (poor) attempt at making my own Jarvis/Alexa/Hey Google/Hi Bixby/Voice Response System
It runs in Node.js on a Raspberry Pi 3B, but should also run on earlier models and other Linux distros
It has some internal commands, but can be extended through a self-made plugin system

Hearing works
Voice commands can be sent to the command line for editing, or be processed immediately without user intervention
This option is currently hidden away in hearing.js, but will move to the command-line arguments and config.json soon

Voice works
Voice output works, but further testing is required; on very long output the speed appears to slow down
Different voices (male and female) are now possible; soon there will be an option to select one, as well as a way to display a list of voices for each language!

Vision works
All it does is take a picture every 30 minutes using a USB webcam
The Pi camera is not supported yet, but will be later
There are object/face detection functions, as well as some other labeling functions (age/expression/gender), but NONE of these are connected to the webcam source image yet!
There are NO object/face recognition functions at this moment, but these will be added soon

Sara ignores the following words at sentence start:

sara
can you
will you
would you
could you
tell me
let me know
please

Sara also ignores the word 'please' and the '?' character at the end of commands
After stripping these words, the command is compared to the internal commands; if it doesn't match, it is compared to the regex string contained in every plugin's .json file

Sara listens to the keyword 'Sara'
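
As an illustration, the stripping described above might be sketched like this (the function name and the repeated-stripping loop are a hypothetical example, not Sara's actual code; only the word list comes from the README):

```javascript
// Sketch of prefix/suffix stripping, using the word list above.
// stripCommand() is an illustrative name, not Sara's actual function.
const prefixes = ['sara', 'can you', 'will you', 'would you', 'could you',
                  'tell me', 'let me know', 'please'];

function stripCommand(input) {
  let cmd = input.trim().toLowerCase();
  let changed = true;
  while (changed) {                      // strip repeatedly: "sara can you please ..."
    changed = false;
    for (const p of prefixes) {
      if (cmd.startsWith(p + ' ')) {
        cmd = cmd.slice(p.length + 1);
        changed = true;
      }
    }
  }
  // drop a trailing "please" and/or "?"
  return cmd.replace(/\s*(please)?\s*\??$/, '');
}

console.log(stripCommand('Sara can you please tell me what 10 + -9 is?'));
// → "what 10 + -9 is"
```

After this step, the remaining string is what gets matched against internal commands and plugin regexes.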

Requirements:

Hardware:

  • A Raspberry Pi (3B tested, older models should work)
  • Keyboard or ssh connection
  • Microphone for voice commands (I use a G11 Touch Induction/Haobosou, ~20 euro, excellent results)
  • Audio output device (tv/hdmi or speakers on line-out)
  • Webcam for future object/face recognition modules (I use an HP Webcam HD-4110)
  • SD Card containing Raspbian (latest version is always advisable)
  • A self-powered USB hub is advisable when using a USB microphone/webcam

Software:

  • Node.js LTS or newest (I am currently running 12.5.0)
  • NPM (I am currently running 6.9.0)
  • aplay and arecord (configure audio in/out as default audio devices first)
    sudo apt-get install alsa-utils
  • fswebcam (I installed it without touching a single config file)
    sudo apt-get install fswebcam

Other:

  • Google Cloud API key (one key to rule them all!)
    This is free for a certain amount of requests; see Sonus/Google Cloud Speech/Vision API for more details
    The same key is used for speech recognition, generating voices, and face/object detection
    Face recognition will be calculated in-app, so it will not make requests to the Google Cloud Vision API

NPM modules:

"@google-cloud/text-to-speech": "^1.1.2",
"chalk": "^2.4.2",
"country-list": "^2.1.1",
"decimal.js": "^10.2.0",
"geoip-lite": "^1.3.7",
"node-webcam": "^0.5.0",
"play-sound": "^1.1.3",
"public-ip": "^3.1.0",
"sonus": "^1.0.3",
"weather-js2": "^2.0.2"

How to use:

  1. clone or download this repo
  2. inside the main folder containing bin.js & package.json, run: npm install
  3. add your own Google Cloud Speech API key to resources/apikeys/googlespeech.json

  • start the program with: node bin.js
  • to see the (optional) command line arguments, run: node bin.js --help
  • it is also possible to use a config.json file to force default behaviour

For more information on the Google Cloud Speech API, see:
NPMJS.com/sonus/usage & NPMJS.com/sonus/how-do-i-set-up-google-cloud-speech-api
The Google API key file is located at ./resources/apikeys/googlecloud.json

For more information on how to setup your own custom hotword, see:
NPMJS.com/sonus/usage & NPMJS.com/sonus/how-do-i-make-my-own-hotword
The custom hotword file is located at ./resources/speechrecognition/Sarah.pmdl

Internal commands:

I have tried to keep everything modular, so if something doesn't work on your system, you can disable that function through command-line arguments, the config.json options file, or in the app itself
The vision command will be extended with object/face recognition once I get that to work properly

Colors:

start/stop colors turns on/off colored responses/prompt

Verbose:

start/stop verbose turns on/off verbose mode
Verbose mode will turn on display of output with a 'data' or 'warn' type

Help:

help displays the main 'help' section
help <topic> displays help on the requested topic (still needs to be populated)
add help fill in the form and a new help topic is born!
edit help <topic> if you find an error in a certain help topic, you can fix it

Hearing:

start/stop listening turns on/off speech recognition
start/stop hearing same as above

Voice:

start/stop voice turns on/off text-to-speech
start/stop talking same as above
silence stop speaking the current sentence/item

Vision:

start/stop vision turns on/off a timer (15 sec) that saves a webcam snapshot to ./resources/vision/frame.png
start/stop watching same as above
Nothing is done with this image at this time, but tests are being run on detection and recognition...

  • Face/object detection works, but is not connected yet; it will be, soon after some more testing
  • Face recognition does not work yet; it will need a more complex neural net to connect the dots between different images

Regular Expression matches:

Sara needs to 'understand' commands, and does this by comparing input to a regular expression found inside each plugin function's .json file

Example:

/^(?:what|how\smuch)?\s?(?:is)?\s?(-?[0-9]+\.?(?:[0-9]+)?)\s?(?:\+|plus|\&|and)\s?(-?[0-9]+\.?(?:[0-9]+)?)\s?(?:is)?$/i

This regular expression matches the following sentences:

what is (-)10(.12) plus/and/+/& (-)10(.12)
what (-)10(.12) plus/and/+/& (-)10(.12) is
how much is (-)10(.12) plus/and/+/& (-)10(.12)
how much (-)10(.12) plus/and/+/& (-)10(.12) is
(-)10(.12) plus/and/+/& (-)10(.12) is
(-)10(.12) plus/and/+/& (-)10(.12)

Because Sara strips the start of the input, this allows it to recognize sentences such as:

Sara can you please tell me what 10 + -9 is?

In the above regex, most groups are non-capturing (?:xxx)
The capture groups (-?[0-9]+\.?(?:[0-9]+)?) grab the values and push them back to math.js, which contains the function for processing them
In the above example, math.js will receive an array containing 3(!) items:
[0] the complete matched string, in case the plugin still requires it
[1] the first captured group
[2] the second captured group

The function math.add therefore receives these 3 array items and returns the result of x[1] + x[2]
x[0] is always the entire matching string
Using the input sentence above:

x[0] == "what 10 + -9 is"
x[1] == 10
x[2] == -9
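
To make this concrete, here is how the regex from the example behaves in Node.js (the variable x follows the x[] notation above):

```javascript
// Matching the stripped command against the example regex.
const re = /^(?:what|how\smuch)?\s?(?:is)?\s?(-?[0-9]+\.?(?:[0-9]+)?)\s?(?:\+|plus|\&|and)\s?(-?[0-9]+\.?(?:[0-9]+)?)\s?(?:is)?$/i;

const x = 'what 10 + -9 is'.match(re);
console.log(x[0]); // "what 10 + -9 is" — the complete matched string
console.log(x[1]); // "10"              — first capture group
console.log(x[2]); // "-9"              — second capture group
console.log(Number(x[1]) + Number(x[2])); // 1 — what math.add would compute
```

Note that the captured groups are strings; the plugin function is responsible for converting them to numbers.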

Layered commands:

(I am not a native English speaker, so I am not certain this is the correct term)
Sara is able to process subcommands through parenthesis encapsulation
Example:

Sara can you tell me how much is 9 + (10 + 16)?

In this example, Sara will calculate 10 + 16 first, then calculate 9 + 26 afterwards

You can layer as many commands as you need; they will be processed starting with the innermost subcommand first:

11 + (7 + (root of 9))

subcmd: root of 9 = 3
subcmd: 7 + 3 = 10
finalcmd: 11 + 10 = 21
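
A minimal sketch of this subcommand evaluation (innermost parentheses resolved first, as the example shows), assuming a toy dispatcher that only knows addition and square roots; processCommand and evaluate are illustrative names, not Sara's actual functions:

```javascript
// Toy dispatcher: handles "a + b" and "root of n" only, for illustration.
function processCommand(cmd) {
  let m;
  if ((m = cmd.match(/^(-?[\d.]+)\s*\+\s*(-?[\d.]+)$/)))
    return Number(m[1]) + Number(m[2]);
  if ((m = cmd.match(/^root of\s+(-?[\d.]+)$/)))
    return Math.sqrt(Number(m[1]));
  throw new Error('no match: ' + cmd);
}

function evaluate(input) {
  // find an innermost "(...)" (no nested parens inside), evaluate it,
  // substitute the result back into the sentence, and repeat
  const inner = /\(([^()]*)\)/;
  let m;
  while ((m = inner.exec(input))) {
    input = input.replace(m[0], String(processCommand(m[1].trim())));
  }
  return processCommand(input.trim());
}

console.log(evaluate('11 + (7 + (root of 9))')); // 21
```

Each pass reduces one parenthesized subcommand to a plain value, so "11 + (7 + (root of 9))" becomes "11 + (7 + 3)", then "11 + 10", then 21.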

Plugins:

A plugin consists of (at least) 2 files:

pluginname_function.json
pluginname.js

The .js file contains all the JavaScript needed to handle request X and push back a result
The .json file contains the name of the plugin, the name of the module (the .js file name), a regular expression string, and a small description

One .js file can contain multiple module.exports functions; each function requires its own .json file
Example:

math.js  
math_add.json  
math_subtract.json  
math_root.json

Regular Expressions in these .json files need special characters to be escaped twice:
"regex": "/^(?:what|how\\smuch)?\\s?(?:is)?\\s?(-?[0-9]+\\.?(?:[0-9]+)?)\\s?(?:\\+|plus|\\&|and)\\s?(-?[0-9]+\\.?(?:[0-9]+)?)\\s?(?:is)?$/i",

Since Sara removes certain words from the start of the sentence, all the regex needs to express is the intent and, if variables must be passed to the function, one or more working capture groups
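
To illustrate the double escaping, here is a sketch of how such a stored "regex" string could be turned back into a usable RegExp (this loader is an assumption, not Sara's actual plugin loader, and the sample pattern is hypothetical):

```javascript
// A JSON string as it would sit in a plugin .json file: backslashes
// are escaped once for JSON, so "\\s" in the file becomes "\s" after parsing.
const pluginJson = '{"regex": "/^hello\\\\s(world)$/i"}';
const { regex } = JSON.parse(pluginJson);   // → "/^hello\s(world)$/i"

// split the "/pattern/flags" notation into its parts
const m = regex.match(/^\/(.*)\/([a-z]*)$/);
const re = new RegExp(m[1], m[2]);

console.log(re.test('Hello world'));        // true (case-insensitive flag)
console.log('hello world'.match(re)[1]);    // "world"
```

The two escape layers exist because JSON consumes one backslash and the RegExp constructor consumes the other.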

Provided plugins:

Math:

what is 7 + 9
10 - 3.3
9 * 4
4 divided by 3
how much is 12 squared
root of 10
what 10³ is

Conversation:

hi
hello
hey
yo
good morning/afternoon/evening/night

how are you
how are you doing
how are you feeling
how are you doing today
how are you feeling at the moment

Location:

where am I
where are you
what city are we in
what time zone are we in
in which province are we
what are your actual coordinates
which country is this

Weather:

weather
how is the weather
how is the weather in/around/near <place>
what is the weather like in/around/near <place>
weather forecast
what is the weather forecast
what is the weather forecast for <place>

XMBC remote

I am still thinking about a good keyword for menu move functions, media(?)

stop video/movie/film/playback/episode
stop the video/movie/film/playback/episode
stop this video/movie/film/playback/episode
pause/pause video/movie/film/playback/episode
resume the video/movie/film/playback/episode
continue this video/movie/film/playback/episode
menu select
menu back
menu move up/down/left/right
menu move up/down/left/right 5x
menu move up/down/left/right 5*
menu move up/down/left/right 5 times
menu move up/down/left/right 5 entries
menu move up/down/left/right 1 entry
menu home
menu info/information
menu context
menu submenu

More coming...
(all these plugins are incomplete, and will be finished soon)

Audio in/out issues:

The only advice I can give is to make sure that ALSA has the correct in/output devices registered
My Raspberry Pi config:

ztik@sara:~/ $ arecord -l
**** List of CAPTURE Hardware Devices ****
>>> card 0: haobosou [haobosou], device 0: USB Audio [USB Audio] <<<
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: HD4110 [HP Webcam HD-4110], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

I use card 0, device 0 for my audio in (haobosou microphone, cheap and great quality audio)

ztik@sara:~/ $ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 2: ALSA [bcm2835 ALSA], device 0: bcm2835 ALSA [bcm2835 ALSA]
  Subdevices: 7/7
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
>>> card 2: ALSA [bcm2835 ALSA], device 1: bcm2835 IEC958/HDMI [bcm2835 IEC958/HDMI] <<<
  Subdevices: 1/1
  Subdevice #0: subdevice #0

I use the HDMI output on my raspi for audio out, so I am using card 2, device 1 here

My config file:

ztik@sara:~/ $ cat ~/.asoundrc
pcm.!default {
  type asym
  playback.pcm {
    type plug
    slave.pcm "hw:2,1"
  }
  capture.pcm {
    type plug
    slave.pcm "hw:0,0"
  }
}

This solved every issue I had with aplay and arecord
Using these settings I am able to record from the proper input device with the following command:

arecord -d 10 test.wav

and play that recording using:

aplay test.wav

Any support beyond this should be requested at ALSA/Linux forums, I guess; feel free to ask, but don't expect an answer...

Other issues:

Sonus/Google Cloud Speech API

I understand people can have problems getting through this, so here is a small guide (thanks to smart-mirror.io)

  • Setting up Speech Recognition
    Sara uses Sonus with Google Cloud Speech for keyword spotting and recognition.
    To set that up, you'll need to create a new project in the Cloud Platform Console:

  • In the Cloud Platform Console, go to the Projects page and select or create a new project
    GO TO THE PROJECTS PAGE

  • Enable billing for your project.
    ENABLE BILLING

  • Enable the Cloud Speech API.
    ENABLE THE API - For more info see Cloud Speech API Pricing (for simple use it should be free)

  • Create a new JSON service account key, edit it with a text editor and copy the contents to ./resources/apikeys/googlespeech.json
    When prompted to create a new service account select 'Owner' or 'Project Owner'

As I understand it, 90% of problems with Sonus are related to billing issues in Google Cloud

Known:

The vision module works, but all it does is take a picture every 30 min, no further processing connected at this moment

Todo:

  • General
    • Scan for .config file, load settings from there
      • Overwrite settings with arguments
    • Number string to integer function
    • Integer to number string function
    • Change eval() functions, find better approach for plugin loading
    • Correct hardcoded file locations to cleaned up path
    • Blacklist certain plugin names, to avoid overwriting internal functions
  • Help function
    • Add .json file import to help function, so plugins can add topics to the function
    • add help list command, and/or display all topics using help
  • Speech recognition
    • Add option to select if speech commands are pushed to command line or processed immediately
    • Write speechparse() function, to replace strings such as 'subcommand start' with '(' and 'subcommand end' with ')'
  • Voice synthesis
    • Hook command results to voice synthesis
    • Add voices list display/selection
    • Add voice settings to config.json
    • Create option for voice to be heard on all output, instead of on response only (--speak-all, --speak-response-only)
    • Create 'speak' command, which will force the following command output to be spoken completely
      (normal behaviour is to use voice only on 'response' type items, all other types (such as data, info, status) are skipped)
    • Write 'vocalise()' function, to replace strings with proper sounding words ('ZTiK' becomes 'Stick')
      • Add .json file import to vocalise() function, so plugins or end-users can add words to the list
  • Vision
    • Support for USB Webcams
      • Support for the Pi camera
    • Image manipulation through imagemagick
    • Object/face detection
    • Object/face recognition
  • Plugins
    • Rename 'commands' folder to 'plugins'
    • Check for plugins in an external folder
    • Finish weather plugin
    • Finish conversation plugin
    • Add CLI Games plugin
      • Highscore system implementation
    • Add time/date plugin
    • Add IMDB plugin
    • Add Wolfram Alpha plugin
    • Add Dictionary plugin
    • Add Wikipedia plugin
    • Add Network Utilities plugin
    • Add CLI chart plugin
    • Add 9292OV plugin (Dutch public transport)
    • Add xmbc plugin
      • Add functions (stop, pause, select, back, up, down, left right, info, home, contextmenu)
      • Add more functions to remote control (next, previous, rewind, forward, sendtext)
      • Rewrite console.log() to response.conlog()
    • Add Image based Object Detection
    • Add Image based Face Recognition
      • Add events for Face Recognition
    • Add Cloud Storage plugin (connect with dropbox etc.)
  • ... suggestions?

Long term goals:

  • Language support... eventually (this depends on my personal skills as well as Google Speech and Text-to-Speech language availability)
  • Devise a way to incorporate a mood-function, simulate emotions
  • Connect a LCD/TFT screen, give Sara a face with expressions
  • Neural Net / Machine learning capabilities for influencing stock market
  • Build datacenter deep underground, preferably a remote island close to a submarine communications cable
  • Self awareness

Credits:

I would like to point out that I simply put this hardware and these programs and modules together; without the people who created them, I would have had nothing at all!
Thank you to everyone involved in making:

Hope I didn't miss anyone here; if so, please let me know and I will update!

Apologies:

I am a complete moron when it comes to asynchronous programming, and I am positive that many functions could have been written better/cleaner/more efficiently.
I made this project to enhance my understanding of Node.js/Javascript, so please remain calm if/when I don't understand your comment/code/bugfix/pull request/advice/issue at first glance.

Last updated on 27 Jul 2019