Cristian Merighi
Getting my hands dirty with gesture tracking and speech recognition in the Microsoft Kinect SDK (beta2), while waiting for the forthcoming commercial release.
This article is obsolete. Some functionalities might not work anymore. Comments are disabled.
I've just approached, as a developer, the Kinect hardware and its SDK (beta2 version). Not timely at all! In fact, the new
Kinect for Windows (hardware + software) is going to be released on February 1st.
So maybe what I've been trying to learn and implement on top of the SDK will be rubbish as soon as the new features are revealed.
Anyway, it was worth the time spent, at least in terms of getting started with this interesting device and becoming aware of its potential.
What challenged me the most were the skeleton data tracking and the speech recognition capabilities.
If on the one side the audio libraries cover pretty much everything needed to neatly develop speech recognition features, on the other side the
gesture APIs are a definite Wild West: a no-man's land where any despotic developer can build up his own framework of rules, decisions and assumptions... Well, a pretty exciting scenario, uh?!
My first thought in approaching the device was: "I don't want to coerce the user into learning the gestures I decided to implement", and my second:
"I want my natural interface to be smart enough to understand what the user wants to do". Or at least intuitive enough that any user can easily tune his actions to get the job done.
OK, so what I concluded was that I needed a flexible system of "usual" gestures (think about panning, sweeping, rotating, zooming...), and I tried to put myself in the machine's shoes
to figure out how to detect those kinds of acts.
I've noticed that a "usual" gesture "usually" reveals itself after a set of prodromal intermediate gestures.
For instance, a <Pan> gesture can be properly detected only after a <Palm> gesture (that's how I refer to the act of lifting a hand above its contiguous elbow) has already occurred.
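To make the idea concrete, here is a minimal sketch (in Python rather than the SDK's C#, and with hypothetical names and thresholds of my own) of what such a `<Palm>` predicate could look like, assuming you have the hand and elbow joint positions from the skeleton stream, with the y-axis growing upward:

```python
from dataclasses import dataclass

@dataclass
class Joint:
    """A tracked skeleton joint position (hypothetical stand-in for the
    SDK's joint data; y grows upward)."""
    x: float
    y: float
    z: float

def is_palm(hand: Joint, elbow: Joint, margin: float = 0.05) -> bool:
    """Detect the <Palm> prerequisite gesture: the hand lifted above its
    contiguous elbow. The margin filters out skeleton-tracking jitter."""
    return hand.y > elbow.y + margin

# Hand raised well above the elbow: <Palm> detected.
print(is_palm(Joint(0.1, 0.9, 2.0), Joint(0.1, 0.6, 2.0)))  # True
# Hand hanging at elbow level: no <Palm>.
print(is_palm(Joint(0.1, 0.6, 2.0), Joint(0.1, 0.62, 2.0)))  # False
```

The margin is the kind of arbitrary assumption I mean by "wild west": every framework has to pick its own.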
I've been prototyping a working framework along these lines, and this is the resulting sketched class diagram:
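The class diagram itself isn't reproduced here, but the core prerequisite-gesture mechanism can be sketched as follows (a Python toy model with hypothetical class and gesture names, not the actual framework code):

```python
class GestureTracker:
    """Hypothetical base class: a tracker stays dormant until all of its
    prerequisite (prodromal) gestures have been observed."""
    prerequisites: tuple = ()

    def __init__(self):
        self.seen = set()       # intermediate gestures observed so far
        self.active = False     # becomes True once all prerequisites occurred

    def notify(self, gesture_name: str) -> None:
        """Feed a detected intermediate gesture into the tracker."""
        self.seen.add(gesture_name)
        if set(self.prerequisites) <= self.seen:
            self.active = True

class PanGestureTracker(GestureTracker):
    # A <Pan> only makes sense after a <Palm> has been seen.
    prerequisites = ("Palm",)

tracker = PanGestureTracker()
print(tracker.active)   # False: no <Palm> yet
tracker.notify("Palm")
print(tracker.active)   # True: the <Pan> tracker may now start tracking
```

This keeps each concrete tracker small: it only declares which prodromal gestures it depends on, and the dispatching logic stays in the base class.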
The test application for this embryo of a gesture system is the most classic "planet manipulator": the 3D-gestured Earth.
For the record, the gesture tracker involved
is named Manipulate3DGestureTracker, and it depends on a Grope (eheh) or Grab gesture. It can be disposed of by dropping both hands below hip level or (see the video sample below) by forcing
the stop via voice command.
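The disposal rule is simple enough to sketch as a single predicate. Again this is an illustrative Python approximation with made-up names, not the tracker's actual code; it assumes you already have the hand and hip heights from the skeleton and a flag set by the speech recognizer:

```python
def should_release(left_hand_y: float, right_hand_y: float,
                   left_hip_y: float, right_hip_y: float,
                   voice_stop: bool = False) -> bool:
    """Hypothetical disposal rule for the manipulation tracker: release when
    both hands drop below hip level, or when a voice 'stop' command fired."""
    hands_down = left_hand_y < left_hip_y and right_hand_y < right_hip_y
    return hands_down or voice_stop

print(should_release(0.2, 0.25, 0.5, 0.5))         # True: both hands below the hips
print(should_release(0.8, 0.25, 0.5, 0.5))         # False: one hand is still raised
print(should_release(0.8, 0.9, 0.5, 0.5, True))    # True: forced via voice command
```

Combining the two exit paths in one predicate keeps the skeleton-based and speech-based pipelines decoupled: each just contributes its own signal.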
Now I'll wait and see what happens on February 1st, and how Microsoft will turn my work into complete nonsense... ;)
Take care. Bye.