Monday, March 12, 2007

Speech Recognition and Phonemes

I sat down this morning and did some basic research on speech recognition, speech analysis and the study of phonemes. If we're going to be analysing the speech patterns / words of people, then we'll need a good understanding of all of these. We've got Studio 5 today, so I will discuss this with my group then. As it appears we may have a new group member, we'll have to discuss ideas with him too, so we can all be on the same level and understand what each member would like to do. Research follows.

Speech Recognition : The process of converting a speech signal ( Audio ) to a sequence of words.

  • Most Systems today use the “Hidden Markov Model”

    • Statistical model in which the system being modeled is assumed to be a “Markov process” with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters.
    • The “state” is not directly visible, but variables influenced by the state are visible. A sequence of tokens generated by a Hidden Markov Model gives information about the sequence of states.
  • Also can use “Artificial Neural Network”

  • Modern day systems use the “Noisy Channel Formulation”

    • The task of the recognition system is to search for the most likely word sequence given the acoustic signal, i.e. the system searches for the most likely word sequence among all possible word sequences.

( $\tilde{W}$ = most likely word sequence, $W^*$ = all possible word sequences, $A$ = the acoustic signal )

$$\tilde{W} = \arg\max_{W \in W^*} \Pr(W \mid A)$$
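
To make the arg max above a bit more concrete, here is a minimal Python sketch. It leans on Bayes' rule, which says Pr(W|A) is proportional to Pr(A|W) x Pr(W), so the search can score each candidate with an acoustic score plus a language score (in log space). The candidate phrases and all of the probabilities are made up purely for illustration; a real recogniser would search a vastly larger space and get its scores from trained models.

```python
import math

def most_likely_sequence(candidates, acoustic_logprob, language_logprob):
    """Return the candidate W that maximises log Pr(A|W) + log Pr(W)."""
    return max(candidates, key=lambda w: acoustic_logprob[w] + language_logprob[w])

# Hypothetical toy scores, invented for this example only.
candidates = ["recognise speech", "wreck a nice beach"]
acoustic = {  # how well each word sequence matches the audio, Pr(A|W)
    "recognise speech": math.log(0.4),
    "wreck a nice beach": math.log(0.5),
}
language = {  # how plausible each word sequence is on its own, Pr(W)
    "recognise speech": math.log(0.01),
    "wreck a nice beach": math.log(0.0001),
}

best = most_likely_sequence(candidates, acoustic, language)
print(best)
```

Even though the second phrase matches the (made up) audio slightly better, the first wins overall because it is a far more plausible sequence of words.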

  • Audio Visual Speech Recognition

    • Technique that uses “image processing” technology to Lip read, to aid in speech recognition.
    • Each system ( lip reading / speech recognition ) works separately, then the results are mixed at a later stage
  • Speech Synthesis

    • Artificial production of human speech.
      • Text to Speech
      • Symbolic Linguistic Representations, e.g.
        • Phonetic Transcriptions
  • Voice Analysis

    • The study of speech sounds for purposes other than linguistic content.
      • e.g. analysis of the vocal quality of medical patients
      • speech therapy
  • Phonetics

    • Sound is a series of pressure changes in the medium between the sound source and the listener
    • Oscillogram / waveform = most common representation, showing the pressure increases / decreases of the signal over time.
    • Pitch Analysis = Another representation of a speech signal
      • Speech is a physical process consisting of two parts
        • Product of a sound source ( vocal cords )
        • Filtering ( tongue, lips, teeth )
      • Pitch Analysis tries to capture the “Fundamental Frequency of the sound source” by “analysing the final speech utterance”.
        • Fundamental Frequency – dominant frequency of the sound produced by the vocal cords.
        • Difficult to perform.
    • Spectrum
      • Spectrum gives a picture of the distribution of frequency and amplitude at a moment in time.
      • A 3D graph is required to plot this over time, i.e. a spectrogram
    • Spectrogram
      • Time = horizontal axis, frequency = vertical axis. Amplitude ( 3rd axis ) represented by shades of darkness.
      • Voiced sounds appear more organised.
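
As a rough illustration of the pitch analysis idea in the notes above, here is a small Python sketch that estimates the fundamental frequency of a signal using autocorrelation. Autocorrelation is just one common approach and is my own choice for the example, not something taken from the tutorials linked below. The function name, the 50 to 500 Hz search range and the synthetic test tone are all assumptions made for illustration.

```python
import math

def estimate_f0(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    The signal is compared with delayed copies of itself; the delay (lag)
    where it lines up best is taken as one period of the sound source.
    """
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic "voiced" tone: a 200 Hz fundamental plus two weaker harmonics.
SAMPLE_RATE = 8000
tone = [
    math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 400 * n / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 600 * n / SAMPLE_RATE)
    for n in range(800)  # 0.1 seconds of audio
]

f0 = estimate_f0(tone, SAMPLE_RATE)
print(f0)
```

On a clean synthetic tone like this the estimate should land at or very near 200 Hz. Real speech is noisy and constantly changing, which is a big part of why pitch analysis is described as difficult to perform.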


Some Links

“Speech Analysis Tutorial”

http://www.ling.lu.se/research/speechtutorial/tutorial.html


“The CMU Sphinx Group, Open Source Speech Recognition Engines” - http://cmusphinx.sourceforge.net/html/cmusphinx.php


“Praat : Phonetics by Computer”

http://www.fon.hum.uva.nl/praat/


“Speech Analyzer”

http://www.sil.org/computing/speechtools/speechanalyzer.htm


“lingWAVES : Signal Analysis”

http://www.lingcom.de/english/products/lingWAVES/lingWAVES_overview.htm



Saturday, March 10, 2007

Group Selection

Well, the groups have been chosen.
I'll say once again that I absolutely abhor group work - but I can understand *why* we need to be able to work as part of a team, especially as we're going to be working with groups / teams out in the field. It is just that in my experience there are never enough people, or experienced people, or enough enthusiasm in a group to make it actually worthwhile to be in one. Oh well.

I'm working with Megan and Pierre. I've worked with both Pierre and Megan before, so I know what to expect from both of them.

The lecturers split us into groups based on the general themes they picked up in our ideas ... apparently. It's just unfortunate that they picked my least favourite of my 5 ideas to work from ( Chatter Critic ). To be honest, I actually would have thought that it would be vetoed because of ethical issues ( Monitoring people's conversations ? Hello Big Brother! ) - However, it's what I've been given, and I've been given worse assignments in the past. I will just have to make it as challenging as possible.

I'll be posting all my research into the assignment here, as well as in our group blog which is located here : http://silentradios.blogspot.com/

And I think that's a long enough disgruntled post for tonight.

- Anthony

Sunday, March 4, 2007

5 Design Ideas

Another bit of assessment we had to hand in today was a selection of 5 ideas. We had to submit them as a PDF file, but here are my 5 :

Personalised Insult Machine ( Anthony Massingham ) -

A digital ( or physical ) representation of a machine or monster ( ambiguous as to its sex / species / lineage ) analyses users who walk past it and delivers a personal comment to each participant. A hidden video camera ( or web cam ) records live footage of users passing by the installation and passes the data on to the computer. The computer then analyses various bits of data acquired from the image, such as hair colour, clothing colour, height and sex – and builds a unique vocal comment based around that information. The machine then generates this statement and speaks it ( through a hidden speaker ) to the people walking past it. Depending on its “mood” it may compliment, or insult the users.


Deja Vu ( Anthony Massingham ) -

A digital portrait that steals its facial qualities from the people who examine it. Cameras take still images of users' faces and apply elements of them to the facial mesh that is displayed on the portrait. Each face contributes a small element, whether it be facial shape, nose shape, hair colour or even lip size. The displayed face is a combination of a large number of different faces to produce a completely unique person. The face continues to evolve as more people add their own likeness to the mix.


Gallery Spy ( Anthony Massingham ) -

Gallery Spy observes the movement of gallery patrons around the space and the duration they spend at each location. A camera tracks the movement of each user around the space, locking on to each person individually. The movement data is graphically represented as an abstract version of the gallery floorspace, with each person represented by a different colour. The thickness / boldness of the person's colour depends on the duration of time spent in any one location. The visual data itself is displayed back on a screen which is surrounded by a picture frame. The Gallery Spy shows people branching off from one work to another in a vine / tree like fashion.
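
A rough sketch of how the "duration at each location" part of Gallery Spy might be worked out, assuming the camera tracking already gives a time-ordered list of positions for one person. The grid of floor cells, the function name and the sample track are all hypothetical, just to show the idea.

```python
from collections import defaultdict

def dwell_times(track, cell_size=1.0):
    """Total the seconds one person spends in each floor cell.

    track: time-ordered list of (timestamp_seconds, x, y) samples.
    Each gap between two samples is credited to the cell the person
    was standing in at the start of that gap.
    """
    totals = defaultdict(float)
    for (t0, x, y), (t1, _, _) in zip(track, track[1:]):
        cell = (int(x // cell_size), int(y // cell_size))
        totals[cell] += t1 - t0
    return dict(totals)

# Hypothetical track: lingers near one artwork, then moves to another.
track = [(0, 0.5, 0.5), (10, 0.6, 0.4), (12, 3.2, 0.5), (15, 3.3, 0.6)]
times = dwell_times(track)
print(times)
```

The totals per cell could then drive the thickness / boldness of that person's colour on the display.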


Chatter Critic ( Anthony Massingham ) -

An ambient and background conversation monitoring tool. Microphones are placed throughout the gallery at various artworks. Each microphone monitors the conversations that people have about each painting / exhibit. Each conversation is analysed for positive and negative comments. These words / phrases are then sent to the computer, and are saved, edited, cut apart and re-arranged into new phrases and comments and played back through speakers as well as being displayed on screens. The frequency with which a particular word or phrase is used determines the number of times it is repeated or displayed.
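
Here is a minimal sketch of the counting step in Chatter Critic, assuming the conversations have already been turned into text by a speech recogniser ( the hard part, which is not shown ). The positive / negative word lists and the function names are hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical word lists; a real installation would need far larger ones.
POSITIVE = {"beautiful", "love", "great"}
NEGATIVE = {"ugly", "boring", "hate"}

def tally_comments(transcripts):
    """Count how often each positive or negative word was overheard."""
    counts = Counter()
    for text in transcripts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in POSITIVE or word in NEGATIVE:
                counts[word] += 1
    return counts

def playback_queue(counts):
    """Repeat each word as many times as it was heard, most frequent first."""
    queue = []
    for word, n in counts.most_common():
        queue.extend([word] * n)
    return queue

transcripts = ["I love it, great colours", "So boring", "Great, just great"]
counts = tally_comments(transcripts)
queue = playback_queue(counts)
print(queue)
```

The queue could then be shuffled, cut apart or re-arranged before being played back or displayed.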


Sound Canvas ( Anthony Massingham ) -

Experimentation into the visual aspects of sound. Microphones are placed at various points of the installation, each monitoring high, mid, and low frequency sounds ( both ambient and focussed ). These sounds are then translated into visual data and displayed on a digital canvas. Users are encouraged to make various noises and see how they appear on the canvas. The stereo field determines the location of the visual data ( Sounds to the left appear on the left side of the canvas ) and volume determines the colour and strength of the digital paint. The type of sounds ( and sound waves ) used also has an effect on the style of the painting.
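
A small sketch of the stereo-to-canvas mapping described for Sound Canvas, assuming each sound arrives as a left and right level between 0.0 and 1.0. The function name, the canvas width and the way strength is calculated are all assumptions for illustration, not a worked-out design.

```python
def sound_to_paint(left_level, right_level, canvas_width=800):
    """Map a stereo sound to a horizontal canvas position and paint strength.

    left_level / right_level: loudness of each channel, from 0.0 to 1.0.
    Returns (x, strength), or None when there is silence.
    """
    total = left_level + right_level
    if total == 0:
        return None  # nothing to paint
    pan = right_level / total            # 0.0 = far left, 1.0 = far right
    x = round(pan * (canvas_width - 1))  # pixel column on the canvas
    strength = min(1.0, total / 2.0)     # louder sounds paint more strongly
    return x, strength

print(sound_to_paint(1.0, 0.0))  # a sound entirely in the left channel
print(sound_to_paint(0.0, 1.0))  # a sound entirely in the right channel
```

Colour and painting style would need further mappings of their own ( for example from the frequency band of the sound ), which are left out here.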

Limitations - IAG

The most obvious limitation of the Ipswich Art Gallery is the amount of space available for students. It's almost impossible to have anything of any great size in the small corridor we've been assigned. Although there are two projectors mounted on the roof, there's only really one small projection environment, limiting the number of projects that can use projection as an option. The floorspace is adequate for creating various floor sensitive devices, but at the same time it will not be easy to fit all that is required into the space.

Limitations - Multi-Touch Interface

There are many limitations behind this system. Finding a surface with enough sensitivity to monitor multiple points of contact would be essential to ensuring that this project worked at all. As this is still in the research phase - there are still many more limitations to discover, such as the number of touch registration points at any one time, or the differences in pressure sensitivity and the effects on the screen.

Limitations - Khronos Projector

This project is incredibly well documented. There are backup plans for almost every scenario. It is almost impossible to find a single flaw in its construction. If you don't have the available materials, it offers possible solutions. One limitation that is mentioned throughout this list is the use of the tactile fabric. Without it, the feeling of "touch" is lost ( ie, if using a touchscreen or graphics tablet or tablet PC ). You don't get the same feel for the artwork and you don't get the same levels of sensitivity.

Limitations - Electric Moons

Space is another big issue in this installation. And it seems like it will be for most of the projects, especially since the space in which these projects will be presented is TINY. This would take up the entire space, and would lose its entire effect.

Limitations - SoniColumn

The biggest limitation ( for me personally ) in constructing something like this is that I have absolutely no experience in engineering. None whatsoever. And it is obvious from the excellent construction work that that is what's required to build something of this magnitude.

Limitations - External Measures

The obvious limitation of this system is people. Without an audience, there is no artwork. However with no audience, there's nothing to see. So that limitation is silly. Another thing I noticed from the videos, is the speed. While it's talking about kinetic drawing and fluid motion - the motion seen on the screen is actually anything BUT smooth.

Computer and processing speed is an obvious limitation.

Limitations - Interactive Waterfall

Technology and the price of technology is the limitation in this project. In fact, in the original version the designer wanted to use an array of plasma TVs / screens. However this proved too expensive, and he had to opt for the lower resolution LCD "Pixels". However, out of this limitation he came up with an interesting abstract effect that seemed to justify his original design intent.

Limitations - Audiopad

The limitations of Audiopad mainly consist of the storage device and collection of samples in the program. The installation also requires the use of the control 'stone' type objects, limiting the number of people that can interact with the system at any one time.

Limitations - L.A.S.E.R. Tag

I could say space again, as it is entirely obvious that space would be a huge issue for such a project, considering it requires an entire building to use as a projection space. The concept could easily be scaled down. Although it wouldn't be as impressive, and most of the purpose would be defeated, it would still work. So another limitation of this project is the obvious legal ramifications ( in fact I think the original team were arrested, charged, or at least questioned over the original projection bombing run ).

Limitations - Shadow Monsters

The limitations of shadow monsters are mainly space issues. There has to be a decent enough space to provide not only the projection space and projection unit, but the area for the shadows too.

[ Related Project ] : Squeezebox


Project : Squeezebox
by Iain Mott, Marc Raszewski and artist Tim Barrass
Squeezebox incorporates spatial sound, computer graphics and kinetic sculpture.

"Participants manipulate the sculpture to produce real-time changes to the spatial location and timbre of the sound, as well as to manipulate digitised images. The sound and images are presented as an integrated plastic object, a form which is squeezed and moulded by participants. The artwork consists of a frame supporting four sculpted pistons on pneumatic shafts. An interactive image is displayed on a monitor beneath a one-way mirror at the centre of the sculpture. Four loudspeakers are situated at the outer four corners."

This is a form of interactive sculpture with a computer backing. By moving bits of the sculpture, users can manipulate not only sound, but digitised images.

Project Link : Squeezebox

[ Related Project ] : Multi-Touch Interaction


Project : Multi-Touch Interaction
by Jeff Han
Bi-manual, multi-point, and multi-user interactions on a graphical display surface.

More research than a project, but it has been presented in a few locations, and is looking at the breakdown of modern day "interfaces" and touch screens: removing the technical, computer-ish aspects of interaction and replacing them with intelligent gesture based interfaces, using multiple points of contact ( See Minority Report ;) ) to interact with the data on the screen.


Project Link : Multi-Touch Interaction Research

[ Related Project ] : Khronos Projector


Project : Khronos Projector
by Alvaro Cassinelli
a video time-warping machine with a tangible deformable screen

The Khronos Projector aims to give users a new way to look at video by offering a tangible screen. Users are able to physically interact with the screen and not only modify the linear time of the film, but also send particular parts of the image forwards or backwards through time. "Waves" of time can be seen if the user shakes the screen. The installation uses a video projector, a large tissue-based deformable projection screen, and a sensing mechanism capable of acquiring in real time the deformation of this tissue.


Project Link : Khronos Projector

[ Related Project ] : Electric Moons


Project : Electric Moons
by Christopher Bauder
A hundred white balloons in a totally dark room are floating in space like the atoms of a molecule.

"The interactive balloon ballet is built out of synchronized movement and lighting. A screen based interface telecommands the balloon ballet in sync to a chosen musical piece. The user can control the movement and lighting of each balloon independently. Morphing 3D shapes and patterns are blended with an overlay of supporting or counteracting light animations."

A very interesting exhibit. Large floating spheres fill the space and move up and down in time with lighting and in time with each other.

Project Link : Electric Moons

[ Related Project ] : SoniColumn


Project : SoniColumn
by Jin-Yo Mok
"The lucid sounds coming from a simple play mechanism stirred me up with my old memory echoed with them"

SoniColumn is an interactive sound installation that can be played by a person’s touch. In a sense, it's a glorified version of an old fashioned music box - but it's fully customisable. The user selects what notes they want to play by turning on LEDs. Then they turn a crank, which turns the cylinder and plays the sequence of notes selected by the user when they pressed the LEDs.

Project Link : SoniColumn

[ Related Project ] : External Measures


Project : External Measures
by Camille Utterback
External Measures is a series of interactive installations that explore the possibilities of projected "kinetic sculptures" or "living paintings".

This series of interactive artworks focuses on the evolution of a painting that is being manipulated and drawn using the movement of participants. The program analyses the position of people as they move around a space and draws the painting based on movement. The painting is composed of algorithmically generated textures and patterns that are then displayed depending on the movement of the people around the space.

Project Link : External Measures 1, External Measures 3, External Measures 5

[ Related Project ] : Interactive Waterfall


Project : Interactive Waterfall
by Charles Forman
"I think that space should be functional and it should serve an effective purpose."

The Interactive Waterfall is an array of "pixels" that move and ripple in response to proximity and movement, creating an ambient aesthetic mood. It uses an overhead infrared camera to analyse the movement of passers-by. It serves as both an interactive display and a piece of art.

Project Link : Interactive Waterfall

[ Related Project ] : Audiopad


Project : Audiopad
by Localfields
Audiopad is a composition and performance instrument for electronic music which tracks the positions of objects on a tabletop surface and converts their motion into music

Audiopad is a table-top music composition and mixing device. Users move sensors around on the table to mix different sounds and effects, changing volume, pan, and levels. Audiopad not only allows for spontaneous reinterpretation of musical compositions, but also creates a visual and tactile dialogue between itself, the performer, and the audience.


Project Link : Audiopad

[ Related Project ] : L.A.S.E.R. Tag

Project : L.A.S.E.R. Tag
by The Graffiti Research Lab
Laser Pointed Projected Graffiti on Buildings

Interesting project by the GRL. Using a powerful projector, an even more powerful laser pointer, and a computer - the designers have created building-size graffiti. Users "draw" using the laser pointer. The computer picks up the pointer's location, and projects the now drawn material onto the side of the building.

Project Link : L.A.S.E.R. Tag

Friday, March 2, 2007

[ Related Project ] : Shadow Monsters


Project : Shadow Monsters
by Philip Worthington
Monsters appear from the shadows cast by the hands of participants.

This is a project I saw a while ago and I was quite interested in. It's a play on hand shadows and the creatures and forms you can make by projecting your hands' shadow onto a wall. The program ( coded in Processing ) examines the movement and placement of the shadows, and not only adds teeth, hair, spikes, and other monster-ish appendages, but sounds. A great little interactive exhibit, fun for all ages.

Project Link : Shadow Monsters

COMP3000 Introduction

Evening.
This is the blog for my Studio 5 subject ( or COMP3000 as the uni would like you to know it ). I'll be posting various things in here over the semester. This is basically just an introductory post. This subject, the 5th iteration in the series of "studio" subjects that we've had at uni, is focusing on "Ubiquitous computing and the disappearing computer in interactive art". Interesting stuff! I welcome comments from both people at the university as well as people external to the uni.

The first few posts here are in response to our first "assignment" I guess. We've got to identify 10 existing projects / installations which are relevant to the ubiquitous computing theme. We've then also got to identify 10 constraints for such projects, whether that be work-specific, or a more general constraint due to the space we've been presented with ( which is actually the Ipswich Art Gallery in QLD, Australia ). Each of these will be in a separate post. So enjoy lots of random posts!

I'll endeavour to keep this blog updated over the semester. If you're interested in seeing another studio subject blog, then I've also got my Studio 4 blog ( well, my Team's Studio 4 blog ) linked to this account, it's at : http://theweatherroom.blogspot.com/ . The Weather Room is an interactive weather simulation using Macromedia Flash and motion capture.

And that's it for now. Hope this blog all makes sense. It's really to be used as a means to document the process behind the project, as well as a guide for fellow students and my lecturers, but as I stated earlier - any visitors are appreciated.

- Anthony