Progress Report: Design and First Implementations

Last week our group produced first drafts of the graphical user interface (GUI) for our multimodal application. We also started a project on GitHub, to which we have already pushed first implementations of the backend and the database.

The Recipe List

The first thing users of the cooking aid application see is a list of the recipes stored in the database. The list can be searched with the search box at the top left of the screen: the user types a recipe name, an ingredient name, or a meal type into the text box, and the list is filtered accordingly.
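The search behaviour described above can be sketched as a single filter function. This is a minimal sketch under assumptions: the `Recipe` interface and the `filterRecipes` helper are hypothetical (the report does not give the database schema), and the sketch filters on the client, which the report does not specify.

```typescript
// Hypothetical recipe shape; the real database schema is not given in the report.
interface Recipe {
  name: string;
  ingredients: string[];
  mealType: string; // e.g. "breakfast", "dinner"
}

// Case-insensitive filter: a recipe matches if the search text occurs in its
// name, in its meal type, or in any of its ingredient names.
function filterRecipes(recipes: Recipe[], searchText: string): Recipe[] {
  const query = searchText.trim().toLowerCase();
  if (query === "") return recipes; // an empty search box shows the full list
  return recipes.filter(
    (r) =>
      r.name.toLowerCase().includes(query) ||
      r.mealType.toLowerCase().includes(query) ||
      r.ingredients.some((i) => i.toLowerCase().includes(query)),
  );
}
```

One text box searches all three fields at once, so the user never has to say which kind of term they typed.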

Recipe Overview

After a recipe has been selected from the recipe list, the application switches to the Recipe Overview view. Here the user finds all the information needed about the selected recipe, including nutritional information and the difficulty of the cooking steps. By clicking or tapping “Start Cooking”, or simply by saying “Start cooking” into the microphone, the user switches to the Recipe Steps view.

Recipe Steps

In this view the user controls the application solely by voice. Saying “Forward” or “Back” navigates through the cooking steps.
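The voice navigation between the Recipe Overview and the Recipe Steps views behaves like a small state machine. The sketch below assumes a hypothetical `CookingState` shape and `applyCommand` function, and assumes the speech input has already been turned into text. It is not the team's implementation.

```typescript
// Hypothetical view names; the report only describes the views informally.
type View = "overview" | "steps";

interface CookingState {
  view: View;
  stepIndex: number; // index into the recipe's cooking steps
}

// Apply one recognised voice command to the current state.
// Unknown phrases leave the state unchanged; the step index is clamped so
// "Back" on the first step and "Forward" on the last step do nothing.
// Assumes the recipe has at least one step.
function applyCommand(state: CookingState, transcript: string, stepCount: number): CookingState {
  const command = transcript.trim().toLowerCase().replace(/[.!?]+$/, "");
  if (state.view === "overview") {
    return command === "start cooking" ? { view: "steps", stepIndex: 0 } : state;
  }
  if (command === "forward") {
    return { ...state, stepIndex: Math.min(state.stepIndex + 1, stepCount - 1) };
  }
  if (command === "back") {
    return { ...state, stepIndex: Math.max(state.stepIndex - 1, 0) };
  }
  return state;
}
```

The function returns a new state instead of changing the old one, so the view can simply re-render from whatever state it is given.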

The user can also ask the application for nutritional information about the ingredients. For example, by saying “Show me nutritional information about two slices of cheese”, the user gets the requested information displayed on the screen. This feature will be implemented with the help of the Wolfram Alpha API.
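The report does not say which Wolfram Alpha endpoint will be used, so this sketch assumes the Wolfram|Alpha Full Results API (`v2/query` with the `appid`, `input` and `output` parameters). The `extractFoodQuery` and `buildWolframQueryUrl` helpers and the phrase pattern they match are hypothetical. The sketch only shows how a spoken request could become a query URL. Sending the request and parsing the response are left out.

```typescript
// Hypothetical helper: extract the food phrase from an utterance such as
// "Show me nutritional information about two slices of cheese".
function extractFoodQuery(utterance: string): string | null {
  const match = /nutritional information (?:about|for|of)\s+(.+)$/i.exec(utterance.trim());
  // Drop trailing punctuation the speech recogniser may add.
  return match ? match[1].trim().replace(/[.?!]+$/, "") : null;
}

// Build a request URL for the Wolfram|Alpha Full Results API (v2/query).
// appId is a placeholder for the AppID issued by Wolfram|Alpha.
function buildWolframQueryUrl(food: string, appId: string): string {
  return (
    "https://api.wolframalpha.com/v2/query" +
    `?appid=${encodeURIComponent(appId)}` +
    `&input=${encodeURIComponent(food)}` +
    "&output=json"
  );
}
```

For the example above, `extractFoodQuery` yields "two slices of cheese", which is then URL-encoded into the `input` parameter.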


Project Idea: Cooking Aid Application



This week our group came up with the wonderful idea of creating a cooking aid application. The main idea behind the application is to guide people while they cook by providing step-by-step recipes. An important aspect is that the application can be controlled with voice commands. This feature comes in especially handy when the chef is busy chopping ingredients or has dirty hands: chefs can benefit from a hands-free experience!


For the web client we will be using AngularJS, an MVC JavaScript framework. It displays the interface of the application and handles the communication with our backend and with the Google Web Speech API. The Web Speech API converts voice commands into text so that the web client can use them. The JSON server takes care of data management by providing the raw recipe data.
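The step where the Web Speech API hands recognised text to the web client can be sketched as follows. The small interfaces mirror only the fields of the browser's `SpeechRecognitionEvent` that the sketch reads (`resultIndex`, `results`, `isFinal`, `transcript`), so it compiles without DOM typings. The function names `finalTranscripts` and `startListening` are hypothetical, and how the team actually wires this into AngularJS is not described in the report.

```typescript
// Minimal structural types for the parts of a SpeechRecognitionEvent used here.
interface RecognitionAlternative { transcript: string; }
interface RecognitionResult { isFinal: boolean; [index: number]: RecognitionAlternative; }
interface RecognitionEventLike { resultIndex: number; results: ArrayLike<RecognitionResult>; }

// Collect the final (not interim) transcripts delivered by one "result" event,
// using the top alternative [0] of each new result.
function finalTranscripts(event: RecognitionEventLike): string[] {
  const out: string[] = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) out.push(result[0].transcript.trim());
  }
  return out;
}

// Browser-only wiring. Chrome exposes the constructor as webkitSpeechRecognition.
// Returns false where speech recognition is unavailable (e.g. outside a browser).
function startListening(onCommand: (text: string) => void): boolean {
  const g = globalThis as any;
  const Ctor = g.SpeechRecognition ?? g.webkitSpeechRecognition;
  if (!Ctor) return false;
  const recognition = new Ctor();
  recognition.continuous = true; // keep listening across several utterances
  recognition.lang = "en-US";
  recognition.onresult = (event: RecognitionEventLike) =>
    finalTranscripts(event).forEach((text) => onCommand(text));
  recognition.start();
  return true;
}
```

Keeping the transcript extraction in a plain function lets it be tested without a microphone. In the client, `onCommand` would pass the text on to whatever handles "Start cooking", "Forward" and "Back".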


Comparing the Human Visual System to a Camera

Comparing the human visual system to a camera is not a new idea; in fact, it has a long history. The story begins with the camera obscura: Leonardo da Vinci used the camera obscura as a metaphor for the human eye.

The basic principle of creating an image of a scene is the same in the camera as in the human eye: light falls through a small opening into an otherwise completely dark chamber, and an upside-down image of the scene is projected onto the back wall of the chamber, where it can be viewed. In the human eye, light passes through the cornea, then the pupil, and finally the lens before the image is projected onto the retina. In a camera, light passes through the lens and is projected onto photographic film or an electronic image sensor.
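The upside-down image follows from the geometry of the small opening. In the standard pinhole model, a scene point at height $X$, $Y$ above the optical axis and at distance $Z$ in front of the hole lands on the back wall, a distance $f$ behind the hole, at the coordinates below. The symbols are introduced here for illustration.

```latex
x = -f\,\frac{X}{Z}, \qquad y = -f\,\frac{Y}{Z}
```

The minus signs are the inversion: a point above the axis ($Y > 0$) ends up below it ($y < 0$), and left and right are swapped. The division by $Z$ is why distant objects appear smaller.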

The paragraph above explains how an image is created. But how do we obtain a sharp image that is neither too dark nor too bright? First, the amount of light entering needs to be controlled. In the eye, this is done by the pupil in the center of the iris. In the camera, it is done by the aperture (diaphragm), while the shutter controls how long light falls onto the film or sensor. Second, the light has to be refracted so that it comes into focus. In the camera this is done by the lens. In the human eye, the cornea, the aqueous humor, and the vitreous humor refract the light in addition to the lens. The focal length of a given camera lens is fixed, so the camera focuses by moving the lens closer to or farther from the film or sensor. In the human eye, however, the focal length itself is adjusted: the ciliary muscle changes the shape of the lens.
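The difference in how the two systems focus can be illustrated with the thin-lens equation. This is a simplified model that treats each optical system as one thin lens with focal length $f$, object distance $d_o$ and image distance $d_i$ (lens to film, sensor or retina). The 50 mm example is chosen purely for illustration.

```latex
\begin{aligned}
\frac{1}{f} &= \frac{1}{d_o} + \frac{1}{d_i} \\
\frac{1}{d_i} &= \frac{1}{50\,\mathrm{mm}} - \frac{1}{2000\,\mathrm{mm}} = \frac{39}{2000\,\mathrm{mm}}
  \quad\Rightarrow\quad d_i \approx 51.3\,\mathrm{mm}
\end{aligned}
```

A 50 mm lens focused at infinity sits 50 mm from the sensor. To focus on an object 2 m away, it must move about 1.3 mm farther out, because $f$ cannot change. In the eye, $d_i$ is fixed by the distance from the lens to the retina. When $d_o$ shrinks, $1/f$ must grow instead, so the ciliary muscle makes the lens rounder, which shortens its focal length.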

One important difference between a camera and the eye is that humans have two eyes, and the brain combines their two slightly different images into a single mental image. Because of this, we can perceive depth.
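Why two views give depth can be sketched with the model used for a pair of parallel stereo cameras. Applying it to the eyes is an idealization. The two viewpoints are a baseline $B$ apart and share focal length $f$. A point at depth $Z$ and horizontal position $X$, measured from the left viewpoint in upright image coordinates, appears at slightly different positions in the two images. Their difference, the disparity $d$, determines the depth.

```latex
x_L = f\,\frac{X}{Z}, \qquad x_R = f\,\frac{X - B}{Z}, \qquad
d = x_L - x_R = \frac{f B}{Z} \quad\Rightarrow\quad Z = \frac{f B}{d}
```

Disparity shrinks as $Z$ grows, so depth from two views is most precise for nearby objects.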

Humans perceive a scene with a very high dynamic range by looking first at one part of it, then at another, and constructing a mental image that combines the two. HDR (high dynamic range) photography works in a similar way: two or more images are combined into one. The images are taken with different exposures, for example one with a wide aperture, which lets in more light and captures detail in the shadows, and one with a small aperture, which captures detail in the bright areas. The combined image can show a much higher dynamic range than any single one of them.
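Combining exposures can be sketched as a per-pixel weighted average of radiance estimates. This is a simplified technique, not the algorithm of any particular camera or HDR software. It assumes the images are already aligned, that pixel values are linear and lie in [0, 1], and that each image's relative exposure (how much light it gathered) is known. The names `hatWeight` and `mergeHdr` are hypothetical.

```typescript
// Hat-shaped weight: 1 for mid-grey, falling to 0 for pixels that are
// completely black (0) or clipped white (1), which carry no usable detail.
function hatWeight(z: number): number {
  return 1 - Math.abs(2 * z - 1);
}

// Merge aligned exposures of the same scene into one relative-radiance image.
// images[k][p] is pixel p of shot k; exposures[k] is how much light shot k
// gathered relative to the others (e.g. 4 = four times as much light).
function mergeHdr(images: number[][], exposures: number[]): number[] {
  const pixelCount = images[0].length;
  // Index of the shot that gathered the least light (used as a fallback).
  let darkest = 0;
  for (let k = 1; k < exposures.length; k++) {
    if (exposures[k] < exposures[darkest]) darkest = k;
  }
  const radiance: number[] = [];
  for (let p = 0; p < pixelCount; p++) {
    let weightedSum = 0;
    let weightTotal = 0;
    for (let k = 0; k < images.length; k++) {
      const z = images[k][p];
      const w = hatWeight(z);
      weightedSum += w * (z / exposures[k]); // this shot's radiance estimate
      weightTotal += w;
    }
    // If every shot is black or clipped at this pixel, fall back (crudely)
    // to the estimate from the shot that gathered the least light.
    radiance.push(
      weightTotal > 0 ? weightedSum / weightTotal : images[darkest][p] / exposures[darkest],
    );
  }
  return radiance;
}
```

Suppose `exposures` is `[1, 4]` and a pixel reads 0.5 in the darker shot but is clipped at 1.0 in the brighter one. Its value then comes entirely from the darker shot, because the clipped reading gets weight 0.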

In the end, neither the camera nor the eye is simply superior; both have their strengths and weaknesses. The eye is much better suited to daily life: it quickly adapts to different lighting conditions and can focus on moving objects. Cameras can be superior to the human eye in certain fields, such as long-exposure images, focusing on objects too far away for the human eye to see, and, of course, capturing moments that happen too quickly for humans to perceive.
