This page describes a "non-automatic" system for providing remote interpretation and mobility assistance for blind people (or other sense-impaired people). Please note that the Distant Assistance System is currently at the "concept" stage.
There are many technological aids for blind people, for example electronic travel aids (ETAs) and speaking reading machines. However, certain information is difficult for automatic devices to interpret: shape and object recognition is only practical for particular items that are presented in a controlled way, as in optical character recognition (OCR). Blind people may prefer the flexible help and non-robotic voice that a person can provide, but such assistance is not always available.
The "Distant Assistance System" as envisaged would provide blind people (or other sense-impaired people e.g. deaf people) with assistance from other people on a planned or ad-hoc basis, by matching clients to volunteer assistants, who may be located far from the clients.
The Internet allows images and sound to be sent near-instantaneously between people located around the world. The suggested system would use a dedicated Internet website and standard communication methods. Video and audio data would be "streamed" in "realtime" from the clients to the assistants, who would describe the audiovisuals, give mobility advice etc. The feedback would be returned to the clients near-instantaneously, and in a modality that they can comprehend.
By matching blind clients to sighted assistants, the system could help blind (or deaf) people with identification problems, mobility problems and other vision-related tasks. When "Internet phones" with "mobile video" become popular, the system could "page" assistants, wherever they are located.
The rest of this page describes how the system could operate if implemented.
After both the client and assistant have "signed on" to the system, it would open a channel between them. The information to the assistant would consist of sound and visuals, plus an indication of how it should be interpreted. Feedback from the assistant to the client could be audio (i.e. speech), or text that can be read as braille or presented via a speech-synthesiser. (For deafblind clients, the feedback could be in an entirely tactile format.)
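The sign-on and link-up step described above can be sketched as follows. This is only an illustrative sketch: the names `Assistant`, `Request` and `open_channel`, the skill labels, and the feedback labels ("speech", "text", "tactile") are assumptions made for the example and are not part of the proposal itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Assistant:
    name: str
    skills: Set[str]        # hypothetical task labels, e.g. {"reading", "mobility"}
    signed_on: bool = True
    busy: bool = False


@dataclass
class Request:
    client: str
    task: str               # the kind of help needed (hypothetical label)
    feedback: str           # "speech", "text" (braille / synthesiser) or "tactile"


def open_channel(request: Request, assistants: List[Assistant]) -> Optional[dict]:
    """Link the client to the first signed-on, free assistant with the right skill."""
    for assistant in assistants:
        if assistant.signed_on and not assistant.busy and request.task in assistant.skills:
            assistant.busy = True
            return {
                "client": request.client,
                "assistant": assistant.name,
                "feedback": request.feedback,
            }
    return None  # nobody suitable is available at the moment


pool = [
    Assistant("A", {"reading"}),
    Assistant("B", {"reading", "mobility"}),
]
channel = open_channel(Request("client-1", "mobility", "tactile"), pool)
```

A real service would also need queuing, paging of assistants who are not yet signed on, and a check that only trained and tested assistants are offered mobility tasks.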
Once a link-up is established, the assistant would interpret the audiovisual images for the client until the task is complete. Typical tasks would include :-
The system could also be used by deaf people e.g. to obtain interpretation assistance with certain sounds.
Figure 1 shows how the system could be implemented.
The equipment used by clients and assistants will depend on the tasks. For many tasks, an assistant could manage with standard Internet browser software running on a PC that is connected to the Internet via a fast modem, with appropriate "plug-in" software installed to allow audiovisuals to be streamed to them from the client. Standard Internet "videophone" or "video conferencing" techniques can be adapted for this purpose. A client would generally need headphones, a video camera, and appropriate video capture hardware and software.
The volunteer assistants could include those who are housebound, and retired people. Office workers, who often have Internet connections, may be able to divert from their normal duties for short periods in order to assist clients. Thorough training and testing would be essential for those who are going to give mobility assistance, a role that carries considerable responsibility and demands a high level of skill.
If the site is run with the help of volunteers then a relatively low income would be required. One source of income would be to provide a paid service to help businesses and institutions satisfy accessibility obligations - for example by providing a translation service for media that are only occasionally accessed by blind people.
The system could provide the following functionality and facilities :-
Giving mobility instructions is a skilled activity that could only be performed by specially trained and tested assistants. The client would need to use lightweight battery-powered "wearable" equipment which could be used in conjunction with other mobility aids, e.g. a guide dog or an obstacle detector. One possible approach is described below.
Figure 2 shows a client with two separate colour video cameras (to allow the assistant to view 3D images); an IR (infra-red) camera for low-light use; a powered pointer/"sidestick"; "wearable" PCs; and communications equipment.
(Ready-made portable data-gathering equipment could also be used, e.g. systems that include headset apparatus containing a video camera; an electronic compass; head-orientation measurement equipment; a GPS satellite (or GSM) positioning unit; and satellite (or GSM) communication equipment.)
The feedback could be straightforward verbal instructions, or the powered sidestick could be moved to give tactile/haptic instructions to indicate direction, "stop", "step-up"/"step-down" etc., so that the ambient sounds are not affected.
The equipment should as far as possible be duplicated, in order to provide backup facilities. Two extra Internet sites, operated by different site hosting providers, could each be used to transfer the images from one of the cameras, so that if one site fails the other should still be available.
Special software could be added to the assistant's Internet access software in order to present the images (recorded by the client's cameras) in a convenient layout, as shown in Figure 3.
The cross-hairs are "fixed" and correspond to the direction and orientation of the client's torso or direction of travel. The received images are presented relative to the fixed cross-hairs, and so illustrate the orientation of the client's head with respect to their torso (or direction of travel). Presenting the images in this way reduces the jolting effects caused by the movement of the client's head between frames, and allows successive images to be displayed in their correct position relative to the other images. The assistant could view the two colour images simultaneously via 3D "shutter" glasses in order to see a 3D image in the common square area.
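The positioning of each received image relative to the fixed cross-hairs can be illustrated with a small calculation. This is a sketch under stated assumptions: the function name `image_offset`, the use of yaw and pitch angles in degrees, and the `pixels_per_degree` scale are all hypothetical choices made for the example.

```python
def image_offset(head_yaw_deg, head_pitch_deg, torso_yaw_deg, pixels_per_degree=10.0):
    """Return (dx, dy): where to draw the image centre relative to the cross-hairs.

    Assumes compass-style yaw (clockwise positive) and pitch positive when the
    head tilts upward. Screen y is taken to grow downward.
    """
    # Head direction relative to the torso, wrapped into the range [-180, 180).
    relative_yaw = (head_yaw_deg - torso_yaw_deg + 180.0) % 360.0 - 180.0
    dx = relative_yaw * pixels_per_degree
    dy = -head_pitch_deg * pixels_per_degree  # looking up moves the image up the screen
    return round(dx), round(dy)


# Head turned 10 degrees to the right of the torso, level gaze.
print(image_offset(100, 0, 90))
# Head turned 20 degrees to the left (across north) and tilted 5 degrees up.
print(image_offset(350, 5, 10))
```

Because each frame is drawn at the offset that matches the head orientation when it was captured, successive frames land in their correct positions relative to one another even when the head moves between frames.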
Internet bandwidth is limited, and the image frame rate will depend on the bandwidth, image resolution, and degree of compression. The system should allow the assistant to either set the frame rate (and have variable resolution), or fix the resolution and accept a variable frame rate. All cameras, and the head-position measurements, should be triggered at the same instant, so that the images can be correctly positioned on the assistant's monitor.
As the view will normally be moving about, the assistant should be able to issue a temporary "freeze images" instruction to the client's equipment. The assistant could then inspect parts of a frozen image in more detail, by indicating an area within the image, and then the corresponding part of the full image would be transmitted as the next frame. Such detailed sections could be displayed on the assistant's screen as small high-definition areas within the existing low-definition images.
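The trade-off between frame rate and resolution can be made concrete with some simple arithmetic. The sketch below assumes a fixed average compression ratio and 24 bits per pixel before compression; the function names and all the numbers are illustrative assumptions only, and real video compression varies from frame to frame.

```python
def frame_rate(bandwidth_bps, width, height, bits_per_pixel=24, compression_ratio=20.0):
    """Frames per second achievable when the resolution is fixed."""
    bits_per_frame = width * height * bits_per_pixel / compression_ratio
    return bandwidth_bps / bits_per_frame


def max_pixels(bandwidth_bps, target_fps, bits_per_pixel=24, compression_ratio=20.0):
    """Pixels per frame that fit when the frame rate is fixed."""
    return int(bandwidth_bps * compression_ratio / (target_fps * bits_per_pixel))


# Fixed resolution of 160 x 120 over an assumed 230,400 bit/s link.
print(frame_rate(230_400, 160, 120))
# Fixed rate of 10 frames per second over the same link.
print(max_pixels(230_400, 10))
```

The same arithmetic suggests why a "freeze images" request is useful: with no new frames needed, the whole link can be spent on sending one small area at full definition.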
The monochrome IR image does not need to be constantly transmitted, but can replace one of the colour images for low-light or night-time use, or if one or both of the colour cameras has failed.
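One possible rule for deciding which images to transmit is sketched below. The function name `select_images`, the camera labels, and the choice of which colour image the IR image displaces are assumptions for illustration; the proposal does not specify them.

```python
def select_images(left_ok, right_ok, ir_ok, low_light):
    """Return the list of camera images to send, at most two at a time."""
    colour = [name for name, ok in (("left", left_ok), ("right", right_ok)) if ok]
    if ir_ok and (low_light or len(colour) < 2):
        # The IR image takes the place of one colour image: the failed one,
        # or (by assumption) the second one in low light.
        return colour[:1] + ["ir"]
    return colour


print(select_images(True, True, True, low_light=False))   # normal daytime use
print(select_images(True, True, True, low_light=True))    # night-time use
print(select_images(False, True, True, low_light=False))  # left camera has failed
```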
If verbal feedback is preferred, then the assistant should use conventions and methods devised by mobility specialists to guide blind people via verbal instructions. To indicate a direction, the assistant could either use standard verbal description or could instruct the system to generate coded tonal stereophonic sounds to indicate direction.
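As one illustration of how a stereophonic tone could indicate direction, the sketch below uses constant-power panning, a common audio technique that is assumed here rather than taken from the proposal. The function name `stereo_gains` and the angle range of -90 (hard left) to +90 (hard right) degrees are hypothetical.

```python
import math


def stereo_gains(direction_deg):
    """Return (left_gain, right_gain) for a tone indicating the given direction.

    Directions outside the range -90..+90 degrees are clamped to the nearest limit.
    """
    d = max(-90.0, min(90.0, direction_deg))
    # Map -90..+90 degrees onto a pan angle of 0..pi/2 radians.
    theta = (d + 90.0) / 180.0 * (math.pi / 2.0)
    # cos/sin keep left^2 + right^2 equal to 1, so perceived loudness stays steady.
    return math.cos(theta), math.sin(theta)


left, right = stereo_gains(0)       # straight ahead: equal gain in both ears
hard_left = stereo_gains(-90)       # tone heard only in the left ear
```

The tone's pitch or rhythm could carry further coded information, but any such coding scheme would need to be devised with mobility specialists.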
Many blind people make use of ambient sounds and may prefer to receive only tactile/haptic feedback. In such cases the assistant could use a pointer to indicate direction (or a location), the pointer's position being communicated to the client's PC and used to control a powered "sidestick" (see Figure 2 above), which would pull the client's hand and arm to the corresponding position. For example, sideways movements could indicate requests to turn in either direction, and by how much; a sudden backwards movement could represent an instruction to stop; combinations of backward and forward movements could indicate trip hazards, steps-up, steps-down etc.
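The mapping from the assistant's pointer to sidestick movements might look something like the following sketch. The names `turn_command`, `gesture` and `GESTURES`, the normalised pointer range of -1 to +1, the 30 degree deflection limit, and the particular back/forward combinations are all assumptions chosen for illustration; an actual coding would have to be designed and tested with clients.

```python
from typing import List, Tuple

# Hypothetical coding of instructions as sequences of sidestick movements.
GESTURES = {
    "stop": ["back"],
    "step_up": ["back", "forward"],
    "step_down": ["forward", "back"],
    "trip_hazard": ["back", "forward", "back"],
}


def turn_command(pointer_x: float, max_deflection_deg: float = 30.0) -> Tuple[str, float]:
    """Convert a horizontal pointer position (-1 = far left, +1 = far right)
    into a sideways sidestick deflection; larger deflection means a sharper turn."""
    x = max(-1.0, min(1.0, pointer_x))  # clamp out-of-range pointer positions
    return ("sideways", x * max_deflection_deg)


def gesture(instruction: str) -> List[str]:
    """Look up the movement sequence for a non-directional instruction."""
    return GESTURES[instruction]


print(turn_command(0.5))
print(gesture("stop"))
```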
The following three fictional examples illustrate how the website could be used :-