2004 Short Talks
There are approximately 10 million blind or visually impaired people in North America; around 109,000 blind people in the United States use canes to get around, and another 7,000 use seeing-eye dogs. The need for assistive vision technologies is clear.
Over the past several years, statistical methods have dramatically improved the accuracy and speed of computer vision algorithms; for example, it is now possible to reliably detect faces in a video signal at 15 frames per second.
We propose to develop a system that uses current computer vision technology to aid blind or visually impaired pedestrians. We will produce a real-time, portable system that notifies blind people of salient features in their visual field (including faces, text, and walk/don't walk signs).
Maintaining Context while Working
Users have a hard time finding their way around large information spaces. This is particularly true of the software development domain. Therefore, we are building a tool designed to help developers interactively explore source code by incrementally building and refining a visual representation. This tool, VENIS (a Visual Environment for Navigating and Interacting with Software), focuses on reducing the high cognitive overhead of current software visualization tools.
VENIS does this by maintaining developer context and providing explicit support for bottom-up exploration. It displays method bodies in text editors, and displays other code elements (fields, classes, namespaces, etc.) in a manner similar to that of UML class diagrams. Relations between code elements are shown intuitively: inheritance relations are indicated vertically, method calls horizontally, and containment by nesting.
Natural Language Information Access
START has been answering English questions about a restricted set of knowledge on the web since 1993. Unlike Google or Teoma, we rely on language structure, not just keywords, and provide exact answers, not just pages and snippets.
We started by answering questions exactly based on semistructured web knowledge sources. Recently we have started looking into extending the covered knowledge in a number of ways:
We are looking for new students interested in language... Ask us!
Programming an Amorphous Medium
Amorphous computing is the study of how to program robust behaviors on a network composed of vast numbers of unreliable devices, scattered in space and communicating only with nearby neighbors. Potential application areas, such as smart fabrics, large-scale sensor networks, and biological computing, suggest a scale from thousands to trillions of nodes. Building on previously developed computational primitives for amorphous computing, I am developing a high-level language for programming Euclidean surfaces (for which the network is a discrete approximation), thereby making message passing entirely implicit. The language allows spatial containment of process execution, and promotes robustness by declaring actions in terms of invariants and repair.
Fighting Phishing with Semantics-Integrated Systems
Phishing has become the biggest threat to people's online lives. It uses fraudulent emails or web pages, which look like emails or web pages from legitimate organizations, to deceive users into disclosing personal and financial information, either by directly submitting sensitive information via web forms or by downloading hostile code that can search the user's computer and monitor the user's online activities for sensitive information. Phishing is a type of semantic attack, which targets how humans assign semantic meaning to human-computer interactions. To solve the phishing problem, I believe that systems should bridge the gap between the user's mental model of their online activities and the system model of how such activities are implemented.
There are five fundamental properties of online messages:
Presenting properties 2 and 5 (the system model) is important for users to assign correct semantic meanings to their online activities. Deriving properties 1, 3, and 4 (the mental model) is important for detecting possible fraud in a message. To present users with the system model and with warnings based on the derived mental model, I propose a content-integrated display method, which is expected to avoid both inattentional blindness and banner blindness.
Structure Learning in Machine Translation
NLP has undergone several revolutions in its short history. Although initial efforts in parsing, machine translation, and search were based on the dry, rule-based methods of white-haired linguists with pipes, today's state of the art is completely stochastic, based entirely on statistical tricks and large training sets. This is extremely dissatisfying. There is a silent, concerted effort by the NLP community now to incorporate *some* linguistic information to push the state of the art forward. At the same time, current cutting-edge techniques are not natural or beautiful (let alone simple), as one might expect a purely mathematical construction to be and as former cutting-edge techniques have been. In an effort to relieve the community of both of these problems, we are beginning to apply structure learning techniques to machine translation. We also hope that this approach does better than anything else.
Email Reliability: Measuring Today's Preeminent
Rob Beverly and Mike Afergan
Many sensor and disruption-tolerant networks rely on store-and-forward communication as a lower-level primitive. The Internet SMTP email architecture and infrastructure is the preeminent example of a large store-and-forward network. While the general perception of email is that it "just works," anecdotal evidence suggests otherwise. As part of ongoing research, we are measuring the reliability of email. By better understanding Internet protocols that lack explicit end-to-end connection semantics, we hope to derive guidelines for designing future networks. Our testing methodology provides us with an email "traceroute" mechanism, which we use to investigate email loss and latency over several weeks. Across a large and diverse set of domains, including random, popular, and Fortune 500 domains, we find surprising, often degenerate, results. In this Oxygen talk we motivate the problem driving our research, present our methodology, and speak to initial results.
Probabilistic Geometric Grammars for Object Recognition
Most objects we encounter daily have a strong hierarchical and substitutive structure -- rooms consist of furniture, while chairs have seats, backs, and legs. Inspired by natural language processing, our approach to the task of object recognition uses a grammar representation for classes of 3D objects, to exploit this structure. The probabilistic context-free grammar (PCFG) framework allows us to specify the number and types of pieces that make up an object, and also represent distributions over various types of objects in a class. We extend PCFGs to capture the geometric relationships between parts of the object, using multivariate Gaussians over the dimensions and positions of the parts. In this framework, recognition of an object instance parallels the parsing of a natural language sentence. Currently, we concentrate on finding the object class given the shape, rather than finding the object shape given the appearance, which often characterizes modern computer vision.
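As a rough illustration of how rule probabilities and geometric terms can combine when scoring a parse, here is a minimal sketch. The toy grammar, the part names, and all numbers are hypothetical, and it uses independent one-dimensional Gaussians over a single part dimension rather than the multivariate Gaussians over dimensions and positions described above.

```python
import math

# Hypothetical toy grammar: each rule expands "chair" into a tuple of parts
# and carries a rule probability (the PCFG component).
RULES = {
    ("chair", ("seat", "back", "legs")): 0.7,
    ("chair", ("seat", "legs")): 0.3,  # a chair variant without a back
}

# Hypothetical geometric component: (mean, std) of each part's height in cm.
PART_HEIGHT = {
    "seat": (5.0, 1.0),
    "back": (40.0, 5.0),
    "legs": (45.0, 4.0),
}


def gaussian_logpdf(x, mean, std):
    """Log-density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def parse_log_prob(root, parts):
    """Score one candidate parse.

    `parts` maps part names (in rule order) to observed heights. The score is
    log P(rule) plus the Gaussian log-density of every observed part height.
    Returns -inf when the grammar has no matching rule.
    """
    rule = (root, tuple(parts))
    if rule not in RULES:
        return float("-inf")
    log_prob = math.log(RULES[rule])
    for name, height in parts.items():
        mean, std = PART_HEIGHT[name]
        log_prob += gaussian_logpdf(height, mean, std)
    return log_prob
```

Under these assumptions, a chair whose parts are close to the typical sizes scores higher than one with an implausibly tall back, and a part list that matches no rule gets zero probability.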
Dynamic Processor Allocation for Adaptively Parallel Jobs
We are investigating the issues of resource allocation to jobs running on the same system. We specifically focus on processor allocation for adaptively parallel jobs running on shared-memory multiprocessors. Adaptively parallel jobs are jobs whose parallelism changes during execution.
When multiple parallel jobs are running on a system, a dynamic processor allocator should change the number of processors allotted to the jobs during execution to match the requirement of each job as closely as possible. A dynamic processor allocator must estimate the parallelism of each job, and then, if needed, change the allotments accordingly. We are trying to design algorithms for both of these steps. We have implemented some of our ideas in the runtime system of the Cilk multithreaded language. Empirical results show that our system outperforms static allocation in most cases.
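To make the second step concrete, the sketch below shows one simple, well-known fair policy (equal shares capped by each job's desire, often called equipartitioning). It is not necessarily the algorithm used in this work. The function name and the `desires` input are hypothetical, and the parallelism estimates are assumed to come from a separate estimator.

```python
def allot_processors(desires, total):
    """Split `total` processors among jobs without exceeding any job's desire.

    `desires` maps a job name to its estimated parallelism (an integer).
    Processors are handed out in rounds of equal shares; a job that reaches
    its desire drops out, and what it did not need is shared by the rest.
    """
    allot = {job: 0 for job in desires}
    remaining = total
    active = [job for job in desires if desires[job] > 0]
    while remaining > 0 and active:
        # Equal share for this round, at least one processor per grant.
        share = max(remaining // len(active), 1)
        for job in list(active):
            if remaining == 0:
                break
            give = min(share, desires[job] - allot[job], remaining)
            allot[job] += give
            remaining -= give
            if allot[job] == desires[job]:
                active.remove(job)  # this job is fully satisfied
    return allot
```

For example, with 12 processors and desires of 2, 10, and 10, the first job receives its full desire of 2 and the other two split the remaining 10 evenly. Re-running the function whenever the estimates change gives a dynamic allotment.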
Consistent Hypotheses Test
Our research area is Simultaneous Localization and Mapping (SLAM). Recently, we have been working on a "Consistent Hypotheses Test", a method for identifying sets of hypotheses which are consistent with each other. The algorithm is formulated as a graph
Identifying sets of consistent hypotheses is useful in a number of applications. At IEEE Autonomous Underwater Vehicles (AUV) 2004, we showed how to identify outliers in noisy range data obtained from navigation beacons. This is a challenging problem due to interference from the vehicle's sonar payload and other noise sources. Our resulting filter was able to compute the locations of the beacons and vehicle without prior beacon location estimates or inter-beacon measurements.
Another application of the Consistent Hypotheses Test is data association: determining which sensor observations correspond to the same environment features. Data association is critical in SLAM problems in order to "close the loop". The Consistent Hypotheses Test computes which data association hypotheses are consistent and which are ambiguous or erroneous.
Please drop by to chat, or visit our website for recent publications and information: http://cgr.csail.mit.edu
Software Transactional Memory Using Memory Mapping
One of the difficulties when writing parallel programs is ensuring that concurrent accesses to shared data are done correctly. Programmers typically use locks to guarantee that critical sections of code appear to execute atomically. An alternative to locking, however, is programming using transactions. A programmer only needs to specify a section of code as a transaction. An underlying transactional memory system ensures that every transaction either executes atomically or is aborted and makes no changes to memory.
Although transactional memory was originally proposed as a hardware scheme, I am implementing a C library for a software transactional memory system. Programmers can specify code executing on a process as a transaction by enclosing it between two function calls, xbegin() and xend().
The system uses the memory-mapping and page access control mechanisms of the operating system to detect conflicts between transactions at a page-level granularity.
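The effect of page-level granularity can be illustrated with a small simulation. This is only a sketch of the bookkeeping: it does not use memory mapping or page protection, the class and function names are hypothetical, and the 4096-byte page size is an assumption.

```python
PAGE_SIZE = 4096  # bytes; assumed page size for this illustration


def page_of(address):
    """Map a byte address to the index of the page that contains it."""
    return address // PAGE_SIZE


class Transaction:
    """Records which pages a transaction has read and written."""

    def __init__(self, name):
        self.name = name
        self.read_pages = set()
        self.write_pages = set()

    def read(self, address):
        self.read_pages.add(page_of(address))

    def write(self, address):
        self.write_pages.add(page_of(address))


def conflicts(t1, t2):
    """Two transactions conflict if one writes a page the other reads or writes."""
    return bool(
        t1.write_pages & (t2.read_pages | t2.write_pages)
        or t2.write_pages & t1.read_pages
    )
```

Because conflicts are tracked per page, a write to address 100 and a read of address 200 are reported as conflicting even though they touch different bytes, while accesses on different pages, or two reads of the same page, are not.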
Light Field Appearance Manifolds
We present a new technique that learns the appearance of an object class from example images. Given a single image of an object outside the database of examples, our algorithm can reconstruct the object from novel viewing angles. To faithfully represent the 3D appearance of an object class, we learn a model constructed over light fields of objects. A light field is a structured collection of images of a scene or object captured over a range of viewing angles. Through the use of light field rendering, we can re-synthesize each example object from novel viewing angles. In our experiments, we built a light field appearance model of the human head using 50 subjects. From a frontal or side view of the face, our model can reconstruct the subject from many unseen viewing angles. Our algorithm has applications in 3D computer animation, object recognition, tracking, and segmentation.
Statistical Analysis and Transfer of Pictorial Styles
Humans read content from images. However, images have not only content, but also a notion of "style." Some pictorial styles differ noticeably and can be distinguished without careful inspection of the picture. This paper analyzes the style that can be noticeable at a pre-attentive level, using image statistics.
Since the way images are decoded in the human visual system is related to the structure of visual data in the images, we examine the statistical properties of images. It has been found that visual data in natural images is highly structured, and statistical tools have been frequently used to parameterize the structure. We extend the range of statistical analysis to parameterize and transfer simple notions of pictorial style and ambiance in images.
Furthermore, it has been shown that images are processed in the human visual system using oriented linear filters. Therefore, this paper will exploit steerable pyramids, which have multiscale and oriented subbands resembling simple cell receptive fields. We show that the average of the absolute steerable coefficients, as a function of the scale, characterizes simple notions of composition and style.
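The statistic itself, a mean absolute filter response per scale and orientation, can be sketched with a much cruder stand-in for a steerable pyramid. The code below is not a steerable pyramid: it assumes horizontal and vertical finite differences as the two oriented filters and 2x2 block averaging as the downsampling step, and the function name is hypothetical.

```python
import numpy as np


def mean_abs_by_scale(image, levels=3):
    """Return [(horizontal, vertical), ...], one pair per scale.

    Each value is the mean absolute finite difference along one axis,
    a crude substitute for an oriented subband of a steerable pyramid.
    """
    img = np.asarray(image, dtype=float)
    stats = []
    for _ in range(levels):
        horiz = np.abs(np.diff(img, axis=1)).mean()  # change along each row
        vert = np.abs(np.diff(img, axis=0)).mean()   # change along each column
        stats.append((horiz, vert))
        # Halve the resolution by averaging 2x2 blocks (crop odd edges first).
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        img = img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return stats
```

On an image of one-pixel-wide vertical stripes, all of the energy appears in the horizontal response at the finest scale and none at coarser scales, which is the kind of scale-dependent signature the abstract refers to.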