Research Group Projects and Descriptions
|
Cognitive Machines
Principal Investigator: Deb Roy
The goal of the Cognitive Machines group is to create systems that engage in fluid, situated, meaningful communication with human partners. We seek to understand and model the processes by which words are grounded in the physical world as a result of embodied perception, action, and learning. These models are applied to create situated human-machine interfaces. We also use our computational models as a source of predictions and possible accounts for a number of cognitive phenomena including aspects of children's language acquisition, concept formation, and attention. |
|
| Beacon Concept Store with the Center for Future Banking |
William J. Mitchell, Deb Roy, Ryan C.C. Chin, Chih-Chao Chuang, Michael Chia-Liang Lin, Dimitris Papanikolaou, Rony Kubat and Duks Koschitz
The Smart Cities and Cognitive Machines groups have teamed up with the Center for Future Banking to design a concept banking store in the Boston/Cambridge area. This will be a fully functional banking center that simultaneously serves as a living laboratory: a place where new technologies and interior configurations can quickly be installed, electronically monitored (unobtrusively, and with due concern for privacy) to evaluate their effectiveness in use under demanding real-world conditions, and iteratively redesigned in response to this feedback. Drawing on the Media Lab’s extensive expertise in sensing, data collection, management and analysis of large-scale datasets, and data visualization, we will be able to create an adaptive environment that embodies a robotic cognitive architecture capable of responding intelligently to the building's occupants and visitors. Architecturally, the flagship should vividly represent commitments to effective engagement with the community that it serves, sustainability, and forward-looking innovation. |
| Behavior Capture from Thousands of People Online |
Jeff Orkin and Deb Roy
The Restaurant Game is a multiplayer simulation that captures the behavior and language of thousands of people playing the roles of waitresses and customers. We are developing machine-learning algorithms that mine game-play logs to acquire generative models of human language, behavior, and social roles. These models will power synthetic conversational characters that interact with humans in training simulations, games, and other virtual worlds. |
| BlitzScribe: Speech Analysis for the Human Speechome Project |
Brandon Roy and Deb Roy
BlitzScribe is a new approach to speech transcription driven by the demands of today's massive multimedia corpora. High-quality annotations are essential for indexing and analyzing many multimedia datasets; in particular, our study of language development for the Human Speechome Project depends on speech transcripts. Unfortunately, automatic speech transcription is inadequate for many natural speech recordings, and traditional approaches to manual transcription are extremely labor intensive and expensive. BlitzScribe uses a semi-automatic approach, combining human and machine effort to dramatically improve transcription speed. Automatic methods identify and segment speech in dense, multitrack audio recordings, allowing us to build streamlined user interfaces maximizing human productivity. The first version of BlitzScribe is already about 4-6 times faster than existing systems. We are exploring user-interface design, machine-learning and pattern-recognition techniques to build a human-machine collaborative system that will make massive transcription tasks feasible and affordable.
|
| Collective Discovery |
Frank Moss, Deb Roy, Ian Eslick and Charles Tam
The choices we make about diet, environment, medications, or alternative therapies constitute a massive collection of "everyday experiments." These data are largely unrecorded and underutilized by the traditional research establishment. Collective Discovery aims to leverage the intuition and insight of patient communities to capture and mine information about everyday experiences. Moving the community discourse from anecdotes to data will lead to better decision-making, stronger self-advocacy, identification of novel therapies, and inspiration of better hypotheses in traditional research, accelerating the search for new drugs and treatments. The unique characteristic of our Collective Discovery model is the use of knowledge representation and natural language processing to mediate communal hypothesis generation and to compensate for methodological errors and self-reporting bias. This model is being deployed in a real-world context as part of a partnership with the LAM Treatment Alliance and the greater LAM community. |
| Concrete Financial Sim |
Sheng-Ying (Aithne) Pao and Deb Roy
Concrete Financial Sim aims to anticipate probable outcomes of different decisions across time. Life consistently presents choices that require a rational balance between instant gratification and long-term consequences. Should I buy the sunglasses now or should I save? Should I buy a house, or should I rent a room? What if I do it next year instead of next month? Intertemporal components of choices complicate the decision-making process. The complexity lies not just in the immediate one-to-one tradeoff, but in its long-term implications. Based on one’s past financial behavior and current plans, we are designing a decision environment that visualizes the future values of present choices. The goal is to create a reality-based model that informs decision makers of their probable rewards and penalties over time, and that will serve as a “cognitive prosthesis” for people to externalize their mental model of intertemporal choices.
|
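The idea of showing "the future values of present choices" in Concrete Financial Sim can be illustrated with a minimal sketch. It is not the project's model: it assumes simple annual compounding, and the function names, the 5% return, and the $150 price are hypothetical values chosen only for illustration.

```python
def future_value(amount, annual_rate, years):
    """Value of `amount` after compounding once a year for `years` years."""
    return amount * (1 + annual_rate) ** years


def compare_choices(price, annual_rate, horizons):
    """Map each horizon (in years) to what `price` would be worth if saved instead of spent."""
    return {years: round(future_value(price, annual_rate, years), 2) for years in horizons}


if __name__ == "__main__":
    # Hypothetical scenario: $150 sunglasses versus saving at an assumed 5% annual return.
    for years, value in compare_choices(150.0, 0.05, [1, 5, 10]).items():
        print(f"Saved for {years} year(s): ${value:.2f}")
```

A tool of the kind described would presumably replace the fixed rate with estimates drawn from a person's own financial history; this sketch only shows the shape of the comparison.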
| Data-Driven Architectural Design |
Rony Kubat, Kenneth Jackowitz (BOA) and Deb Roy
Dense longitudinal video recording of architectural spaces presents new opportunities for design analysis, exploration, and optimization. As part of the Speechome Video for Retail Analysis project, high-resolution video cameras are being deployed in a retail banking environment. From the months of data that will be collected, a variety of performance metrics will be extracted (for example: queueing time, customer confusion, customer/employee social interaction). Beyond the analysis of current building performance, agent-based models of human behavior—trained on the collected raw data—can be used to evaluate potential changes to the space, or to evaluate unbuilt environments. Finally, this agent-based model can be used as a fitness function to evolve procedurally generated buildings to maximize performance across the extracted metrics.
|
| HeadLock: Video Analysis for the Human Speechome Project |
Philip DeCamp and Deb Roy
HeadLock is a semi-automated system for head pose annotation that explores how human-computer interfaces can be combined with computer vision technologies to efficiently extract behavioral information from video recordings. For images with limited resolution, the orientation of a head is often the best approximation for gaze direction, a crucial component in analyzing the rich interactions and behaviors of humans. The goal of HeadLock is to reduce the cost of extracting head pose from video by several orders of magnitude by developing machine-perception technologies that can perform robust head pose estimation with minimal constraints on resolution and camera angle.
|
| Human Speechome Project |
Deb Roy, Philip DeCamp, Brandon Roy, Jethran Guinness, Rony Kubat and Stefanie Tellex
The Human Speechome Project is an effort to observe and computationally model the longitudinal language development of a single child at an unprecedented scale. To achieve this, we are recording, storing, visualizing, and analyzing communication and behavior patterns in over 400,000 hours of home video and speech recordings. The tools that are being developed for mining and learning from thousands of terabytes of multimedia data offer the potential for breaking open new business opportunities for a broad range of industries—from security to Internet commerce.
|
| Internomics |
Ed Boyden, Dan Ariely, Deb Roy, Nathan Greenslit, Sheng-Ying (Aithne) Pao, Coco Krumme, Deborah Egloff, Marko Popovic and James Barabas
How do high-level cognitive functions emerge from primitive neural computations, to mediate complex human behavior? We are developing precise, focal ways of investigating phenomena such as trust and risk-taking, in order to understand how they play roles in purchasing, decision-making, social interaction, and other real-world scenarios.
|
| Spatial Language Semantics for Video Information Retrieval |
Stefanie Tellex, Kleovoulos Tsourides, Gregory Marton and Deb Roy
Natural language queries are an intuitive, powerful, and expressive way to access video recordings for numerous applications. We are building a system that grounds the meaning of spatial prepositions in geometric features in order to search video for clips that match spatial language queries such as "along the hallway" and "across the kitchen." Spatial language video retrieval is an important real-world problem that is also a natural test bed for evaluating semantic structures for natural language descriptions of motion on naturalistic data.
|
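As a rough illustration of grounding spatial prepositions such as "along" and "across" in geometric features, the sketch below compares a tracked 2D path with the major axis of a region such as a hallway. It is not the group's system: the two features, the `min_coverage` threshold, and all function names are assumptions made for this example.

```python
import math


def path_features(path, axis_start, axis_end):
    """Return (coverage, crossing) for a 2D path relative to a region's major axis.

    coverage: fraction of the axis length spanned by the path's projection onto it.
    crossing: True if the path starts and ends on opposite sides of the axis line.
    """
    ax, ay = axis_end[0] - axis_start[0], axis_end[1] - axis_start[1]
    length = math.hypot(ax, ay)
    ux, uy = ax / length, ay / length  # unit vector along the axis
    # Signed position of each point along the axis and perpendicular to it.
    along = [(x - axis_start[0]) * ux + (y - axis_start[1]) * uy for x, y in path]
    side = [(y - axis_start[1]) * ux - (x - axis_start[0]) * uy for x, y in path]
    coverage = (max(along) - min(along)) / length
    crossing = side[0] * side[-1] < 0
    return coverage, crossing


def classify(path, axis_start, axis_end, min_coverage=0.5):
    """Label a path 'along', 'across', or 'neither' using a hypothetical threshold."""
    coverage, crossing = path_features(path, axis_start, axis_end)
    if crossing and coverage < min_coverage:
        return "across"
    if coverage >= min_coverage and not crossing:
        return "along"
    return "neither"


if __name__ == "__main__":
    hallway = ((0.0, 0.0), (10.0, 0.0))  # hypothetical hallway centerline
    print(classify([(1, 0.5), (5, 0.4), (9, 0.5)], *hallway))
    print(classify([(5, -1), (5, 0), (5, 1)], *hallway))
```

A retrieval system would presumably learn such decision boundaries from annotated clips instead of hard-coding a threshold; the fixed rule here just makes the features concrete.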
| Speech Interaction Analysis for the Human Speechome Project |
Deb Roy, Brandon Roy and Michael Frank
The Speechome Corpus is the largest corpus of a single child learning language in a naturalistic setting. We have now transcribed significant amounts of the speech to support new kinds of language analysis. We are currently focusing on the child's lexical development, pinpointing "word births" and relating them to caregiver language use. Our initial results show child vocabulary growth at an unprecedented temporal resolution, as well as a detailed picture of other measures of linguistic development. The results suggest individual caregivers "tune" their spoken interactions to the child's linguistic ability with far more precision than expected, helping to scaffold language development. To perform these analyses, new tools have been developed for interactive data annotation and exploration.
|
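The notion of a "word birth" in the Speechome analysis can be sketched as the earliest timestamp at which the child produces a given word in the transcripts. The code below is a simplified illustration, not the project's tooling: the `(timestamp, speaker, text)` tuple format, the speaker labels, and the whitespace tokenization are all assumptions.

```python
def word_births(utterances):
    """Return {word: earliest timestamp} for words spoken by the child.

    `utterances` is an iterable of (timestamp, speaker, text) tuples
    (a hypothetical transcript format used only for this sketch).
    """
    births = {}
    for timestamp, speaker, text in sorted(utterances, key=lambda u: u[0]):
        if speaker != "child":
            continue
        for token in text.lower().split():
            word = token.strip(".,!?\"'")
            if word and word not in births:
                births[word] = timestamp
    return births


def caregiver_uses_before_birth(utterances, word, birth_time):
    """Count caregiver utterances containing `word` before the child's first use."""
    count = 0
    for timestamp, speaker, text in utterances:
        if speaker == "caregiver" and timestamp < birth_time:
            tokens = [t.strip(".,!?\"'") for t in text.lower().split()]
            if word in tokens:
                count += 1
    return count


if __name__ == "__main__":
    transcript = [
        (3, "child", "Ball!"),
        (1, "caregiver", "Look at the ball"),
        (2, "child", "ba"),
        (5, "child", "ball dog"),
    ]
    births = word_births(transcript)
    print(births)
    print(caregiver_uses_before_birth(transcript, "ball", births["ball"]))
```

Relating each birth time to prior caregiver usage, as in the second function, is one simple way to connect child vocabulary growth to caregiver language use.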
| Speechome Recorder for the Study of Child Development Disorders |
Sophia Yuditskaya, Kleovoulos Tsourides, Philip DeCamp, Brandon Roy, Matthew Goodwin and Deb Roy
The collection and analysis of dense, longitudinal observational data of child behavior in natural, ecologically valid, non-laboratory settings holds significant benefits for advancing the understanding of autism and other developmental disorders. We have developed the Speechome Recorder—a portable version of the audio/video recording technology originally developed as an embedded sensor network for the Human Speechome Project—to facilitate swift, cost-effective deployment in special-needs clinics and homes in order to capture recordings of child/caretaker interaction and other behavior occurring in daily life. These data will enable us to study developmental trajectories of children with autism from infancy through early childhood, as well as atypical dynamics of social interaction as they evolve on a day-to-day basis. Its portability makes possible potentially large-scale comparative study of developmental milestones in both neurotypical and autistic children. We hope that the data-analysis tools developed in the course of studying behavioral patterns in the collected data will reveal new insights toward early detection, provide a more accurate assessment of context-specific behaviors to enable appropriate individualized treatment, and shed light on the enduring mysteries of autism.
|
| Speechome Video for Retail Analysis |
Kleovoulos Tsourides, Sophia Yuditskaya, Philip DeCamp, Kenneth Jackowitz (BOA) and Deb Roy
We are adapting the video data collection and analysis technology derived from the Human Speechome Project to the retail sector through real-world deployments. We will develop strategies and tools for the analysis of dense, longitudinal video data to study the behavior of, and interaction between, customers and employees in commercial retail settings. One key question in our study is how the architecture of a retail space affects customer activity and satisfaction, and which parameters in the design of a space are operant in this causal relationship.
|
| TrackMarks: Semi-Automatic Video Annotation |
Philip DeCamp and Deb Roy
This project attempts to address the practical problems involved with extracting behavioral information from large, multi-camera video corpora. Ultra-dense video recordings offer new possibilities for in-depth, quantitative analysis of human behavior, with applications ranging from child development research to determining how people are affected by different retail environments. Despite the growing sophistication of computer vision systems being developed for person tracking, gesture recognition, and object identification, these technologies remain error prone. Accurate video annotation still requires substantial human input. In order to analyze the hundreds of thousands of hours of video collected for the Human Speechome Project, we have developed a new software system for semi-automatically annotating longitudinal, multi-track video data. This system combines computer vision algorithms with a novel interface design to enable human annotators to generate and edit video annotations with speed and accuracy.
|