- Open sourcing the Embedding Projector: a tool for visualizing high dimensional data
Posted by Daniel Smilkov and the Big Picture group
Recent advances in Machine Learning (ML) have shown impressive results, with applications ranging from image recognition, language translation, medical diagnosis and more. With the widespread adoption of ML systems, it is increasingly important for research scientists to be able to explore how the data is being interpreted by the models. However, one of the main challenges in exploring this data is that it often has hundreds or even thousands of dimensions, requiring special tools to investigate the space.
To enable a more intuitive exploration process, we are open-sourcing the Embedding Projector, a web application for interactive visualization and analysis of high-dimensional data recently shown as an A.I. Experiment, as part of TensorFlow. We are also releasing a standalone version at projector.tensorflow.org, where users can visualize their high-dimensional data without the need to install and run TensorFlow.
The data needed to train machine learning systems comes in a form that computers don't immediately understand. To translate the things we understand naturally (e.g. words, sounds, or videos) to a form that the algorithms can process, we use embeddings, a mathematical vector representation that captures different facets (dimensions) of the data. For example, in this language embedding, similar words are mapped to points that are close to each other.
With the Embedding Projector, you can navigate through views of data in either a 2D or a 3D mode, zooming, rotating, and panning using natural click-and-drag gestures. Below is a figure showing the nearest points to the embedding for the word “important” after training a TensorFlow model using the word2vec tutorial. Clicking on any point (which represents the learned embedding for a given word) in this visualization, brings up a list of nearest points and distances, which shows which words the algorithm has learned to be semantically related. This type of interaction represents an important way in which one can explore how an algorithm is performing.
Methods of Dimensionality Reduction
The Embedding Projector offers three commonly used methods of data dimensionality reduction, which allow easier visualization of complex data: PCA, t-SNE and custom linear projections. PCA is often effective at exploring the internal structure of the embeddings, revealing the most influential dimensions in the data. t-SNE, on the other hand, is useful for exploring local neighborhoods and finding clusters, allowing developers to make sure that an embedding preserves the meaning in the data (e.g. in the MNIST dataset, seeing that the same digits are clustered together). Finally, custom linear projections can help discover meaningful "directions" in data sets - such as the distinction between a formal and casual tone in a language generation model - which would allow the design of more adaptable ML systems.
The Embedding Projector website includes a few datasets to play with. We’ve also made it easy for users to publish and share their embeddings with others (just click on the “Publish” button on the left pane). It is our hope that the Embedding Projector will be a useful tool to help the research community explore and refine their ML applications, as well as enable anyone to better understand how ML algorithms interpret data. Have fun exploring the world of embeddings!
|A custom linear projection of the 100 nearest points of "See attachments." onto the "yes" - "yeah" vector (“yes” is right, “yeah” is left) of a corpus of 35k frequently used phrases in emails|
- NIPS 2016 & Research at Google
Posted by Doug Eck, Research Scientist, Google Brain Team
This week, Barcelona hosts the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), a machine learning and computational neuroscience conference that includes invited talks, demonstrations and oral and poster presentations of some of the latest in machine learning research. Google will have a strong presence at NIPS 2016, with over 280 Googlers attending in order to contribute to and learn from the broader academic research community by presenting technical talks and posters, in addition to hosting workshops and tutorials.
Research at Google is at the forefront of innovation in Machine Intelligence, actively exploring virtually all aspects of machine learning including classical algorithms as well as cutting-edge techniques such as deep learning. Focusing on both theory as well as application, much of our work on language understanding, speech, translation, visual processing, ranking, and prediction relies on Machine Intelligence. In all of those tasks and many others, we gather large volumes of direct or indirect evidence of relationships of interest, and develop learning approaches to understand and generalize.
If you are attending NIPS 2016, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for billions of people, and to see demonstrations of some of the exciting research we pursue. You can also learn more about our work being presented at NIPS 2016 in the list below (Googlers highlighted in blue).
Google is a Platinum Sponsor of NIPS 2016.
Executive Board includes: Corinna Cortes, Fernando Pereira
Advisory Board includes: John C. Platt
Area Chairs include: John Shlens, Moritz Hardt, Navdeep Jaitly, Hugo Larochelle, Honglak Lee, Sanjiv Kumar, Gal Chechik
Dynamic Legged Robots
Boosting with Abstention
Corinna Cortes, Giulia DeSalvo, Mehryar Mohri
Community Detection on Evolving Graphs
Stefano Leonardi, Aris Anagnostopoulos, Jakub Łącki, Silvio Lattanzi, Mohammad Mahdian
Linear Relaxations for Finding Diverse Elements in Metric Spaces
Aditya Bhaskara, Mehrdad Ghadiri, Vahab Mirrokni, Ola Svensson
Nearly Isometric Embedding by Relaxation
James McQueen, Marina Meila, Dominique Joncas
Optimistic Bandit Convex Optimization
Mehryar Mohri, Scott Yang
Reward Augmented Maximum Likelihood for Neural Structured Prediction
Mohammad Norouzi, Samy Bengio, Zhifeng Chen, Navdeep Jaitly, Mike Schuster, Yonghui Wu, Dale Schuurmans
Stochastic Gradient MCMC with Stale Gradients
Changyou Chen, Nan Ding, Chunyuan Li, Yizhe Zhang, Lawrence Carin
Unsupervised Learning for Physical Interaction through Video Prediction
Chelsea Finn*, Ian Goodfellow, Sergey Levine
Using Fast Weights to Attend to the Recent Past
Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Leibo, Catalin Ionescu
A Credit Assignment Compiler for Joint Prediction
Kai-Wei Chang, He He, Stephane Ross, Hal III
A Neural Transducer
Navdeep Jaitly, Quoc Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari, Koray Kavukcuoglu, Geoffrey Hinton
Bi-Objective Online Matching and Submodular Allocations
Hossein Esfandiari, Nitish Korula, Vahab Mirrokni
Combinatorial Energy Learning for Image Segmentation
Jeremy Maitin-Shepard, Viren Jain, Michal Januszewski, Peter Li, Pieter Abbeel
Deep Learning Games
Dale Schuurmans, Martin Zinkevich
DeepMath - Deep Sequence Models for Premise Selection
Geoffrey Irving, Christian Szegedy, Niklas Een, Alexander Alemi, François Chollet, Josef Urban
Density Estimation via Discrepancy Based Adaptive Sequential Partition
Dangna Li, Kun Yang, Wing Wong
Domain Separation Networks
Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, Dumitru Erhan
Fast Distributed Submodular Cover: Public-Private Data Summarization
Baharan Mirzasoleiman, Morteza Zadimoghaddam, Amin Karbasi
Satisfying Real-world Goals with Dataset Constraints
Gabriel Goh, Andrew Cotter, Maya Gupta, Michael P Friedlander
Can Active Memory Replace Attention?
Łukasz Kaiser, Samy Bengio
Fast and Flexible Monotonic Functions with Ensembles of Lattices
Kevin Canini, Andy Cotter, Maya Gupta, Mahdi Fard, Jan Pfeifer
Launch and Iterate: Reducing Prediction Churn
Quentin Cormier, Mahdi Fard, Kevin Canini, Maya Gupta
On Mixtures of Markov Chains
Rishi Gupta, Ravi Kumar, Sergei Vassilvitskii
Orthogonal Random Features
Felix Xinnan Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Dan Holtmann-Rice,
Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D
Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, Honglak Lee
Structured Prediction Theory Based on Factor Graph Complexity
Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Scott Yang
Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
Amit Daniely, Roy Frostig, Yoram Singer
Interactive musical improvisation with Magenta
Adam Roberts, Sageev Oore, Curtis Hawthorne, Douglas Eck
Content-based Related Video Recommendation
Workshops, Tutorials and Symposia
Advances in Approximate Bayesian Inference
Advisory Committee includes: Kevin P. Murphy
Invited Speakers include: Matt Johnson
Panelists include: Ryan Sepassi
Accepted Authors: Luke Metz, Ben Poole, David Pfau, Jascha Sohl-Dickstein, Augustus Odena, Christopher Olah, Jonathon Shlens
Bayesian Deep Learning
Organizers include: Kevin P. Murphy
Accepted Authors include: Rif A. Saurous, Eugene Brevdo, Kevin Murphy, Eric Jang, Shixiang Gu, Ben Poole
Brains & Bits: Neuroscience Meets Machine Learning
Organizers include: Jascha Sohl-Dickstein
Connectomics II: Opportunities & Challanges for Machine Learning
Organizers include: Viren Jain
Constructive Machine Learning
Invited Speakers include: Douglas Eck
Continual Learning & Deep Networks
Invited Speakers include: Honglak Lee
Deep Learning for Action & Interaction
Organizers include: Sergey Levine
Invited Speakers include: Honglak Lee
Accepted Authors include: Pararth Shah, Dilek Hakkani-Tur, Larry Heck
End-to-end Learning for Speech and Audio Processing
Invited Speakers include: Tara Sainath
Accepted Authors include: Brian Patton, Yannis Agiomyrgiannakis, Michael Terry, Kevin Wilson, Rif A. Saurous, D. Sculley
Extreme Classification: Multi-class & Multi-label Learning in Extremely Large Label Spaces
Organizers include: Samy Bengio
Interpretable Machine Learning for Complex Systems
Invited Speaker: Honglak Lee
Accepted Authors include: Daniel Smilkov, Nikhil Thorat, Charles Nicholson, Emily Reif, Fernanda Viegas, Martin Wattenberg
Large Scale Computer Vision Systems
Organizers include: Gal Chechik
Machine Learning Systems
Invited Speakers include: Jeff Dean
Nonconvex Optimization for Machine Learning: Theory & Practice
Organizers include: Hossein Mobahi
Optimizing the Optimizers
Organizers include: Alex Davies
Reliable Machine Learning in the Wild
Accepted Authors: Andres Medina, Sergei Vassilvitskii
The Future of Gradient-Based Machine Learning Software
Invited Speakers: Jeff Dean, Matt Johnson
Time Series Workshop
Organizers include: Vitaly Kuznetsov
Invited Speakers include: Mehryar Mohri
Theory and Algorithms for Forecasting Non-Stationary Time Series
Tutorial Organizers: Vitaly Kuznetsov, Mehryar Mohri
Women in Machine Learning
Invited Speakers include: Maya Gupta
* Work done as part of the Google Brain team ↩
- Deep Learning for Detection of Diabetic Eye Disease
Posted by Lily Peng MD PhD, Product Manager and Varun Gulshan PhD, Research Engineer
Diabetic retinopathy (DR) is the fastest growing cause of blindness, with nearly 415 million diabetic patients at risk worldwide. If caught early, the disease can be treated; if not, it can lead to irreversible blindness. Unfortunately, medical specialists capable of detecting the disease are not available in many parts of the world where diabetes is prevalent. We believe that Machine Learning can help doctors identify patients in need, particularly among underserved populations.
A few years ago, several of us began wondering if there was a way Google technologies could improve the DR screening process, specifically by taking advantage of recent advances in Machine Learning and Computer Vision. In "Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs", published today in JAMA, we present a deep learning algorithm capable of interpreting signs of DR in retinal photographs, potentially helping doctors screen more patients in settings with limited resources.
One of the most common ways to detect diabetic eye disease is to have a specialist examine pictures of the back of the eye (Figure 1) and rate them for disease presence and severity. Severity is determined by the type of lesions present (e.g. microaneurysms, hemorrhages, hard exudates, etc), which are indicative of bleeding and fluid leakage in the eye. Interpreting these photographs requires specialized training, and in many regions of the world there aren’t enough qualified graders to screen everyone who is at risk.
Working closely with doctors both in India and the US, we created a development dataset of 128,000 images which were each evaluated by 3-7 ophthalmologists from a panel of 54 ophthalmologists. This dataset was used to train a deep neural network to detect referable diabetic retinopathy. We then tested the algorithm’s performance on two separate clinical validation sets totalling ~12,000 images, with the majority decision of a panel 7 or 8 U.S. board-certified ophthalmologists serving as the reference standard. The ophthalmologists selected for the validation sets were the ones that showed high consistency from the original group of 54 doctors.
|Figure 1. Examples of retinal fundus photographs that are taken to screen for DR. The image on the left is of a healthy retina (A), whereas the image on the right is a retina with referable diabetic retinopathy (B) due a number of hemorrhages (red spots) present.|
Performance of both the algorithm and the ophthalmologists on a 9,963-image validation set are shown in Figure 2.
The results show that our algorithm’s performance is on-par with that of ophthalmologists. For example, on the validation set described in Figure 2, the algorithm has a F-score (combined sensitivity and specificity metric, with max=1) of 0.95, which is slightly better than the median F-score of the 8 ophthalmologists we consulted (measured at 0.91).
|Figure 2. Performance of the algorithm (black curve) and eight ophthalmologists (colored dots) for the presence of referable diabetic retinopathy (moderate or worse diabetic retinopathy or referable diabetic macular edema) on a validation set consisting of 9963 images. The black diamonds on the graph correspond to the sensitivity and specificity of the algorithm at the high sensitivity and high specificity operating points.|
These are exciting results, but there is still a lot of work to do. First, while the conventional quality measures we used to assess our algorithm are encouraging, we are working with retinal specialists to define even more robust reference standards that can be used to quantify performance. Furthermore, interpretation of a 2D fundus photograph, which we demonstrate in this paper, is only one part in a multi-step process that leads to a diagnosis for diabetic eye disease. In some cases, doctors use a 3D imaging technology, Optical Coherence Tomography (OCT), to examine various layers of a retina in detail. Applying machine learning to this 3D imaging modality is already underway, led by our colleagues at DeepMind. In the future, these two complementary methods might be used together to assist doctors in the diagnosis of a wide spectrum of eye diseases.
Automated DR screening methods with high accuracy have the strong potential to assist doctors in evaluating more patients and quickly routing those who need help to a specialist. We are working with doctors and researchers to study the entire process of screening in settings around the world, in the hopes that we can integrate our methods into clinical workflow in a manner that is maximally beneficial. Finally, we are working with the FDA and other regulatory agencies to further evaluate these technologies in clinical studies.
Given the many recent advances in deep learning, we hope our study will be just one of many compelling examples to come demonstrating the ability of machine learning to help solve important problems in medical imaging in healthcare more broadly.
Learn more about the Health Research efforts of the Brain team at Google
- Zero-Shot Translation with Google’s Multilingual Neural Machine Translation System
Posted by Mike Schuster (Google Brain Team), Melvin Johnson (Google Translate) and Nikhil Thorat (Google Brain Team)
In the last 10 years, Google Translate has grown from supporting just a few languages to 103, translating over 140 billion words every day. To make this possible, we needed to build and maintain many different systems in order to translate between any two languages, incurring significant computational cost. With neural networks reforming many fields, we were convinced we could raise the translation quality further, but doing so would mean rethinking the technology behind Google Translate.
In September, we announced that Google Translate is switching to a new system called Google Neural Machine Translation (GNMT), an end-to-end learning framework that learns from millions of examples, and provided significant improvements in translation quality. However, while switching to GNMT improved the quality for the languages we tested it on, scaling up to all the 103 supported languages presented a significant challenge.
In “Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation”, we address this challenge by extending our previous GNMT system, allowing for a single system to translate between multiple languages. Our proposed architecture requires no change in the base GNMT system, but instead uses an additional “token” at the beginning of the input sentence to specify the required target language to translate to. In addition to improving translation quality, our method also enables “Zero-Shot Translation” — translation between language pairs never seen explicitly by the system.
Here’s how it works. Let’s say we train a multilingual system with Japanese⇄English and Korean⇄English examples, shown by the solid blue lines in the animation. Our multilingual system, with the same size as a single GNMT system, shares its parameters to translate between these four different language pairs. This sharing enables the system to transfer the “translation knowledge” from one language pair to the others. This transfer learning and the need to translate between multiple languages forces the system to better use its modeling power.
This inspired us to ask the following question: Can we translate between a language pair which the system has never seen before? An example of this would be translations between Korean and Japanese where Korean⇄Japanese examples were not shown to the system. Impressively, the answer is yes — it can generate reasonable Korean⇄Japanese translations, even though it has never been taught to do so. We call this “zero-shot” translation, shown by the yellow dotted lines in the animation. To the best of our knowledge, this is the first time this type of transfer learning has worked in Machine Translation.
The success of the zero-shot translation raises another important question: Is the system learning a common representation in which sentences with the same meaning are represented in similar ways regardless of language — i.e. an “interlingua”? Using a 3-dimensional representation of internal network data, we were able to take a peek into the system as it translates a set of sentences between all possible pairs of the Japanese, Korean, and English languages.
Part (a) from the figure above shows an overall geometry of these translations. The points in this view are colored by the meaning; a sentence translated from English to Korean with the same meaning as a sentence translated from Japanese to English share the same color. From this view we can see distinct groupings of points, each with their own color. Part (b) zooms in to one of the groups, and part (c) colors by the source language. Within a single group, we see a sentence with the same meaning but from three different languages. This means the network must be encoding something about the semantics of the sentence rather than simply memorizing phrase-to-phrase translations. We interpret this as a sign of existence of an interlingua in the network.
We show many more results and analyses in our paper, and hope that its findings are not only interesting for machine learning or machine translation researchers but also to linguists and others who are interested in how multiple languages can be processed by machines using a single system.
Finally, the described Multilingual Google Neural Machine Translation system is running in production today for all Google Translate users. Multilingual systems are currently used to serve 10 of the recently launched 16 language pairs, resulting in improved quality and a simplified production architecture.
- Enhance! RAISR Sharp Images with Machine Learning
Posted by Peyman Milanfar, Research Scientist
Everyday the web is used to share and store millions of pictures, enabling one to explore the world, research new topics of interest, or even share a vacation with friends and family. However, many of these images are either limited by the resolution of the device used to take the picture, or purposely degraded in order to accommodate the constraints of cell phones, tablets, or the networks to which they are connected. With the ubiquity of high-resolution displays for home and mobile devices, the demand for high-quality versions of low-resolution images, quickly viewable and shareable from a wide variety of devices, has never been greater.
With “RAISR: Rapid and Accurate Image Super-Resolution”, we introduce a technique that incorporates machine learning in order to produce high-quality versions of low-resolution images. RAISR produces results that are comparable to or better than the currently available super-resolution methods, and does so roughly 10 to 100 times faster, allowing it to be run on a typical mobile device in real-time. Furthermore, our technique is able to avoid recreating the aliasing artifacts that may exist in the lower resolution image.
Upsampling, the process of producing an image of larger size with significantly more pixels and higher image quality from a low quality image, has been around for quite a while. Well-known approaches to upsampling are linear methods which fill in new pixel values using simple, and fixed, combinations of the nearby existing pixel values. These methods are fast because they are fixed linear filters (a constant convolution kernel applied uniformly across the image). But what makes these upsampling methods fast, also makes them ineffective in bringing out vivid details in the higher resolution results. As you can see in the example below, the upsampled image looks blurry – one would hesitate to call it enhanced.
With RAISR, we instead use machine learning and train on pairs of images, one low quality, one high, to find filters that, when applied to selectively to each pixel of the low-res image, will recreate details that are of comparable quality to the original. RAISR can be trained in two ways. The first is the "direct" method, where filters are learned directly from low and high-resolution image pairs. The other method involves first applying a computationally cheap upsampler to the low resolution image (as in the figure above) and then learning the filters from the upsampled and high resolution image pairs. While the direct method is computationally faster, the 2nd method allows for non-integer scale factors and better leveraging of hardware-based upsampling.
For either method, RAISR filters are trained according to edge features found in small patches of images, - brightness/color gradients, flat/textured regions, etc. - characterized by direction (the angle of an edge), strength (sharp edges have a greater strength) and coherence (a measure of how directional the edge is). Below is a set of RAISR filters, learned from a database of 10,000 high and low resolution image pairs (where the low-res images were first upsampled). The training process takes about an hour.
|Collection of learned 11x11 filters for 3x super-resolution. Filters can be learned for a range of super-resolution factors, including fractional ones. Note that as the angle of the edge changes, we see the angle of the filter rotate as well. Similarly, as the strength increases, the sharpness of the filters increases, and the anisotropy of the filter increases with rising coherence.|
From left to right, we see that the learned filters correspond selectively to the direction of the underlying edge that is being reconstructed. For example, the filter in the middle of the bottom row is most appropriate for a strong horizontal edge (gradient angle of 90 degrees) with a high degree of coherence (a straight, rather than a curved, edge). If this same horizontal edge is low-contrast, then a different filter is selected such one in the top row.
In practice, at run-time RAISR selects and applies the most relevant filter from the list of learned filters to each pixel neighborhood in the low-resolution image. When these filters are applied to the lower quality image, they recreate details that are of comparable quality to the original high resolution, and offer a significant improvement to linear, bicubic, or Lanczos interpolation methods.
|Top: RAISR algorithm at run-time, applied to a cheap upscaler’s output. Bottom: Low-res original (left), bicubic upsampler 2x (middle), RAISR output (right)|
Some examples of RAISR in action can be seen below:
|Left: Original, Right: RAISR super-resolved 3x. Image courtesy of Marc Levoy|
One of the more complex aspects of super-resolution is getting rid of aliasing artifacts such as Moire patterns and jaggies that arise when high frequency content is rendered in lower resolution (as is the case when images are purposefully degraded). Depending on the shape of the underlying features, these artifacts can be varied and hard to undo.
|Example of aliasing artifacts seen on the lower right (Image source)|
Linear methods simply can not recover the underlying structure, but RAISR can. Below is an example where the aliased spatial frequencies are apparent under the numbers 3 and 5 in the low-resolution original on the left, while the RAISR image on the right recovered the original structure. Another important advantage of the filter learning approach used by RAISR is that we can specialize it to remove noise, or compression artifacts unique to individual compression algorithms (such as JPEG) as part of the training process. By providing it with examples of such artifacts, RAISR can learn to undo other effects besides resolution enhancement, having them “baked” inside the resulting filters.
|Left: Low res original, with strong aliasing. Right: RAISR output, removing aliasing.|
Super-resolution technology, using one or many frames, has come a long way. Today, the use of machine learning, in tandem with decades of advances in imaging technology, has enabled progress in image processing that yields many potential benefits. For example, in addition to improving digital “pinch to zoom” on your phone, one could capture, save, or transmit images at lower resolution and super-resolve on demand without any visible degradation in quality, all while utilizing less of mobile data and storage plans.
To learn more about the details of our research and a comparison to other current architectures, check out our paper, which will appear soon in the IEEE Transactions on Computational Imaging.