- DeepDream - a code example for visualizing Neural Networks
Posted by Alexander Mordvintsev, Software Engineer, Christopher Olah, Software Engineering Intern and Mike Tyka, Software Engineer
Two weeks ago we blogged about a visualization tool designed to help us understand how neural networks work and what each layer has learned. In addition to gaining some insight on how these networks carry out classification tasks, we found that this process also generated some beautiful art.
We have seen a lot of interest and received some great questions, from programmers and artists alike, about the details of how these visualizations are made. We have decided to open source the code we used to generate these images in an IPython notebook, so now you can make neural network inspired images yourself!
The code is based on Caffe and uses available open source packages, and is designed to have as few dependencies as possible. To get started, you will need the following (full details in the notebook):
Once you’re set up, you can supply an image and choose which layers in the network to enhance, how many iterations to apply and how far to zoom in. Alternatively, different pre-trained networks can be plugged in.
It'll be interesting to see what imagery people are able to generate. If you post images to Google+, Facebook, or Twitter, be sure to tag them with #deepdream so other researchers can check them out too.
- Google Computational Journalism Research Awards launch in Europe
Posted by Andrea Held, Google University Relations & Matt Cooke, Google News Lab Europe
Journalism is evolving fast in the digital age, and researchers across Europe are working on exciting projects to create innovative new tools and open source software that will support online journalism and benefit readers. As part of the wider Google Digital News Initiative (DNI), we invited academic researchers across Europe to submit proposals for the Computational Journalism Research Awards.
After careful review by Google’s News Lab and Research teams, the following projects were selected:
SCAN: Systematic Content Analysis of User Comments for Journalists
Walid Maalej, Professor of Informatics, University of Hamburg
Wiebke Loosen, Senior Researcher for Journalism, Hans-Bredow-Institute, Hamburg, Germany
This project aims at developing a framework for the systematic, semi-automated analysis of audience feedback on journalistic content to better reflect the voice of users, mitigate the analysis efforts, and help journalists generate new content from the user comments.
Event Thread Extraction for Viewpoint Analysis
Ioana Manolescu, Senior Researcher, INRIA Saclay, France
Xavier Tannier, Professor of Computer Science, University Paris-Sud, France
The goal of the project is to automatically build topic "event threads" that will help journalists and citizens decode claims made by public figures, in order to distinguish between personal opinion, communication tools and voluntary distortions of the reality.
Computational Support for Creative Story Development by Journalists
Neil Maiden, Professor of Systems Engineering
George Brock, Professor of Journalism, City University London, UK
This project will develop a new software prototype to implement creative search strategies that journalists could use to strengthen investigative storytelling more efficiently than with current news content management and search tools.
We congratulate the recipients of these awards and we look forward to the results of their research. Each award includes funding of up to $60,000 in cash and $20,000 in computing credits on Google’s Cloud Platform. Stay tuned for updates on their progress.
- Inceptionism: Going Deeper into Neural Networks
Posted by Alexander Mordvintsev, Software Engineer, Christopher Olah, Software Engineering Intern and Mike Tyka, Software Engineer
Artificial Neural Networks have spurred remarkable recent progress in image classification and speech recognition. But even though these are very useful tools based on well-known mathematical methods, we actually understand surprisingly little of why certain models work and others don’t. So let’s take a look at some simple techniques for peeking inside these networks.
We train an artificial neural network by showing it millions of training examples and gradually adjusting the network parameters until it gives the classifications we want. The network typically consists of 10-30 stacked layers of artificial neurons. Each image is fed into the input layer, which then talks to the next layer, until eventually the “output” layer is reached. The network’s “answer” comes from this final output layer.
One of the challenges of neural networks is understanding what exactly goes on at each layer. We know that after training, each layer progressively extracts higher and higher-level features of the image, until the final layer essentially makes a decision on what the image shows. For example, the first layer maybe looks for edges or corners. Intermediate layers interpret the basic features to look for overall shapes or components, like a door or a leaf. The final few layers assemble those into complete interpretations—these neurons activate in response to very complex things such as entire buildings or trees.
One way to visualize what goes on is to turn the network upside down and ask it to enhance an input image in such a way as to elicit a particular interpretation. Say you want to know what sort of image would result in “Banana.” Start with an image full of random noise, then gradually tweak the image towards what the neural net considers a banana (see related work in , , , ). By itself, that doesn’t work very well, but it does if we impose a prior constraint that the image should have similar statistics to natural images, such as neighboring pixels needing to be correlated.
So here’s one surprise: neural networks that were trained to discriminate between different kinds of images have quite a bit of the information needed to generate images too. Check out some more examples across different classes:
Why is this important? Well, we train networks by simply showing them many examples of what we want them to learn, hoping they extract the essence of the matter at hand (e.g., a fork needs a handle and 2-4 tines), and learn to ignore what doesn’t matter (a fork can be any shape, size, color or orientation). But how do you check that the network has correctly learned the right features? It can help to visualize the network’s representation of a fork.
Indeed, in some cases, this reveals that the neural net isn’t quite looking for the thing we thought it was. For example, here’s what one neural net we designed thought dumbbells looked like:
There are dumbbells in there alright, but it seems no picture of a dumbbell is complete without a muscular weightlifter there to lift them. In this case, the network failed to completely distill the essence of a dumbbell. Maybe it’s never been shown a dumbbell without an arm holding it. Visualization can help us correct these kinds of training mishaps.
Instead of exactly prescribing which feature we want the network to amplify, we can also let the network make that decision. In this case we simply feed the network an arbitrary image or photo and let the network analyze the picture. We then pick a layer and ask the network to enhance whatever it detected. Each layer of the network deals with features at a different level of abstraction, so the complexity of features we generate depends on which layer we choose to enhance. For example, lower layers tend to produce strokes or simple ornament-like patterns, because those layers are sensitive to basic features such as edges and their orientations.
|Left: Original photo by Zachi Evenor. Right: processed by Günther Noack, Software Engineer|
If we choose higher-level layers, which identify more sophisticated features in images, complex features or even whole objects tend to emerge. Again, we just start with an existing image and give it to our neural net. We ask the network: “Whatever you see there, I want more of it!” This creates a feedback loop: if a cloud looks a little bit like a bird, the network will make it look more like a bird. This in turn will make the network recognize the bird even more strongly on the next pass and so forth, until a highly detailed bird appears, seemingly out of nowhere.
|Left: Original painting by Georges Seurat. Right: processed images by Matthew McNaughton, Software Engineer|
The results are intriguing—even a relatively simple neural network can be used to over-interpret an image, just like as children we enjoyed watching clouds and interpreting the random shapes. This network was trained mostly on images of animals, so naturally it tends to interpret shapes as animals. But because the data is stored at such a high abstraction, the results are an interesting remix of these learned features.
Of course, we can do more than cloud watching with this technique. We can apply it to any kind of image. The results vary quite a bit with the kind of image, because the features that are entered bias the network towards certain interpretations. For example, horizon lines tend to get filled with towers and pagodas. Rocks and trees turn into buildings. Birds and insects appear in images of leaves.
This technique gives us a qualitative sense of the level of abstraction that a particular layer has achieved in its understanding of images. We call this technique “Inceptionism” in reference to the neural net architecture used. See our Inceptionism gallery for more pairs of images and their processed results, plus some cool video animations.
|The original image influences what kind of objects form in the processed image.|
We must go deeper: Iterations
If we apply the algorithm iteratively on its own outputs and apply some zooming after each iteration, we get an endless stream of new impressions, exploring the set of things the network knows about. We can even start this process from a random-noise image, so that the result becomes purely the result of the neural network, as seen in the following images:
The techniques presented here help us understand and visualize how neural networks are able to carry out difficult classification tasks, improve network architecture, and check what the network has learned during training. It also makes us wonder whether neural networks could become a tool for artists—a new way to remix visual concepts—or perhaps even shed a little light on the roots of the creative process in general.
- New ways to add Reminders in Inbox by Gmail
Posted by Dave Orr, Google Research Product Manager
Last week, Inbox by Gmail opened up and improved many of your favorite features, including two new ways to add Reminders.
First up, when someone emails you a to-do, Inbox can now suggest adding a Reminder so you don’t forget. Here's how it looks if your spouse emails you and asks you to buy milk on the way home:
To help you add Reminders, the Google Research team used natural language understanding technology to teach Inbox to recognize to-dos in email.
And much like Gmail and Inbox get better when you report spam, your feedback helps improve these suggested Reminders. You can accept or reject them with a single click:
The other new way to add Reminders in Inbox is to create Reminders in Google Keep--they will appear in Inbox with a link back to the full note in Google Keep.
Hopefully, this little extra help gets you back to what matters more quickly and easily. Try the new features out, and as always, let us know what you think using the feedback link in the app.
- Google Computer Vision research at CVPR 2015
Posted by Vincent Vanhoucke, Google Research Scientist
Much of the world's data is in the form of visual media. In order to utilize meaningful information from multimedia and deliver innovative products, such as Google Photos, Google builds machine-learning systems that are designed to enable computer perception of visual input, in addition to pursuing image and video analysis techniques focused on image/scene reconstruction and understanding.
This week, Boston hosts the 2015 Conference on Computer Vision and Pattern Recognition (CVPR 2015), the premier annual computer vision event comprising the main CVPR conference and several co-located workshops and short courses. As a leader in computer vision research, Google will have a strong presence at CVPR 2015, with many Googlers presenting publications in addition to hosting workshops and tutorials on topics covering image/video annotation and enhancement, 3D analysis and processing, development of semantic similarity measures for visual objects, synthesis of meaningful composites for visualization/browsing of large image/video collections and more.
Learn more about some of our research in the list below (Googlers highlighted in blue). If you are attending CVPR this year, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for hundreds of millions of people. Members of the Jump team will also have a prototype of the camera on display and will be showing videos produced using the Jump system on Google Cardboard.
Applied Deep Learning for Computer Vision with Torch
Koray Kavukcuoglu, Ronan Collobert, Soumith Chintala
DIY Deep Learning: a Hands-On Tutorial with Caffe
Evan Shelhamer, Jeff Donahue, Yangqing Jia, Jonathan Long, Ross Girshick
ImageNet Large Scale Visual Recognition Challenge Tutorial
Olga Russakovsky, Jonathan Krause, Karen Simonyan, Yangqing Jia, Jia Deng, Alex Berg, Fei-Fei Li
Fast Image Processing With Halide
Jonathan Ragan-Kelley, Andrew Adams, Fredo Durand
Open Source Structure-from-Motion
Matt Leotta, Sameer Agarwal, Frank Dellaert, Pierre Moulon, Vincent Rabaud
Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection
George Papandreou, Iasonas Kokkinos, Pierre-André Savalle
Going Deeper with Convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
DynamicFusion: Reconstruction and Tracking of Non-Rigid Scenes in Real-Time
Richard A. Newcombe, Dieter Fox, Steven M. Seitz
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell
Visual Vibrometry: Estimating Material Properties from Small Motion in Video
Abe Davis, Katherine L. Bouman, Justin G. Chen, Michael Rubinstein, Frédo Durand, William T. Freeman
Fast Bilateral-Space Stereo for Synthetic Defocus
Jonathan T. Barron, Andrew Adams, YiChang Shih, Carlos Hernández
Learning Semantic Relationships for Better Action Retrieval in Images
Vignesh Ramanathan, Congcong Li, Jia Deng, Wei Han, Zhen Li, Kunlong Gu, Yang Song, Samy Bengio, Charles Rosenberg, Li Fei-Fei
FaceNet: A Unified Embedding for Face Recognition and Clustering
Florian Schroff, Dmitry Kalenichenko, James Philbin
A Mixed Bag of Emotions: Model, Predict, and Transfer Emotion Distributions
Kuan-Chuan Peng, Tsuhan Chen, Amir Sadovnik, Andrew C. Gallagher
Best-Buddies Similarity for Robust Template Matching
Tali Dekel, Shaul Oron, Michael Rubinstein, Shai Avidan, William T. Freeman
Articulated Motion Discovery Using Pairs of Trajectories
Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari
Reflection Removal Using Ghosting Cues
YiChang Shih, Dilip Krishnan, Frédo Durand, William T. Freeman
P3.5P: Pose Estimation with Unknown Focal Length
MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching
Xufeng Han, Thomas Leung, Yangqing Jia, Rahul Sukthankar, Alexander C. Berg
Inferring 3D Layout of Building Facades from a Single Image
Jiyan Pan, Martial Hebert, Takeo Kanade
The Aperture Problem for Refractive Motion
Tianfan Xue, Hossein Mobahei, Frédo Durand, William T. Freeman
Video Magnification in Presence of Large Motions
Mohamed Elgharib, Mohamed Hefeeda, Frédo Durand, William T. Freeman
Robust Video Segment Proposals with Painless Occlusion Handling
Zhengyang Wu, Fuxin Li, Rahul Sukthankar, James M. Rehg
Ontological Supervision for Fine Grained Classification of Street View Storefronts
Yair Movshovitz-Attias, Qian Yu, Martin C. Stumpe, Vinay Shet, Sacha Arnoud, Liron Yatziv
VIP: Finding Important People in Images
Clint Solomon Mathialagan, Andrew C. Gallagher, Dhruv Batra
Fusing Subcategory Probabilities for Texture Classification
Yang Song, Weidong Cai, Qing Li, Fan Zhang
Beyond Short Snippets: Deep Networks for Video Classification
Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici
THUMOS Challenge 2015
Program organizers include: Alexander Gorban, Rahul Sukthankar
DeepVision: Deep Learning in Computer Vision 2015
Invited Speaker: Rahul Sukthankar
Large Scale Visual Commerce (LSVisCom)
Panelist: Luc Vincent
Large-Scale Video Search and Mining (LSVSM)
Invited Speaker and Panelist: Rahul Sukthankar
Program Committee includes: Apostol Natsev
Vision meets Cognition: Functionality, Physics, Intentionality and Causality
Program Organizers include: Peter Battaglia
Big Data Meets Computer Vision: 3rd International Workshop on Large Scale Visual Recognition and Retrieval (BigVision 2015)
Program Organizers include: Samy Bengio
Includes speaker Christian Szegedy - “Scalable approaches for large scale vision”
Observing and Understanding Hands in Action (Hands 2015)
Program Committee includes: Murphy Stein
Fine-Grained Visual Categorization (FGVC3)
Program Organizers include: Anelia Angelova
Large-scale Scene Understanding Challenge (LSUN)
Winners of the Scene Classification Challenge: Julian Ibarz, Christian Szegedy and Vincent Vanhoucke
Winners of the Caption Generation Challenge: Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan
Looking from above: when Earth observation meets vision (EARTHVISION)
Technical Committee includes: Andreas Wendel
Computer Vision in Vehicle Technology: Assisted Driving, Exploration Rovers, Aerial and Underwater Vehicles
Invited Speaker: Andreas Wendel
Program Committee includes: Andreas Wendel
Women in Computer Vision (WiCV)
Invited Speaker: Mei Han
ChaLearn Looking at People (sponsor)
Fine-Grained Visual Categorization (FGVC3) (sponsor)