- Academics and the Little Box Challenge
Posted by Maggie Johnson, Director of Education and University Relations
Think shrink! Min it to win it! Smaller is baller! That's what the Little Box Challenge is all about: developing a high power density inverter. It’s a competition presented by Google and the Institute of Electrical and Electronics Engineers Power Electronics Society (IEEE PELS) -- not only a grand engineering challenge, but your chance to make a big impact on the future of renewables and electricity.
With the rise of solar photovoltaic panels, electric vehicles (EVs) and large-format batteries, we’ve seen a resurgence in the over-a-century-long feud between Thomas Edison’s direct current (DC) and Nikola Tesla’s alternating current (AC). The electric grid and most higher-power household and commercial devices use AC; batteries, photovoltaics, and electric vehicles work in DC. So the power electronics that convert between the two -- rectifiers (AC->DC) and inverters (DC->AC) -- are gaining prominence, as are the DC/DC and AC/AC converters that switch between different voltages or frequencies.
While different flavors of these devices have been around for well over a century, some are starting to show their age and limitations versus newer technologies. For example, conventional string inverters have power densities around 0.5-3 Watts/inch³, and microinverters around 5 Watts/inch³ -- but lithium-ion batteries can now reach 4-10 Watt-hours/inch³. So for a 1-2 hour battery pack, your inverter could end up being bigger than your battery -- a lot to carry around.
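To make that size mismatch concrete, here is a back-of-the-envelope comparison. The specific power, capacity, and density figures are illustrative values picked from the ranges quoted above, not measurements:

```python
# Illustrative volume comparison for a hypothetical 2 kW, 2-hour system,
# using density figures from the ranges quoted above.

def volume_in3(capacity, density_per_in3):
    """Volume in cubic inches given a capacity and a density per cubic inch."""
    return capacity / density_per_in3

battery_wh = 2000 * 2                       # 2 kW for 2 hours = 4000 Wh
battery_vol = volume_in3(battery_wh, 8)     # ~8 Wh/in^3 lithium-ion
inverter_vol = volume_in3(2000, 2)          # ~2 W/in^3 string inverter

print(f"Battery:  {battery_vol:.0f} in^3")   # 500 in^3
print(f"Inverter: {inverter_vol:.0f} in^3")  # 1000 in^3 -- twice the battery
```

Even with mid-range numbers, the inverter dwarfs the battery it serves, which is exactly the gap the challenge targets.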
Some recent advances may change what’s possible in power electronics. For example, wide-bandgap (WBG) semiconductors -- such as gallium nitride (GaN) and silicon carbide (SiC) -- not only enable higher power densities than conventional silicon-based devices do, but can also convert between DC and AC at higher temperatures, using higher switching frequencies, and with greater efficiency.
But even WBG materials and other new technologies run into limits on inverter power density. Photovoltaic panels and batteries suffer when their power output oscillates, so inverters require some form of energy storage: electrolytic capacitors store that energy and bridge the power differential between the DC input and the AC output, but they make the devices much larger. Household and consumer devices also need filters to prevent electromagnetic interference, adding even more bulk.
When it comes to shrinking these devices, inverters may have the most potential. And because inverters are so common in household applications, we hope The Little Box Challenge may lead to improvements not only in power density, but also in reliability, efficiency, safety, and cost. Furthermore, it is our hope that some of these advances can also improve the other types of power electronics listed above. If these devices can be made very small, reliable and inexpensive, we could see all kinds of useful applications to the electric grid, consumer devices and beyond, maybe including some we have yet to imagine.
To recognize the role academics have played in pushing the frontier of new technologies, Google has taken a couple of special steps to help them participate:
- Research at Google will provide unrestricted gifts to academics pursuing the prize. This funding can be used for research equipment and to support students. Visit the Little Box Challenge awards for academics page for more info -- proposals are due September 30, 2014.
- Academics often have trouble getting the latest technology from device manufacturers to tinker on. So Google has reached out to a number of WBG manufacturers who’ve put up dedicated pages detailing their devices. Check out the Little Box Challenge site to get started.
We hope you’ll consider entering, and please tell your colleagues, professors, students and dreamers -- you can print and post these posters on your campus to spread the word.
- Simple is better - Making your web forms easy to use pays off
Posted by Javier Bargas-Avila and Mirjam Seckler, User Experience Research at Google
Imagine yourself filling out a long and cumbersome form on a website to register for a service. After several minutes of filling out fields, coming up with a password, and handling captchas, you click the submit button to encounter your form filled with red error messages. Suddenly the “close tab” button seems much more tempting than before.
Despite the rapid evolution of the Internet, web forms, with their limited and unilateral way of interaction, remain one of the core barriers between users and website owners. Any obstacle or difficulty in filling in online forms can increase user frustration, resulting in drop-outs and information loss.
In 2010, a set of 20 guidelines to optimize web forms was published by researchers from the University of Basel in Switzerland, including best practices aimed to improve web forms and reduce frustration, errors and drop-outs. For instance, guideline no. 13 states that if answers are required in a specific format, the imposed rule should be communicated in advance; guideline no. 15 states that forms should never clear already-completed fields after an error occurs.
To investigate the impact of applying these rules, we conducted a study and presented our results at CHI 2014: Designing usable web forms: empirical evaluation of web form improvement guidelines. In the study, we examined a sample of high traffic online forms and rated them based on how well they followed the form guidelines outlined by the 2010 publication. We then selected three different online forms of varying qualities (low, medium and high), and improved them by applying the guidelines, with the high quality form needing less modification than the medium and low quality forms. We then tested both the original and improved forms extensively with 65 participants in a controlled lab environment.
In our study, the modified forms showed significant improvements over the original forms in the time users needed to complete a form, an increase in successful first-trial submissions and higher user satisfaction. As expected, the impact was highest when the original form was of low quality, but even high quality forms showed improved metrics.
Furthermore, user interviews with participants in the study revealed which guidelines were most impactful in improving the forms:
- Format specifications (e.g., requiring a minimum password length) should be stated in the form, prior to submission. Applying this guideline had a large positive impact on user performance and subjective user ratings, and was mentioned frequently in user interviews.
- Error messages must be placed next to the erroneous field and designed so that users can easily fix the problem. Doing this reduced form-filling time and increased subjective ratings.
- Users most frequently mentioned that being able to tell apart optional and mandatory fields was key.
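The guidelines above translate directly into how a form handler should behave. Below is a minimal server-side sketch: format rules are available up front, each error message is keyed to its field, and completed input is never discarded. The field names and messages are hypothetical:

```python
# Minimal sketch of guideline-friendly validation: format rules stated up
# front, errors attached per field, and the user's input preserved.
# Field names and rules are hypothetical examples.

FORMAT_RULES = {"password": "at least 8 characters"}  # shown before submission

def validate(form):
    """Return (errors_by_field, form) so completed fields are preserved."""
    errors = {}
    if form.get("email", "").count("@") != 1:
        errors["email"] = "Please enter a valid email address."
    if len(form.get("password", "")) < 8:
        errors["password"] = f"Password must be {FORMAT_RULES['password']}."
    return errors, form  # form data is returned untouched, never cleared

errors, form = validate({"email": "ada@example.com", "password": "short"})
print(errors)         # only the password field carries an error message
print(form["email"])  # the valid email survives the failed submission
```

Rendering each entry of `errors` next to its field, rather than in a single block at the top, is what guideline-compliant templates would do with this output.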
|Example Guideline: State format specification in advance|
|Example Guideline: Place error messages next to erroneous fields|
Putting field labels above rather than adjacent to the fields in the form also led to improvements in the way users scanned the form. Using eye-tracking technology, our study shows that users needed fewer fixations, less fixation time, and fewer saccades before submitting the form for the first time.
|Example Guideline: Distinguish optional and mandatory fields|
From our study, we conclude that optimizing online forms is well worth the resource investment. With easy-to-implement actions, you can improve your forms, increase the number of successful transactions, and end up with more satisfied users. Google is currently working on implementing these findings on our own forms.
|Scan path for an original and improved form|
We wish to thank our co-authors at the University of Basel, Switzerland for their collaboration in this work: Silvia Heinz, Klaus Opwis and Alexandre Tuch.
- Influential Papers for 2013
Posted by Corinna Cortes and Alfred Spector, Google Research
Googlers across the company actively engage with the scientific community by publishing technical papers, contributing open-source packages, working on standards, introducing new APIs and tools, giving talks and presentations, participating in ongoing technical debates, and much more. Our publications offer technical and algorithmic advances, feature aspects we learn as we develop novel products and services, and shed light on some of the technical challenges we face at Google. Below are some of the especially influential papers co-authored by Googlers in 2013. In the coming weeks we will be offering a more in-depth look at some of these publications.
Online Matching and Ad Allocation, by Aranyak Mehta [Foundations and Trends in Theoretical Computer Science]
Matching is a classic problem with a rich history and a significant impact, both on the theory of algorithms and in practice. There has recently been a surge of interest in the online version of the matching problem, due to its application in the domain of Internet advertising. The theory of online matching and allocation has played a critical role in the design of algorithms for ad allocation. This monograph provides a survey of the key problems and algorithmic techniques in this area, and provides a glimpse into their practical impact.
Fast, Accurate Detection of 100,000 Object Classes on a Single Machine, by Thomas Dean, Mark Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik [Proceedings of IEEE Conference on Computer Vision and Pattern Recognition]
In this paper, we show how to use hash table lookups to replace the dot products in a convolutional filter bank with the number of lookups independent of the number of filters. We apply the technique to evaluate 100,000 deformable-part models requiring over a million (part) filters on multiple scales of a target image in less than 20 seconds using a single multi-core processor with 20GB of RAM.
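The paper's actual scheme is based on winner-take-all hashing; the sketch below illustrates the general idea with a simpler sign-random-projection hash instead. Filters are indexed by hash key once, and each image window then retrieves candidate filters with a single table lookup, so query cost does not grow with the number of filters. All names and sizes are illustrative:

```python
import random

# Sketch of hashing-based filter lookup (sign-random-projection variant;
# the paper uses winner-take-all hashing). Sizes are illustrative.
random.seed(0)
DIM, BITS = 16, 8
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def lsh_key(vec):
    """One bit per random hyperplane: the sign of the projection."""
    return tuple(int(sum(p * x for p, x in zip(plane, vec)) > 0)
                 for plane in planes)

# Index 1000 filters once. Similar vectors tend to share a bucket.
filters = {i: [random.gauss(0, 1) for _ in range(DIM)] for i in range(1000)}
table = {}
for fid, fvec in filters.items():
    table.setdefault(lsh_key(fvec), []).append(fid)

def candidates(window):
    """One hash-table lookup replaces 1000 dot products."""
    return table.get(lsh_key(window), [])
```

Only the candidate filters returned by `candidates` need their responses computed exactly, which is how the cost per window becomes independent of the filter-bank size.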
Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams, by Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, Shivakumar Venkataraman [SIGMOD]
In this paper, we talk about Photon, a geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low latency. The streams may be unordered or delayed. Photon fully tolerates infrastructure degradation and datacenter-level outages without any manual intervention while joining every event exactly once. Photon is currently deployed in production, processing millions of events per minute at peak with an average end-to-end latency of less than 10 seconds.
Omega: flexible, scalable schedulers for large compute clusters, by Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes [SIGOPS European Conference on Computer Systems (EuroSys)]
Omega addresses the need for increasing scale and speed in cluster schedulers using parallelism, shared state, and lock-free optimistic concurrency control. The paper presents a taxonomy of design approaches and evaluates Omega using simulations driven by Google production workloads.
FFitts Law: Modeling Finger Touch with Fitts' Law, by Xiaojun Bi, Yang Li, Shumin Zhai [Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2013)]
Fitts’ law is a cornerstone of graphical user interface research and evaluation. It can precisely predict cursor movement time given an on-screen target’s location and size. In the era of finger-touch-based mobile computing, the conventional form of Fitts’ law loses its power because targets are often smaller than the finger width. Researchers at Google, Xiaojun Bi, Yang Li, and Shumin Zhai, devised finger Fitts’ law (FFitts law) to address this fundamental problem.
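For reference, the conventional form of Fitts' law that FFitts law refines predicts movement time from target distance and width. A minimal sketch, with illustrative coefficients (in practice `a` and `b` are fit per device and user):

```python
import math

def fitts_mt(a, b, distance, width):
    """Conventional Fitts' law (Shannon form): MT = a + b * log2(D/W + 1)."""
    return a + b * math.log2(distance / width + 1)

# Illustrative coefficients; a, b in seconds, distance/width in pixels.
print(fitts_mt(0.2, 0.1, distance=200, width=50))  # larger target -> faster
print(fitts_mt(0.2, 0.1, distance=200, width=10))  # smaller target -> slower
```

FFitts law's contribution is, roughly, to replace the nominal width `W` with an effective value that accounts for the imprecision of finger touch points, which this conventional form ignores.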
Top-k Publish-Subscribe for Social Annotation of News, by Alexander Shraer, Maxim Gurevich, Marcus Fontoura, Vanja Josifovski [Proceedings of the 39th International Conference on Very Large Data Bases]
The paper describes how scalable, low-latency content-based publish-subscribe systems can be implemented using inverted indices and modified top-k document retrieval algorithms. The feasibility of this approach is demonstrated in the application of annotating news articles with social updates (such as Google+ posts or tweets). This application is cast as publish-subscribe, where news articles are treated as subscriptions (continuous queries) and social updates as published items arriving at high frequency.
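The role reversal at the heart of this formulation is easy to miss, so here is a toy sketch (not the paper's system): articles are indexed as term-based subscriptions, and each incoming social update is matched against that inverted index to find its top-k articles. Scoring by term overlap is a stand-in for the paper's retrieval scoring:

```python
from collections import defaultdict

# Toy publish-subscribe over an inverted index: articles subscribe by term,
# each social update is matched on arrival. Term-overlap scoring is a
# simplification of real top-k retrieval scoring.
index = defaultdict(list)  # term -> [article_id]

def subscribe(article_id, terms):
    for t in terms:
        index[t].append(article_id)

def publish(update_terms, k=2):
    """Score subscribed articles by overlapping terms; return the top k."""
    scores = defaultdict(int)
    for t in update_terms:
        for aid in index[t]:
            scores[aid] += 1
    return sorted(scores, key=scores.get, reverse=True)[:k]

subscribe("a1", {"election", "senate"})
subscribe("a2", {"election", "budget"})
print(publish({"election", "budget", "vote"}))  # a2 scores 2, a1 scores 1
```

Because the index is over the (relatively stable) articles rather than the fast stream of updates, each published item costs only a few index lookups.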
Ad Click Prediction: a View from the Trenches, by H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, Jeremy Kubica [KDD]
How should one go about making predictions in extremely large scale production systems? We provide a case study for ad click prediction, and illustrate best practices for combining rigorous theory with careful engineering and evaluation. The paper contains a mix of novel algorithms, practical approaches, and some surprising negative results.
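One of the paper's central algorithms is the per-coordinate FTRL-Proximal update, which combines adaptive learning rates with L1-induced sparsity so that rarely seen features keep exactly-zero weights. The sketch below is a compact reading of that update for logistic loss, not a production implementation; hyperparameter values are illustrative:

```python
import math

class FTRLProximal:
    """Sketch of per-coordinate FTRL-Proximal (after McMahan et al.)
    for logistic loss on sparse features. Hyperparameters illustrative."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z, self.n = {}, {}   # per-coordinate state

    def weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 keeps rarely-seen features at exactly zero
        n = self.n.get(i, 0.0)
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def update(self, features, y):
        """features: {index: value}; label y in {0, 1}. Returns p(click)."""
        p = 1.0 / (1.0 + math.exp(-sum(self.weight(i) * v
                                       for i, v in features.items())))
        for i, v in features.items():
            g = (p - y) * v                       # logistic-loss gradient
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self.weight(i)
            self.n[i] = n + g * g
        return p
```

Storing only `z` and `n` per coordinate, and materializing weights lazily, is what makes the method attractive at ad-serving scale.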
Learning kernels using local rademacher complexity, by Corinna Cortes, Marius Kloft, Mehryar Mohri [Advances in Neural Information Processing Systems (NIPS 2013)]
This paper shows how the notion of local Rademacher complexity, which leads to sharp learning guarantees, can be used to derive algorithms for the important problem of learning kernels. It also reports the results of several experiments with these algorithms which yield performance improvements in some challenging tasks.
Efficient Estimation of Word Representations in Vector Space, by Tomas Mikolov, Kai Chen, Greg S. Corrado, Jeffrey Dean [ICLR Workshop 2013]
We describe a simple and speedy method for training vector representations of words. The resulting vectors naturally capture the semantics and syntax of word use, such that simple analogies can be solved with vector arithmetic. For example, the vector difference between 'man' and 'woman' is approximately equal to the difference between 'king' and 'queen', and vector displacements between any given country's name and its capital are aligned. We provide an open-source implementation as well as pre-trained vector representations at http://word2vec.googlecode.com
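The analogy-by-arithmetic trick can be shown with a toy vocabulary. The 2-D vectors below are hand-made for illustration; real word2vec embeddings are learned and have hundreds of dimensions:

```python
# Toy illustration of solving "man : woman :: king : ?" by vector
# arithmetic. Vectors are hand-crafted, not learned.
vecs = {
    "man":    [1.0, 0.0], "woman": [1.0, 1.0],
    "king":   [3.0, 0.0], "queen": [3.0, 1.0],
    "prince": [3.0, -0.5],
}

def analogy(a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    def dist(w):
        return sum((t - x) ** 2 for t, x in zip(target, vecs[w]))
    return min((w for w in vecs if w not in (a, b, c)), key=dist)

print(analogy("man", "woman", "king"))  # -> queen
```

The same nearest-neighbor-of-a-difference query, run against the released pre-trained vectors, recovers the country-capital and gender analogies described above.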
Large-Scale Learning with Less RAM via Randomization, by Daniel Golovin, D. Sculley, H. Brendan McMahan, Michael Young [Proceedings of the 30th International Conference on Machine Learning (ICML)]
We show how a simple technique -- using limited precision coefficients and randomized rounding -- can dramatically reduce the RAM needed to train models with online convex optimization methods such as stochastic gradient descent. In addition to demonstrating excellent empirical performance, we provide strong theoretical guarantees.
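The key property of randomized rounding is that it is unbiased: a coefficient stored on a coarse grid still equals its full-precision value in expectation, so the optimizer's updates are not systematically distorted. A minimal sketch (grid resolution `eps` is an illustrative parameter, not the paper's encoding):

```python
import math
import random

def randomized_round(x, eps):
    """Round x onto the grid {k * eps} stochastically so E[result] == x.
    A coarser grid (larger eps) needs fewer bits per coefficient."""
    lower = math.floor(x / eps) * eps
    frac = (x - lower) / eps          # probability of rounding up
    return lower + eps if random.random() < frac else lower

random.seed(1)
samples = [randomized_round(0.123, eps=0.1) for _ in range(10000)]
print(sum(samples) / len(samples))  # hovers near 0.123 despite 0.1 grid
```

Each stored value fits in far fewer bits than a 32-bit float, which is where the RAM savings during training come from.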
Source-Side Classifier Preordering for Machine Translation, by Uri Lerner, Slav Petrov [Proc. of EMNLP '13]
When translating from one language to another, it is important to not only choose the correct translation for each word, but to also put the words in the correct word order. In this paper we present a novel approach that uses a syntactic parser and a feature-rich classifier to perform long-distance reordering. We demonstrate significant improvements over alternative approaches on a large number of language pairs.
Natural Language Processing
Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging, by Oscar Tackstrom, Dipanjan Das, Slav Petrov, Ryan McDonald, Joakim Nivre [Transactions of the Association for Computational Linguistics (TACL '13)]
Knowing the parts of speech (verb, noun, etc.) of words is important for many natural language processing applications, such as information extraction and machine translation. Constructing part-of-speech taggers typically requires large amounts of manually annotated data, which is missing in many languages and domains. In this paper, we introduce a method that instead relies on a combination of incomplete annotations projected from English with incomplete crowdsourced dictionaries in each target language. The result is a 25 percent error reduction compared to the previous state of the art.
Universal Dependency Annotation for Multilingual Parsing, by Ryan McDonald, Joakim Nivre, Yoav Goldberg, Yvonne Quirmbach-Brundage, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Tackstrom, Claudia Bedini, Nuria Bertomeu Castello, Jungmee Lee [Association for Computational Linguistics]
This paper discusses a public release of syntactic dependency treebanks (https://code.google.com/p/uni-dep-tb/). Syntactic treebanks are manually annotated data sets containing full syntactic analysis for a large number of sentences (http://en.wikipedia.org/wiki/Dependency_grammar). Unlike other syntactic treebanks, the universal data set tries to normalize syntactic phenomena across languages when it can to produce a harmonized set of multilingual data. Such a resource will help large scale multilingual text analysis and evaluation.
B4: Experience with a Globally Deployed Software Defined WAN, by Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, Min Zhu, Jonathan Zolla, Urs Hölzle, Stephen Stuart, Amin Vahdat [Proceedings of the ACM SIGCOMM Conference]
This paper presents the motivation, design, and evaluation of B4, a Software Defined WAN for our data center to data center connectivity. We present our approach to separating the network’s control plane from the data plane to enable rapid deployment of new network control services. Our first such service, centralized traffic engineering, allocates bandwidth among competing services based on application priority, dynamically shifting communication patterns, and prevailing failure conditions.
When the Cloud Goes Local: The Global Problem with Data Localization, by Patrick Ryan, Sarah Falvey, Ronak Merchant [IEEE Computer]
Ongoing efforts to legally define cloud computing and regulate separate parts of the Internet are unlikely to address underlying concerns about data security and privacy. Data localization initiatives, led primarily by European countries, could actually bring the cloud to the ground and make the Internet less secure.
Cloud-based robot grasping with the google object recognition engine, by Ben Kehoe, Akihiro Matsukawa, Sal Candido, James Kuffner, Ken Goldberg [IEEE Int’l Conf. on Robotics and Automation]
What if robots were not limited by onboard computation, algorithms did not need to be implemented on every class of robot, and model improvements from sensor data could be shared across many robots? With wireless networking and rapidly expanding cloud computing resources, this possibility is rapidly becoming reality. We present a system architecture, implemented prototype, and initial experimental data for a cloud-based robot grasping system that incorporates a Willow Garage PR2 robot with onboard color and depth cameras, Google’s proprietary object recognition engine, the Point Cloud Library (PCL) for pose estimation, Columbia University’s GraspIt! toolkit and OpenRAVE for 3D grasping, and our prior approach to sampling-based grasp analysis to address uncertainty in pose.
Security, Cryptography, and Privacy
Alice in Warningland: A Large-Scale Field Study of Browser Security Warning Effectiveness, by Devdatta Akhawe, Adrienne Porter Felt [USENIX Security Symposium]
Browsers show security warnings to keep users safe. How well do these warnings work? We empirically assess the effectiveness of browser security warnings, using more than 25 million warning impressions from Google Chrome and Mozilla Firefox.
Arrival and departure dynamics in Social Networks, by Shaomei Wu, Atish Das Sarma, Alex Fabrikant, Silvio Lattanzi, Andrew Tomkins [WSDM]
In this paper, we consider the natural arrival and departure of users in a social network, and show that the dynamics of arrival, which have been studied in some depth, are quite different from the dynamics of departure, which are not as well studied. We show unexpected properties of a node's local neighborhood that are predictive of departure. We also suggest that, globally, nodes at the fringe are more likely to depart, and subsequent departures are correlated among neighboring nodes in tightly-knit communities.
All the news that's fit to read: a study of social annotations for news reading, by Chinmay Kulkarni, Ed H. Chi [In Proc. of CHI2013]
As news reading becomes more social, how do different types of annotations affect people's selection of news articles? This crowdsourcing experiment shows that strangers' opinions, unsurprisingly, have no persuasive effect, while, surprisingly, annotations from unknown branded companies do. Friend annotations work best, helping users decide what to read and providing social context that improves engagement.
Does Bug Prediction Support Human Developers? Findings from a Google Case Study, by Chris Lewis, Zhongpeng Lin, Caitlin Sadowski, Xiaoyan Zhu, Rong Ou, E. James Whitehead Jr. [International Conference on Software Engineering (ICSE)]
"Does Bug Prediction Support Human Developers?" was a study that investigated whether software engineers changed their code review habits when presented with information about where bug-prone code might be lurking. Much to our surprise we found out that developer behavior didn't change at all! We went on to suggest features that bug prediction algorithms need in order to fit with developer workflows, which will hopefully result in more supportive algorithms being developed in the future.
Statistical Parametric Speech Synthesis Using Deep Neural Networks, by Heiga Zen, Andrew Senior, Mike Schuster [Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)]
Conventional approaches to statistical parametric speech synthesis use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech given text. This paper examines an alternative scheme in which the mapping from an input text to its acoustic realization is modeled by a deep neural network (DNN). Experimental results show that DNN-based speech synthesizers can produce more natural-sounding speech than conventional HMM-based ones using similar model sizes.
Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices, by Xin Lei, Andrew Senior, Alexander Gruenstein, Jeffrey Sorensen [Interspeech]
In this paper we describe the neural network-based speech recognition system that runs in real-time on Android phones. With the neural network acoustic model replacing the previous Gaussian mixture model and a compressed language model using on-the-fly rescoring, the word error rate is reduced by 27% while the storage requirement is reduced by 63%.
Pay by the Bit: An Information-Theoretic Metric for Collective Human Judgment, by Tamsyn P. Waterhouse [Proc CSCW]
There's a lot of confusion around quality control in crowdsourcing. For the broad problem subtype we call collective judgment, I discovered that information theory provides a natural and elegant metric for the value of contributors' work, in the form of the mutual information between their judgments and the questions' answers, each treated as random variables.
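The metric is straightforward to compute from a contributor's track record. A minimal sketch, estimating the mutual information empirically from paired (judgment, answer) counts; the toy data is illustrative:

```python
import math
from collections import Counter

def mutual_information(judgments, answers):
    """Empirical I(J; A) in bits between a contributor's judgments and
    the true answers, treated as paired samples of two random variables."""
    n = len(judgments)
    pj, pa = Counter(judgments), Counter(answers)
    pja = Counter(zip(judgments, answers))
    return sum((c / n) * math.log2((c / n) / ((pj[j] / n) * (pa[a] / n)))
               for (j, a), c in pja.items())

# Toy contributors on four binary questions with answers A, B, A, B:
print(mutual_information("ABAB", "ABAB"))  # always correct -> 1 bit
print(mutual_information("AAAA", "ABAB"))  # uninformative  -> 0 bits
```

Note that a contributor who is always wrong on a binary task also carries a full bit of information; the metric rewards predictability of the answer given the judgment, not raw accuracy.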
Structured Data Management
F1: A Distributed SQL Database That Scales, by Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, John Cieslewicz, Ian Rae, Traian Stancescu, Himani Apte [VLDB]
In recent years, conventional wisdom has been that when you need a highly scalable, high throughput data store, the only viable options are NoSQL key/value stores, and you need to work around the lack of transactional consistency, indexes, and SQL. F1 is a hybrid database we built that combines the strengths of traditional relational databases with the scalability of NoSQL systems, showing it's not necessary to compromise on database functionality to achieve scalability and high availability. The paper describes the F1 system, how we use Spanner underneath, and how we've designed schema and applications to hide the increased commit latency inherent in distributed commit protocols.
- 2014 Google PhD Fellowships: Supporting the Future of Computer Science
Posted by David Harper, Google University Relations & Beate List, Google Research Programs
Nurturing and maintaining strong relations with the academic community is a top priority at Google. Today, we’re announcing the 2014 Google PhD Fellowship recipients. These students, recognized for their incredible creativity, knowledge and skills, represent some of the most outstanding graduate researchers in computer science across the globe. We’re excited to support them, and we extend our warmest congratulations.
The Google PhD Fellowship program supports PhD students in computer science or closely related fields and reflects our commitment to building strong relations with the global academic community. Now in its sixth year, the program covers North America, Europe, China, India and Australia. To date we’ve awarded 193 Fellowships in 72 universities across 17 countries.
As we welcome the 2014 PhD Fellows, we hear from two past recipients, Cynthia Liem and Ian Goodfellow. Cynthia studies at the Delft University of Technology, and was awarded a Fellowship in Multimedia. Ian is about to complete his PhD at the Université de Montréal in Québec, and was awarded a Fellowship in Deep Learning. Recently interviewed on the Google Student blog, they expressed their views on how the Fellowship affected their careers.
Cynthia has combined her dual passions of music and computing to pursue a PhD in music information retrieval. She speaks about the fellowship and her links with Google:
“Through the Google European Doctoral Fellowship, I was assigned a Google mentor who works on topics related to my PhD interests. In my case, this was Dr. Douglas Eck in Mountain View, who is part of Google Research and leads a team focusing on music recommendation. Doug has been encouraging me in several of my academic activities, most notably the initiation of the ACM MIRUM Workshop, which managed to successfully bring music retrieval into the spotlight of the prestigious ACM Multimedia conference.”
Ian is about to start as a research scientist on Jeff Dean’s deep learning infrastructure team. He was also an intern at Google, and contributed to the development of a neural network capable of transcribing the address numbers on houses from Google Street View photos. He describes the connection between this intern project and his PhD study supported by the Fellowship:
“The project I worked on during my internship was the basis for a publication at the International Conference on Learning Representations …. my advisor let me include this paper in my PhD thesis since there was a close connection to the subject area.… I can show that some of the work developed early in the thesis has had a real impact.”
We’re proud to have supported Cynthia, Ian, and all the other recipients of the Google PhD Fellowship. We continue to look forward to working with, and learning from, the academic community with great excitement and high expectations.
- A skill-based approach to creating open online courses
Posted by Sean Lip, Software Engineer, Open Online Education
Google has offered a number of open online courses in the past two years, and some of our recent research highlights the importance of having effective and relevant activities in these courses. Over the past decade, the Open Learning Initiative (OLI) at Carnegie Mellon, and now at Stanford, has successfully offered free open online courses that are centered around goal-directed activities that provide students with targeted feedback on their work. In order to improve understanding about how to design online courses based around effective activities, Google and OLI recently collaborated on a white paper that outlines the skill-based approach that OLI uses to create its courses.
OLI courses are organized around a set of learning objectives that identify what students should be able to do by the time they have completed a course module. These learning objectives are broken down into skills, and individual activities in the course aim to develop students’ mastery of these skills. A typical activity from the Engineering Statics course is shown below:
During the course, students’ attempts at questions related to a particular skill are then fed as inputs into a probabilistic model which treats the degrees of mastery for each skill as mathematically independent variables. This model estimates how likely a student is to have mastered individual skills, and its output can help instructors determine which students are struggling and intervene appropriately, as well as inform the design of future versions of the same course. The paper also outlines the advantages and limitations of the existing system, which could be useful starting points for further research.
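OLI's actual model is more elaborate, but the core idea of updating an independent per-skill mastery estimate from correct and incorrect attempts can be sketched with a simple Bayesian update. The guess and slip probabilities below are assumed values for illustration:

```python
# Sketch of a per-skill mastery update (not OLI's exact model): mastery is
# a latent probability updated Bayesianly after each attempt.
GUESS, SLIP = 0.2, 0.1  # assumed P(correct | not mastered), P(wrong | mastered)

def update_mastery(p_mastery, correct):
    """Bayes update of P(skill mastered) after one observed attempt."""
    if correct:
        num = p_mastery * (1 - SLIP)
        den = num + (1 - p_mastery) * GUESS
    else:
        num = p_mastery * SLIP
        den = num + (1 - p_mastery) * (1 - GUESS)
    return num / den

p = 0.3  # prior belief that the student has mastered the skill
for outcome in [True, True, True]:
    p = update_mastery(p, outcome)
print(round(p, 3))  # belief rises with consecutive correct answers
```

Because skills are treated as independent, one such estimate can be maintained per skill, giving instructors a per-skill picture of who is struggling.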
We hope that this white paper provides useful insight for creators of online courses and course platforms, and that it stimulates further discussion about how to help people learn online more effectively.