- DeepMind moves to TensorFlow
Posted by Koray Kavukcuoglu, Research Scientist, Google DeepMind
At DeepMind, we conduct state-of-the-art research on a wide range of algorithms, from deep learning and reinforcement learning to systems neuroscience, towards the goal of building Artificial General Intelligence. A key factor in facilitating rapid progress is the software environment used for research. For nearly four years, the open source Torch7 machine learning library has served as our primary research platform, combining excellent flexibility with very fast runtime execution, enabling rapid prototyping. Our team has been proud to contribute to the open source project in capacities ranging from occasional bug fixes to being core maintainers of several crucial components.
With Google’s recent open source release of TensorFlow, we initiated a project to test its suitability for our research environment. Over the last six months, we have re-implemented more than a dozen different projects in TensorFlow to develop a deeper understanding of its potential use cases and the tradeoffs for research. Today we are excited to announce that DeepMind will start using TensorFlow for all our future research. We believe that TensorFlow will enable us to execute our ambitious research goals at much larger scale and an even faster pace, providing us with a unique opportunity to further accelerate our research programme.
As one of the core contributors of Torch7, I have had the pleasure of working closely with an excellent community of developers and researchers, and it has been amazing to see all the great work that has been built on top of the platform and the impact this has had on the field. Torch7 is currently being used by Facebook, Twitter, and many start-ups and academic labs as well as DeepMind, and I’m proud of the significant contribution it has made to a large community in both research and industry. Our transition to TensorFlow represents a new chapter, and I feel very excited about the prospect of DeepMind contributing heavily to another great open source machine learning platform that everyone can use to advance the state of the art.
- Computer Science Education for All Students
Posted by Maggie Johnson, Director of Education and University Relations
(Cross-posted on the Google for Education Blog)
Computer science education is a pathway to innovation, to creativity, and to exciting career prospects. No longer considered an optional skill, CS is quickly becoming a “new basic”, foundational for learning. In order for our students to be equipped for the world of tomorrow, we need to provide them with access to computer science education today.
At Google, we believe that all students deserve these opportunities. Today we join some of America’s leading companies, governors, and educators to support an open letter to Congress, asking for funding to provide every student in every school the opportunity to learn computer science. Google has long been committed to developing programs, resources, tools and community partnerships that make computer science engaging and accessible for all students.
We are strengthening that commitment today by announcing an additional investment of $10 million towards computer science education for 2017, along with the $23.5 million that we have allocated for 2016. This funding will allow us to build more resources, scale our programs, and provide additional support to our partners, with a goal of reaching an additional 5 million students.
With Congress’ help, we can ensure that every child has access to computer science education. Please join us by signing our online petition at www.change.org/computerscience.
- Helping webmasters re-secure their sites
Posted by Kurt Thomas and Yuan Niu, Spam & Abuse Research
Every week, over 10 million users encounter harmful websites that deliver malware and scams. Many of these sites are personal blogs or small business pages that were compromised because of a weak password or outdated software. Safe Browsing and Google Search protect visitors from dangerous content by displaying browser warnings and labeling search results with ‘this site may harm your computer’. While this helps keep users safe in the moment, the compromised site remains a problem that needs to be fixed.
Unfortunately, many webmasters for compromised sites are unaware anything is amiss. Worse yet, even when they learn of an incident, they may lack the security expertise to take action and address the root cause of compromise. Quoting one webmaster from a survey we conducted, “our daily and weekly backups were both infected” and even after seeking the help of a specialist, after “lots of wasted hours/days” the webmaster abandoned all attempts to restore the site and instead refocused his efforts on “rebuilding the site from scratch”.
In order to find the best way to help webmasters clean up after a compromise, we recently teamed up with the University of California, Berkeley to explore how to quickly contact webmasters and expedite recovery while minimizing the distress involved. We’ve summarized our key lessons below. The full study was recently presented at the International World Wide Web Conference.
When Google works directly with webmasters during critical moments like security breaches, we can help 75% of webmasters re-secure their content. The whole process takes a median of 3 days. This is a better experience for webmasters and their audience.
How many sites get compromised?
Over the last year Google detected nearly 800,000 compromised websites—roughly 16,500 new sites every week from around the globe. Visitors to these sites are exposed to low-quality scam content and malware via drive-by downloads. While browser and search warnings help protect visitors from harm, these warnings can at times feel punitive to webmasters who learn only after-the-fact that their site was compromised. To balance the safety of our users with the experience of webmasters, we set out to find the best approach to help webmasters recover from security breaches and ultimately reconnect websites with their audience.
Figure: Number of freshly compromised sites Google detects every week.
Finding the most effective ways to aid webmasters
- Getting in touch with webmasters: One of the hardest steps on the road to recovery is first getting in contact with webmasters. We tried three notification channels: email, browser warnings, and search warnings. For webmasters who proactively registered their site with Search Console, we found that email communication led to 75% of webmasters re-securing their pages. When we didn’t know a webmaster’s email address, browser warnings and search warnings helped 54% and 43% of sites clean up respectively.
- Providing tips on cleaning up harmful content: Attackers rely on hidden files, easy-to-miss redirects, and remote inclusions to serve scams and malware. This makes clean-up increasingly tricky. When we emailed webmasters, we included tips and samples of exactly which pages contained harmful content. This, combined with expedited notification, helped webmasters clean up 62% faster compared to no tips, usually within 3 days.
- Making sure sites stay clean: Once a site is no longer serving harmful content, it’s important to make sure attackers don’t reassert control. We monitored recently cleaned websites and found 12% were compromised again within 30 days. This illustrates the challenge involved in identifying the root cause of a breach versus dealing with the side-effects.
Making security issues less painful for webmasters, and everyone
We hope that webmasters never have to deal with a security incident. If you are a webmaster, there are some quick steps you can take to reduce your risk. We’ve made it easier to receive security notifications through Google Analytics as well as through Search Console. Make sure to register for both services. Also, we have laid out helpful tips for updating your site’s software and adding extra authentication that will make your site safer.
If you’re a hosting provider or building a service that needs to notify victims of compromise, understand that the entire process is distressing for users. Establish a reliable communication channel before a security incident occurs, make sure to provide victims with clear recovery steps, and promptly reply to inquiries so the process feels helpful, not punitive.
As we work to make the web a safer place, we think it’s critical to empower webmasters and users to make good security decisions. It’s easy for the security community to be pessimistic about incident response being ‘too complex’ for victims, but as our findings demonstrate, even just starting a dialogue can significantly expedite recovery.
- Announcing TensorFlow 0.8 – now with distributed computing support!
Posted by Derek Murray, Software Engineer
Google uses machine learning across a wide range of its products. In order to continually improve our models, it's crucial that the training process be as fast as possible. One way to do this is to run TensorFlow across hundreds of machines, which shortens the training process for some models from weeks to hours, and allows us to experiment with models of increasing size and sophistication. Ever since we released TensorFlow as an open-source project, distributed training support has been one of the most requested features. Now the wait is over.
Today, we're excited to release TensorFlow 0.8 with distributed computing support, including everything you need to train distributed models on your own infrastructure. Distributed TensorFlow is powered by the high-performance gRPC library, which supports training on hundreds of machines in parallel. It complements our recent announcement of Google Cloud Machine Learning, which enables you to train and serve your TensorFlow models using the power of the Google Cloud Platform.
To coincide with the TensorFlow 0.8 release, we have published a distributed trainer for the Inception image classification neural network in the TensorFlow models repository. Using the distributed trainer, we trained the Inception network to 78% accuracy in less than 65 hours using 100 GPUs. Even small clusters—or a couple of machines under your desk—can benefit from distributed TensorFlow, since adding more GPUs improves the overall throughput, and produces accurate results sooner.
The distributed trainer also enables you to scale out training using a cluster management system like Kubernetes. Furthermore, once you have trained your model, you can deploy to production and speed up inference using TensorFlow Serving on Kubernetes.
Figure: TensorFlow can speed up Inception training by a factor of 56, using 100 GPUs.
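One way to read the figure above is as parallel efficiency: the measured speedup divided by the number of devices. The short sketch below does that arithmetic. It assumes the factor of 56 is measured against a single GPU, which the caption does not state, and `parallel_efficiency` is a hypothetical helper name, not part of TensorFlow.

```python
def parallel_efficiency(speedup: float, n_devices: int) -> float:
    """Return speedup per device, where 1.0 would mean perfect linear scaling."""
    if n_devices <= 0:
        raise ValueError("n_devices must be positive")
    return speedup / n_devices


# Figures quoted in the caption: a 56x speedup on 100 GPUs.
efficiency = parallel_efficiency(56, 100)
print(f"parallel efficiency: {efficiency:.0%}")
```

Under that assumption, each of the 100 GPUs contributes a little over half of what it would under perfect linear scaling.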
Beyond distributed Inception, the 0.8 release includes new libraries for defining your own distributed models. TensorFlow's distributed architecture permits a great deal of flexibility in defining your model, because every process in the cluster can perform general-purpose computation. Our previous system DistBelief (like many systems that have followed it) used special "parameter servers" to manage the shared model parameters, where the parameter servers had a simple read/write interface for fetching and updating shared parameters. In TensorFlow, all computation, including parameter management, is represented in the dataflow graph, and the system maps the graph onto heterogeneous devices (like multi-core CPUs, general-purpose GPUs, and mobile processors) in the available processes. To make this easier to use, we have included Python libraries for writing a model that runs on a single process and scales to use multiple replicas for training.
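To make the "simple read/write interface" of a parameter server concrete, here is a toy, single-process sketch in plain Python. It is not DistBelief or TensorFlow code: the `ParameterServer` class, its `fetch` and `update` methods, and the `worker_step` helper are hypothetical names chosen for illustration, and the quadratic loss is an assumption made only so the gradient is easy to check by hand.

```python
from typing import Dict, List


class ParameterServer:
    """Toy in-memory parameter server exposing only a read and a write call."""

    def __init__(self, initial: Dict[str, List[float]]) -> None:
        # Copy the inputs so callers cannot mutate server state by accident.
        self._params = {name: list(values) for name, values in initial.items()}

    def fetch(self, name: str) -> List[float]:
        """Read: return a copy of the current value of one parameter."""
        return list(self._params[name])

    def update(self, name: str, gradient: List[float], learning_rate: float) -> None:
        """Write: apply one gradient descent step to one parameter."""
        current = self._params[name]
        if len(gradient) != len(current):
            raise ValueError("gradient length does not match parameter length")
        self._params[name] = [w - learning_rate * g for w, g in zip(current, gradient)]


def worker_step(server: ParameterServer, name: str,
                target: List[float], learning_rate: float) -> None:
    """One worker iteration: fetch, compute a gradient locally, push the update."""
    weights = server.fetch(name)
    # Assumed loss: 0.5 * sum((w - t)^2), whose gradient with respect to w is (w - t).
    gradient = [w - t for w, t in zip(weights, target)]
    server.update(name, gradient, learning_rate)


server = ParameterServer({"w": [0.0, 0.0]})
worker_step(server, "w", target=[1.0, 2.0], learning_rate=0.5)
print(server.fetch("w"))  # [0.5, 1.0]
```

In this sketch the server can do nothing except store values and apply updates. The contrast drawn in the text is that TensorFlow expresses parameter management as ordinary operations in the dataflow graph, so the same process can also run general-purpose computation.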
This architecture makes it easier to scale a single-process job up to use a cluster, and also to experiment with novel architectures for distributed training. As an example, my colleagues have recently shown that synchronous SGD with backup workers, implemented in the TensorFlow graph, achieves improved time-to-accuracy for image model training.
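The idea behind synchronous SGD with backup workers can be sketched without TensorFlow: start more workers than the update needs, average the gradients from the first N to finish, and discard the stragglers. The simulation below is a minimal illustration under that reading, not the implementation referred to above. The function names, the `(arrival_time, gradient)` input format, and the sample numbers are all hypothetical.

```python
from typing import List, Sequence, Tuple

Arrival = Tuple[float, List[float]]  # (arrival_time, gradient) from one worker


def aggregate_first_n(arrivals: Sequence[Arrival], n_required: int) -> List[float]:
    """Average the n_required gradients that arrived first and ignore the rest."""
    if not 0 < n_required <= len(arrivals):
        raise ValueError("n_required must be between 1 and the number of workers")
    # Sort by arrival time and keep only the fastest workers.
    fastest = sorted(arrivals, key=lambda pair: pair[0])[:n_required]
    dim = len(fastest[0][1])
    return [sum(grad[i] for _, grad in fastest) / n_required for i in range(dim)]


def sync_sgd_step(weights: List[float], arrivals: Sequence[Arrival],
                  n_required: int, learning_rate: float) -> List[float]:
    """Apply one synchronous update using the averaged gradient."""
    mean_grad = aggregate_first_n(arrivals, n_required)
    return [w - learning_rate * g for w, g in zip(weights, mean_grad)]


# Four workers are launched, but the update needs only three gradients,
# so the worker that reports at t=9.0 is treated as a straggler and dropped.
arrivals = [
    (0.3, [2.0, 0.0]),
    (0.1, [4.0, 2.0]),
    (9.0, [100.0, 100.0]),
    (0.2, [0.0, 4.0]),
]
print(sync_sgd_step([1.0, 1.0], arrivals, n_required=3, learning_rate=0.5))  # [0.0, 0.0]
```

The trade-off in this sketch is that the extra worker's computation is wasted on every step, in exchange for the step time no longer being set by the slowest machine.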
The current version of distributed computing support in TensorFlow is just the start. We are continuing to research ways of improving the performance of distributed training—both through engineering and algorithmic improvements—and will share these improvements with the community on GitHub. However, getting to this point would not have been possible without help from the following people:
- TensorFlow training libraries - Jianmin Chen, Matthieu Devin, Sherry Moore and Sergio Guadarrama
- TensorFlow core - Zhifeng Chen, Manjunath Kudlur and Vijay Vasudevan
- Testing - Shanqing Cai
- Inception model architecture - Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Jonathon Shlens and Zbigniew Wojna
- Project management - Amy McDonald Sandjideh
- Engineering leadership - Jeff Dean and Rajat Monga
- All of Google’s CS Education Programs and Tools in One Place
Posted by Chris Stephenson, Head of Computer Science Education Programs
(Cross-posted on the Google for Education Blog)
Interest in computer science education is growing rapidly; even the President of the United States has spoken of the importance of giving every student an opportunity to learn computer science. Google has been a supportive partner in these efforts by developing high-quality learning programs, educational tools and resources to advance new approaches in computer science education. To make it easier for all students and educators to access this information, today we’re launching a CS EDU website that specifically outlines our initiatives in CS education.
The President’s call to action is grounded in economic realities coupled with a lack of access and ongoing system inequities. There is an increasing need for computer science skills in the workforce, with the Bureau of Labor Statistics estimating that there will be more than 1.3 million job openings in computer and mathematical occupations by 2022. The majority of these jobs will require at least a Bachelor’s degree in Computer Science or in Information Technology, yet the U.S. is only producing 16,000 CS undergraduates per year.
One of the reasons there are so few computer science graduates is that too few students have the opportunity to study computer science in high school. Google’s research shows that only 25% of U.S. schools currently offer CS with programming or coding, despite the fact that 91% of parents want their children to learn computer science. In addition, schools with higher percentages of students living in households below the poverty line are even less likely to offer rigorous computer science courses.
Increasing access to computer science for all learners requires tremendous commitment from a wide range of stakeholders, and we strive to be a strong supportive partner of these efforts. Our new CS EDU website shows all the ways Google is working to address the need for improved access to high quality computer science learning in formal and informal education. Some current programs you’ll find there include:
- CS First: providing more than 360,000 middle school students with an opportunity to create technology through free computer science clubs
- Exploring Computational Thinking: sharing more than 130 lesson plans aligned to international standards for students aged 8 to 18
- igniteCS: offering support and mentoring to address the retention problem in diverse student populations at the undergraduate level in more than 40 universities and counting
- Blockly and other programming tools powering Code.org’s Hour of Code (2 million users)
- Google’s Made with Code: a movement that inspires millions of girls to learn to code and to see it as a means to pursue their dream careers (more than 10 million unique visitors)
- ...and many more!
Computer science education is a pathway to innovation, to creativity and to exciting career opportunities, and Google believes that all students deserve these opportunities. That is why we are committed to developing programs, resources, tools and community partnerships that make computer science engaging and accessible for all students. With the launch of our CS EDU website, all of these programs are at your fingertips.