opensource.google.com

JanusGraph connects the past and future of Titan

Thursday, January 12, 2017

We are thrilled to collaborate with a group of individuals and companies, including Expero, GRAKN.AI, Hortonworks and IBM, in launching a new project — JanusGraph — under The Linux Foundation to advance the state-of-the-art in distributed graph computation.

JanusGraph is a fork of the popular open source project Titan, originally released in 2012 by Aurelius, and subsequently acquired by DataStax. Titan has been widely adopted for large-scale distributed graph computation and many users have contributed to its ongoing development, which has slowed down of late: there have been no Titan releases since the 1.0 release in September 2015, and the repository has seen no updates since June 2016.

This new project will reinvigorate development of the distributed graph system to add new functionality, improve performance and scalability, and maintain a variety of storage backends.

The name "Janus" comes from the name of a Roman god who looks simultaneously into the past to the Titans (divine beings from Greek mythology) as well as into the future.

All are welcome to participate in the JanusGraph project, whether by contributing features or bug fixes, filing feature requests and bugs, improving the documentation, or helping shape the product roadmap through use cases.

Get involved by taking a look at our website and browsing the code on GitHub.

We look forward to hearing from you!

By Misha Brukman, Google Cloud Platform

Apache Beam graduates to a top-level project

Tuesday, January 10, 2017

Please join me in extending a hearty digital “Huzzah!” to the Apache Beam community: as announced today, Apache Beam is an official graduate of the Apache Incubator and is now a full-fledged, top-level Apache project. This achievement is a direct reflection of the hard work the community has invested in transforming Beam into an open, professional and community-driven project.

Eleven months ago, Google and a number of partners donated a giant pile of code to the Apache Software Foundation, thus forming the incubating Beam project. The bulk of this code composed the Google Cloud Dataflow SDK: the libraries that developers used to write streaming and batch pipelines that ran on any supported execution engine. At the time, the main supported engine was Google’s Cloud Dataflow service (with support for Apache Spark and Apache Flink in development); as of today there are five officially supported runners. Though there were many motivations behind the creation of Apache Beam, the one at the heart of everything was a desire to build an open and thriving community and ecosystem around this powerful model for data processing that so many of us at Google spent years refining. But taking a project with over a decade of engineering momentum behind it from within a single company and opening it to the world is no small feat. That’s why I feel today’s announcement is so meaningful.

With that context in mind, let’s look at some statistics squirreled away in the graduation maturity model assessment:

  • Out of the ~22 large modules in the codebase, at least 10 modules have been developed from scratch by the community, with little to no contribution from Google.
  • Since September, no single organization has had more than ~50% of the unique contributors per month.
  • The majority of new committers added during incubation came from outside Google.

And for good measure, here’s a quote from the Vice President of the Apache Incubator, lifted from the public Apache incubator general discussions list where Beam’s graduation was first proposed:

“In my day job as well as part of my work at Apache, I have been very impressed at the way that Google really understands how to work with open source communities like Apache. The Apache Beam project is a great example of this and is a great example of how to build a community.” -- Ted Dunning, Vice President of Apache Incubator

The point I’m trying to make here is this: while Google’s commitment to Apache Beam remains as strong as it always has been, everyone involved (both within Google and without) has done an excellent job of building an open source project that’s truly open in the best sense of the word.

This is what makes open source software amazing: people coming together to build great, practical systems for everyone to use because the work is exciting, useful and relevant. This is the core reason I was so excited about us creating Apache Beam in the first place, the reason I’m proud to have played some small part in that journey, and the reason I’m so grateful for all the work the community has invested in making the project a reality.

Naturally, graduation is only one milestone in the lifetime of the project, and we have many more ahead of us, but becoming a top-level project is an indication that Apache Beam now has a development community that is ready for prime time.

That means we’re ready to continue pushing forward the state of the art in stream and batch processing. We’re ready to bring the promise of portability to programmatic data processing, much in the way SQL has done so for declarative data analysis. We’re ready to build the things that never would have gotten built had this project stayed confined within the walls of Google. And last but perhaps not least, we’re ready to recoup the vast quantities of text space previously consumed by the mandatory “(incubating)” moniker accompanying all of our initial mentions of Apache Beam!

But seriously, whatever your motivation, please consider joining us along the way. We have an exciting road ahead.

By Tyler Akidau, Apache Beam PMC and Staff Software Engineer at Google

Google Summer of Code 2016 wrap-up: Oppia

Friday, January 6, 2017

Google Summer of Code (GSoC) is an annual program that encourages university students to become open source contributors. This guest post is part of a series of blog posts from the open source projects and organizations that participated in GSoC 2016.

The Oppia project makes it easy for anyone to create lightweight, interactive online lessons that simulate personal tutoring. These activities, called “explorations,” can be shared with others around the world as standalone tutorials (such as Programming with Carla and Quadratic Equations), or embedded in websites to supplement an existing course (such as “Take Your Medicine” on edX and Computational Thinking for Educators).

2016 was Oppia’s first year participating in GSoC and it was a blast! More students flocked to our ideas page than we had expected, and our Gitter channel was full of people saying hello and looking for starter projects. Over the course of the summer, with the help of two capable and enthusiastic students, we were able to bring the following new features to the Oppia codebase:

A new creator dashboard -- Avijit Gupta


An important principle of Oppia is that lessons can be easily improved over time -- it’s hard to figure out all the possible ways a student can go wrong at the outset, but it’s much easier to respond appropriately to a new misconception that arises.

Each creator on Oppia has a “creator dashboard” which allows them to see the lessons they’ve created, as well as the feedback they’ve received from learners. Avijit completed a full revamp of this page, updating its design (for both desktop and mobile) and finding ways to display all the necessary information in an intuitive way so that creators can easily improve their lessons while getting feedback on their teaching.

The new creator dashboard.

In addition, Avijit added functionality allowing creators to view student misconceptions that were not well-addressed, to make it easier for them to improve the feedback for those answers. He has continued to help out with the Oppia open source project as a maintainer and reviewer, even after GSoC, and is mentoring other contributors who are working on further improvements to the creator dashboard. You can read more about the project in his GSoC writeup!

Speed improvements -- Vishal Gupta


In order to improve the accessibility of lessons for students with poor internet connectivity, Vishal’s project aimed to make Oppia speedier and less bandwidth-intensive. He started by implementing a performance testing framework to benchmark his efforts, and also integrated it with our continuous integration system in order to protect against performance regressions. He then turned his efforts to caching as many static resources as possible, implementing a cache slug system that causes new files to be downloaded only after a new release is made.
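
The post does not show how Oppia's cache slug system is implemented, so the following is only a minimal sketch of the general idea, assuming the slug is a per-release string embedded in each asset's URL path. The function name `build_asset_url`, the `/assets` base path and the slug values are hypothetical and are not taken from the Oppia codebase.

```python
import posixpath


def build_asset_url(release_slug, asset_path, base="/assets"):
    """Return an asset URL whose path embeds the release slug.

    Because the slug is part of the URL, a browser can cache the file
    for a long time: the URL only changes when a new release ships a
    new slug, which forces a fresh download.
    """
    # Strip a leading slash so the asset path is joined under the base
    # instead of replacing it.
    return posixpath.join(base, release_slug, asset_path.lstrip("/"))


# Hypothetical slugs for two consecutive releases.
old_url = build_asset_url("release-41", "scripts/app.js")
new_url = build_asset_url("release-42", "scripts/app.js")
print(old_url)
print(new_url)
```

Within a single release the URL stays stable, so repeat visits can be served from the browser cache; only a new release changes the URL and triggers a download.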

In addition, Vishal removed JavaScript code that was inlined in the main templates, and refactored it out into an external script which could then be cached for better performance. You can read more about this project in his post on the Oppia blog.

We’d like to extend our grateful thanks not only to Avijit and Vishal, but also to our many willing and enthusiastic mentors, and to Google for supporting our open source work with GSoC.

Join us in helping improve educational opportunities for students around the world. If you’d like to subscribe to news and updates about Oppia’s participation in GSoC, you can sign up to the oppia-gsoc-announce mailing list -- or, if you’re already feeling enthusiastic, you can start helping out with the project right away!

By Ben Henning and Sean Lip, Organization Administrators for Oppia

Grumpy: Go running Python!

Wednesday, January 4, 2017

Google runs millions of lines of Python code. The front-end server that drives youtube.com and YouTube’s APIs is primarily written in Python, and it serves millions of requests per second! YouTube’s front-end runs on CPython 2.7, so we’ve put a ton of work into improving the runtime and adapting our application to work optimally within it. These efforts have borne a lot of fruit over the years, but we always run up against the same issue: it's very difficult to make concurrent workloads perform well on CPython.

To solve this problem, we investigated a number of other Python runtimes. Each had trade-offs and none solved the concurrency problem without introducing other issues.

So we asked ourselves a crazy question: What if we were to implement an alternative runtime optimized for real-time serving? Once we started going down the rabbit hole, Go seemed like an obvious choice of platform since its operational characteristics align well with our use case (e.g. lightweight threads). We wanted first-class language interoperability and Go’s powerful runtime type reflection system made this straightforward. Python in Go felt very natural, and so Grumpy was born.

Grumpy is an experimental Python runtime for Go. It translates Python code into Go programs, and those transpiled programs run seamlessly within the Go runtime. We needed to support a large existing Python codebase, so it was important to have a high degree of compatibility with CPython (quirks and all). The goal is for Grumpy to be a drop-in replacement runtime for any pure-Python project.

Two design choices we made had big consequences. First, we decided to forgo support for C extension modules. This means that Grumpy cannot leverage the wealth of existing Python C extensions, but it gave us a lot of flexibility to design an API and object representation that scales for parallel workloads. In particular, Grumpy has no global interpreter lock, and it leverages Go’s garbage collection for object lifetime management instead of counting references. We think Grumpy has the potential to scale more gracefully than CPython for many real-world workloads. Results from Grumpy’s synthetic Fibonacci benchmark demonstrate some of this potential:



Second, Grumpy is not an interpreter. Grumpy programs are compiled and linked just like any other Go program. The downside is less development and deployment flexibility, but it offers several advantages. For one, it creates optimization opportunities at compile time via static program analysis. But the biggest advantage is that interoperability with Go code becomes very powerful and straightforward: Grumpy programs can import Go packages just like Python modules! For example, the Python snippet below uses Go’s standard net/http package to start a simple server:

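# Go packages are imported through the __go__ namespace.
from __go__.net.http import ListenAndServe, RedirectHandler

# Redirect every request to the Grumpy repository with HTTP 303 (See Other).
handler = RedirectHandler('http://github.com/google/grumpy', 303)
# Serve on localhost, port 8080.
ListenAndServe('127.0.0.1:8080', handler)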
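
To make the earlier concurrency point concrete, here is a rough sketch of the kind of CPU-bound, multi-threaded workload a synthetic Fibonacci benchmark might exercise. It is not Grumpy's actual benchmark, which the post does not show; the function names are illustrative, and whether the alpha release of Grumpy supports the `threading` module is an assumption this post does not confirm.

```python
import threading


def fib(n):
    """Naive recursive Fibonacci, deliberately CPU-bound."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def run_in_threads(n, num_threads):
    """Compute fib(n) once per thread and return all the results."""
    results = [None] * num_threads

    def worker(index):
        # Each thread writes to its own slot, so no lock is needed.
        results[index] = fib(n)

    threads = [
        threading.Thread(target=worker, args=(i,))
        for i in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_in_threads(20, 4))
```

On CPython the global interpreter lock lets only one of these threads execute Python bytecode at a time, so adding threads does not speed up this workload. A runtime without that lock could in principle spread the same work across cores.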

We’re excited about the prospects for Grumpy. Although it’s still alpha software, most of the language constructs and many core built-in types work like you’d expect. There are still holes to fill — many built-in types are missing methods and attributes, built-in functions are absent and the standard library is virtually empty. If you find things that you wish were working, file an issue so we know what to prioritize. Or better yet, submit a pull request.

Stay Grumpy!

By Dylan Trotter, YouTube Engineering

Rails Girls Summer of Code: Changing the face of tech

Tuesday, January 3, 2017

This is a guest post from Laura Gaetano who organizes Rails Girls Summer of Code, a global fellowship program inspired by Google Summer of Code.

Have you seen that picture of Margaret Hamilton, the NASA engineer who worked on the computer systems for the Apollo 11 launch? She’s standing next to the human-sized pile of listings of the Apollo Guidance Computer source code that she worked on. Do you know about Ada Lovelace, often cited as the very first computer programmer?

From World War II until the 1980s, women engineers and women computer operators were fairly common. There was a steady rise in women entering STEM fields, and young girls had role models and strong women to look up to. We're well acquainted with the drop in female engineering graduates worldwide after this time period, and the subsequent drop in the percentage of women entering the world of tech. We're here to help change that, and reverse the trend.

Rails Girls Summer of Code (RGSoC) aims to bring more diversity into the world of tech — specifically, into the world of open source software, where women make up a mere 11% of the community. The global program offers 3-month scholarships to teams of women to allow them to work full-time on an open source project of their choice – aided by local coaches and guided by the project maintainer (or a core contributor). The scholarships are funded through the support of the community as well as our sponsors, via a crowdfunding campaign.

Local vs. Global
We all cherish our local community and understand how strong of a support network it can be, especially for newcomers. The Rails Girls chapters worldwide emphasize that need: most coaches and organisers are local, and many alums go on to create their own study groups, or become coaches or organisers themselves. RGSoC also relies strongly on a global network of user groups — both Rails Girls chapters and similar organisations such as PyLadies or DjangoGirls.

Thanks to our connections with these different groups, we are able to reach people in remote or unlikely locations, and build the most diverse group of applicants possible. This is very important to us. Since the beginning, the program has brought together women with different experiences, backgrounds, locales and age groups to be part of the same global initiative.

Our Structure
Last year, we received over 90 team applications. When applying, each two-person team chooses from a list of pre-selected projects. These projects are maintained by people we either personally know, or who have reached out to us prior to the application period. We look for projects with patient, open-minded contributors who are active in their community, and projects that provide a lot of learning opportunities for applicants.

Project maintainers (also called mentors) are in touch with students in order to adapt the roadmap throughout the summer to the students' needs and check up on their progress. On a daily basis, students spend the majority of their time with coaches. The coaches help, support, and teach the students throughout the summer. Each team is also appointed a supervisor, who supports students on the organisational side of things. They are the glue that keeps the whole team together, and a way for the core RGSoC team to keep track of how every team is doing.

Our Stats
Our program started in 2013 with 18 teams, 10 of which were sponsored and 8 of which were volunteer teams. The following year, 16 teams participated with 10 sponsored spots. The real breakthrough came in 2015 when we were able to fund 16 sponsored teams, a substantial increase from the previous years. Not only did this enable us to have more impact — with a potential 12 more women entering the tech world and STEM workforce than the previous years — but it also shows the community’s trust in the program.

In 2016, the Ruby community awarded us with a Ruby Hero Award, and we managed to collect enough money to sponsor 16 teams from five continents with another 4 teams joining as volunteers. This year was also the first time we had teams based in Uganda, Egypt, Singapore and the Czech Republic.
Our stats from 2016 (Image: Laura Gaetano/RGSoC)
In 2015, we contacted our alums from 2013 and 2014 to find out what they were doing after the program. The responses were impressive: out of 64 graduates, over 90% are now working in the tech field. A fair number of graduates have even founded their own startup. Not only have some of these women found their calling, but we may also have made a small difference in the open source community, and we are on the right track to really shake things up.

Where do we go from here
On the first of July last year, we kicked off our program with over 130 people participating — including coaches, supervisors, designers, helpdesk coaches and project mentors. We were incredibly excited to have 20 teams in 16 cities and 11 different countries, spanning time zones, from UTC+10 to UTC-7.
Our 2016 sponsored and volunteer teams! (Image: Ana Sofia Pinho/RGSoC)
We’ve seen in the past just how much of an impact we’ve had on our participants’ lives, and we hope that this trend will continue. We hope that some of our previous editions' teams graduated with the skills and confidence to become NASA engineers, web developers, or anything else they want to be. Hopefully someday they will become a young woman’s role model, and realise the important role they served in changing the future of engineering and of open source software.

By Laura Gaetano, Organizer of Rails Girls Summer of Code