Code Writing Code: Building a Python to JavaScript Transpiler

This post was written while I was a software engineer at Pinterest in 2016.

You may have seen our recent blog posts on switching out our template rendering engine and moving our web codebase to React. These projects have been part of a larger effort to have all web code at Pinterest written and run in a single language (JavaScript) on the server and client. This meant converting some server code from Python to JavaScript.

Migrating or rewriting code is a tedious process. It's not glamorous or sexy, no one wants to do it, and it can often introduce new bugs into the codebase. It's hard to justify spending time on it if the work must be done by hand. If an engineer were stuck with the job of rewriting code in a different language, they might get bored, complain, or quit. You know who does not get bored, complain, or quit? A computer program. A computer program will happily perform such tasks without complaint. A program that translates code from one language into another like this is called a transpiler. Instead of rewriting the code myself, I decided to build a Python-to-JavaScript transpiler.

A transpiler is a type of compiler that takes code written in one programming language and produces equivalent code in another. Traditionally, compilers have translated code from high-level programming languages to lower-level ones, whereas transpilers translate between languages at similar levels of abstraction. Transpilers have recently gained popularity in the JavaScript community: TypeScript and CoffeeScript are transpiled to JavaScript, and the popular Babel transpiler converts ES6 JavaScript down to ES5.

The area of the codebase I targeted with the transpiler is called the resource layer. At Pinterest, the resource layer is a lightweight server-side wrapper around API calls that may modify or reformat the data before sending it to the client. Resource files generally share the same structure and are not very complex, which made them perfect candidates for translating into JavaScript with a transpiler.

During the transition, we wanted to rewrite parts of our framework without stopping product developers from continuing to build product. The transpiler helped here: during the migration, instead of writing new resource code in both languages, developers could make changes to the old Python file and then run the transpiler over it to have those changes reflected in the corresponding JavaScript file. When we were ready to switch over, we could simply delete the old Python files and have developers work in the transpiled files.

So, on to the good stuff: how does a transpiler work, and how did we build one? In summary, the transpiler worked like this:

  1. Input: the name of a Python file. The Python file is read into a string or buffer and run through an AST parser. For ease of processing, I represented this AST in JSON using ast2json.
  2. Read the JSON into the transpiler. I chose to write the actual transpiler implementation in JavaScript, since using Python to write JavaScript seemed like a further insult to Python.
  3. The transpiler walks through each block of code, as represented by the JSON-encoded Abstract Syntax Tree. Each statement can be represented as its own subtree.
  4. For each node, the transpiler looks at the node type and returns the JavaScript syntax that adequately represents that node. Most nodes contain other nodes: for example, a function node might contain parameter nodes, and an if/else-if/else statement node contains two conditional clauses whose tests must be translated into some kind of boolean-logic node, which might in turn contain function-call nodes for evaluation (see the examples below).
  5. Further processing is done on the generated code, for example running js-beautify to indent and space the code properly, or running a codemod to fix ESLint errors.
  6. The string of generated code is then written to an output JavaScript file.
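The steps above can be sketched in JavaScript. This is a minimal illustration, not the Pinterest implementation: it assumes ast2json's convention of tagging each node with a `_type` field holding its Python AST class name (for example `Module`, `Assign`, `Name`), handles only a handful of node types, and skips the js-beautify and ESLint step. Recent Python versions represent literals as `Constant` nodes; older versions used `Str` and `Num`, which a real transpiler would also need to handle. The function names `generate` and `transpile` are hypothetical.

```javascript
// Minimal sketch of steps 2-4 and 6. Assumes ast2json-style input, where every
// node is an object whose `_type` field names its Python AST class.
function generate(node) {
    switch (node._type) {
        case 'Module':
            // A module is a list of statements; translate each one on its own line.
            return node.body.map((statement) => generate(statement)).join('\n');
        case 'Assign':
            // Python allows `a = b = 1`; this sketch only handles a single target.
            return 'var ' + generate(node.targets[0]) + ' = ' + generate(node.value) + ';';
        case 'Expr':
            // A bare expression used as a statement, e.g. a function call.
            return generate(node.value) + ';';
        case 'Name':
            return node.id;
        case 'Constant':
            // None/True/False arrive as JSON null/true/false; JSON.stringify
            // re-emits them as JavaScript literals and quotes strings.
            return JSON.stringify(node.value);
        default:
            throw new Error('Unsupported node type: ' + node._type);
    }
}

// Step 2: parse the JSON text; steps 3-4: walk the tree. The returned string is
// what step 6 would write to the output .js file (step 5 is skipped here).
function transpile(astJsonText) {
    return generate(JSON.parse(astJsonText)) + '\n';
}
```

In a command-line wrapper, the input could come from `fs.readFileSync(jsonPath, 'utf8')` and the result could be written with `fs.writeFileSync(outputPath, code)`. Throwing on unknown node types, rather than silently skipping them, makes gaps in the translator show up immediately.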

Congrats! Forget being a 10x coder: you are now a 1000x coder, able to generate thousands of lines of new code every minute.

Abstract Syntax Trees

The hardest part about building the transpiler is understanding the Abstract Syntax Tree, which represents the structure of the code itself. From Wikipedia:
In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code.

Different features of a language are represented by different types of nodes. Here's an example.

In the first line, we have an IfStatement node, and within the IfStatement we have a function call (a CallExpression). The CallExpression (purple) calls a function with a single argument: a Literal (`foo`). On the right, we see this represented as a tree, where the IfStatement has a Test node, which is our CallExpression node, and our CallExpression has both an identifier (the function name, purple) and an array of arguments (in this case just one: a Literal with the value `foo`).

Here it is represented in a more tree-like structure. Note that the IfStatement has both a Test node and a Consequent node (`bar = baz || boo;`). Expanding the Consequent node, we get:



Here, the line `bar = baz || boo;` is represented by an AssignmentExpression node with the `=` operator. Its left side is the Identifier `bar`, and its right side is a LogicalExpression with the operator `||`, which in turn has its own left and right operands.
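To make the tree concrete, here is the example written out as a JavaScript object in the ESTree format, the AST shape produced by JavaScript parsers such as Esprima and Acorn, along with a small function that prints one indented line per node. This is an illustrative sketch: `isEnabled` is a hypothetical function name chosen for illustration, the consequent is assumed to be a braced block (hence the BlockStatement), and real parser output also carries location data and `raw` text that is omitted here.

```javascript
// ESTree-style AST for:  if (isEnabled('foo')) { bar = baz || boo; }
// `isEnabled` is a hypothetical function name used only for illustration.
const ifNode = {
    type: 'IfStatement',
    test: {
        type: 'CallExpression',
        callee: { type: 'Identifier', name: 'isEnabled' },
        arguments: [{ type: 'Literal', value: 'foo' }],
    },
    consequent: {
        type: 'BlockStatement',
        body: [{
            type: 'ExpressionStatement',
            expression: {
                type: 'AssignmentExpression',
                operator: '=',
                left: { type: 'Identifier', name: 'bar' },
                right: {
                    type: 'LogicalExpression',
                    operator: '||',
                    left: { type: 'Identifier', name: 'baz' },
                    right: { type: 'Identifier', name: 'boo' },
                },
            },
        }],
    },
    alternate: null, // no else branch
};

// Collect one line per node, indented two spaces per level of depth.
// Identifiers show their name, operators show their symbol, literals their value.
function outline(node, depth = 0, lines = []) {
    if (node === null || typeof node !== 'object') return lines;
    if (Array.isArray(node)) {
        // Array elements (e.g. call arguments) sit at the same depth.
        node.forEach((child) => outline(child, depth, lines));
        return lines;
    }
    const detail = node.name ?? node.operator ??
        (node.type === 'Literal' ? JSON.stringify(node.value) : '');
    lines.push('  '.repeat(depth) + node.type + (detail ? ' ' + detail : ''));
    for (const [key, value] of Object.entries(node)) {
        if (key !== 'type' && value !== null && typeof value === 'object') {
            outline(value, depth + 1, lines);
        }
    }
    return lines;
}

console.log(outline(ifNode).join('\n'));
```

`outline(ifNode)` returns eleven lines, starting with `IfStatement`, `CallExpression`, `Identifier isEnabled`, and `Literal "foo"`, with `LogicalExpression ||` nested under `AssignmentExpression =` and its two Identifier operands one level deeper.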

Once you understand every line of code, and every program, as an abstract syntax tree, creating a transpiler is fairly straightforward. You just need to write code that takes in every possible type of node and outputs code in the new language according to how that language represents the same structure. For example, in Python, a logical expression (a BoolOp node in Python's own ast module) might look like

baz or boo

But in JavaScript, it looks like

baz || boo

So, our code for the LogicalExpression node would look something like:

```javascript
// Recursively turn an AST node back into JavaScript source code.
function processNode(node) {
    switch (node.type) {
        case 'LogicalExpression': {
            if (node.operator === '||') {
                // We know there's a left and a right node in a || expression.
                return processNode(node.left) + ' || ' + processNode(node.right);
            }
            // Other types of operators go here...
            throw new Error('Unsupported operator: ' + node.operator);
        }
        case 'Identifier':
            return node.name;
        case 'AssignmentExpression':
            return 'var ' + processNode(node.left) + ' = ' + processNode(node.right);
        // Other types of nodes go here...
        default:
            throw new Error('Unsupported node type: ' + node.type);
    }
}
```
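The snippet above uses ESTree-style names such as `LogicalExpression`. Python's own `ast` module models the same idea differently: `baz or boo` becomes a `BoolOp` node whose `op` is `Or` (or `And`) and whose `values` list holds all the operands, so `a or b or c` is a single node with three values. Here is a sketch of handling that shape, assuming ast2json encodes each node, including the operator, as an object with a `_type` field; `processPyNode` is a hypothetical name.

```javascript
// Python boolean operator classes (as named in Python's ast module) mapped to
// their JavaScript spellings.
const BOOL_OPS = new Map([
    ['Or', ' || '],
    ['And', ' && '],
]);

// Hypothetical translator for ast2json-style nodes, where each node (including
// the operator) is an object with a `_type` field naming its Python AST class.
function processPyNode(node) {
    switch (node._type) {
        case 'BoolOp': {
            const jsOp = BOOL_OPS.get(node.op._type);
            if (jsOp === undefined) {
                throw new Error('Unsupported boolean operator: ' + node.op._type);
            }
            // Python folds `a or b or c` into ONE BoolOp with three values,
            // so join every operand instead of assuming a left and a right.
            return node.values
                .map((operand) => {
                    const code = processPyNode(operand);
                    // Parenthesize nested BoolOps so `a or b and c` (parsed by
                    // Python as `a or (b and c)`) keeps its grouping.
                    return operand._type === 'BoolOp' ? '(' + code + ')' : code;
                })
                .join(jsOp);
        }
        case 'Name':
            return node.id;
        default:
            throw new Error('Unsupported node type: ' + node._type);
    }
}
```

One caveat: `||` only behaves like Python's `or` when both languages agree on whether an operand is truthy, and they do not always agree (an empty list is falsy in Python but truthy in JavaScript).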


Repeat for all the different node types. This is the fun part: figuring out how to translate from Python to JavaScript! It meant getting to know JavaScript and Python syntax intimately, as well as comparing the JavaScript and Python standard libraries to see how they differed. I had to deal with JavaScript's language quirks (falsy values, and Python's None versus JavaScript's null and undefined) as well as differences in array iteration, variable scoping, and many other gotchas.
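Truthiness is a good example of such a gotcha. In Python, empty lists and dicts are false and `float('nan')` is true; in JavaScript, `[]` and `{}` are truthy and `NaN` is falsy. One option is to have the generated code call a small runtime helper wherever the type of a tested value isn't known. The sketch below is hypothetical (`pyTruthy` is not from the original transpiler) and assumes that `None` arrives as `null` or `undefined` and that plain objects stand in for dicts.

```javascript
// Hypothetical runtime helper: returns what Python's bool(value) would return,
// for the JavaScript values a transpiled program is likely to hold.
function pyTruthy(value) {
    // None is assumed to arrive as null or undefined.
    if (value === null || value === undefined) return false;
    // Python: 0 is false but NaN is true (JavaScript's Boolean(NaN) is false).
    if (typeof value === 'number') return value !== 0;
    if (typeof value === 'bigint') return value !== 0n;
    if (typeof value === 'boolean') return value;
    // Empty str/list are false in Python; JavaScript agrees for '' but not for [].
    if (typeof value === 'string' || Array.isArray(value)) return value.length > 0;
    if (value instanceof Map || value instanceof Set) return value.size > 0;
    // Plain objects are assumed to stand in for dicts: an empty dict is false.
    const proto = Object.getPrototypeOf(value);
    if (proto === Object.prototype || proto === null) {
        return Object.keys(value).length > 0;
    }
    // Anything else (functions, class instances, dates) is treated as true,
    // like Python objects that define neither __bool__ nor __len__.
    return true;
}
```

A transpiler could then emit `if (pyTruthy(items))` for Python's `if items:`. Iteration needs similar care: `for x in items:` should become `for (const x of items)`, because JavaScript's `for...in` iterates over keys (for arrays, the indices as strings) rather than the elements.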

Benefits

Unlike human-written code, the transpiler's output was very consistent in the kinds of bugs that appeared. This made debugging much easier, because I knew which types of bugs to look out for. Also, if there was an issue in the transpiler-generated code, I could fix the transpiler and re-run it on all files to fix the bug everywhere at once. It turns out that, for this kind of repetitive translation, code written by a computer is much less buggy than code written by humans. Who knew?

Other Considerations

Computers can write code now? Am I eventually automating myself out of a job?
Probably, but I prefer to automate things rather than do them myself. I was also able to test the code using automation, specifically shadow traffic, which I won't talk about here because it deserves a whole other blog post. [added later] I did speak about both in my talk at ScaleConf NZ 2017.

Will I open source this?
I won't be open-sourcing it at this time because it's highly specific to Pinterest's Python code and libraries.