62. Crowdsourcing Code Translation
Hosted by Julián Duque, with guest Parker Phinney.
Many online platforms rely on crowdsourced content, and Interview Cake is no different. Their goal is to teach users how to succeed in programming interviews. To do so, they make their help resources available in over ten programming languages. They accomplished this by hiring experts in each language to translate the content from Python. Parker Phinney, creator of Interview Cake, shares with us how they managed this team and ensured that the work was correct.
Parker Phinney, creator of Interview Cake, continues his discussion from a previous episode with Julián Duque, a developer advocate at Heroku. Interview Cake provides interview questions and programming exercises that adapt to the programming language a candidate is working in. Since their coursework is essential to helping users succeed, they made an effort to ensure that the work was done accurately.
First, they provided their entire course curriculum in Python, the programming language they were most familiar with. Then, they hired a team of language experts to convert those Python lessons into various other languages. Finally, they hired a second team of experts to review the work of the first team. This two-pass system allowed them to bootstrap expertise in almost a dozen different languages.
Parker continues by discussing some of the challenges involved in this system, including keeping track of the work and managing the funds for various individuals. He also talks about various improvements they are making to the pipeline for their next iteration of content.
Julián: Welcome to Code[ish]. My name is Julián Duque. I'm a developer advocate at Heroku, and today, we are going to continue talking to Parker Phinney from Interview Cake about the Interview Cake platform and one of the challenges they had while translating the code content into multiple programming languages. So today, we have Parker Phinney from Interview Cake, who is going to tell us everything about how they crowdsourced the whole translation team and made it possible to support multiple programming languages in the Interview Cake platform. Hello, Parker. Welcome again.
Parker: Hey, thanks for having me back.
Julián: Yes. You were mentioning in our previous episode the different interview questions and programming exercises that you have on your platform, and that every question changes when you select a new programming language. Can you explain a little bit more to our listeners what we are talking about?
Parker: Yeah. Totally. Maybe I should step back first and just briefly, for people who didn't hear the previous episode, explain what Interview Cake is. So Interview Cake is a website that has an online course that prepares software engineers for coding interviews. In particular, we help people get better at solving tricky data structures and algorithms type coding interview questions.
Parker: But if you're not super comfortable with Java, it makes that content pretty hard to access for you, and so we had a unique opportunity with Interview Cake because it's a website. It's hypertext, so the document can change. So we set out to try to make our content available in all kinds of different programming languages. We support 10 different ones today, and as we started to kind of think about this project, we quickly realized that... a couple things.
Parker: The first is if we're going to add, say, PHP to our set of languages that we support, it's important that the PHP code is not just correct, right? Like it gives the correct answer, and runs without errors, and all that stuff. But also, it has to be well-formed, stylistically sound PHP. PHP is not maybe the most opinionated language, but I think they're... Maybe Objective-C is a more opinionated language. Maybe that's a better example.
Parker: We're also kind of rubbing up against the other thing that we realized, which is if we're going to be writing this stuff, as I said, it needs to be not just correct, but good and well-styled. On our team, we didn't have experts in all these languages, so we have this tricky problem of how do we find people who are real experts in each of these languages and can make good like educational code samples because it's one thing for a person to say they're an expert, but you want to interview them and confirm that. If you're not an expert yourself, how do you evaluate?
Parker: This problem of bootstrapping expertise was really kind of a sticking point for us, so that's where we got really clever. So what we did was we hired six different engineers on Upwork to do a test task, and the test task was to, for any given language, say PHP, we'll repeat this process for each language. So for PHP, for example, we hire six different people to translate one code sample from Python to PHP, and then this work gets interesting. We hired six different PHP engineers to rate those six translations, and we explained to both cohorts that what we were really looking for was well-styled PHP code that's not just correct, but very readable and adheres to PHP standards or I think in the case of PHP, a specific set of PHP standards and stuff like that.
Parker: So then, we take the six translations and the six reviews, and whoever got the best reviews, we hire that person to be our translator. But then, we also look at the six reviewers, and whoever was the most thorough reviewer, had the most notes, and seemed to find the most tiny details, we also hire them to be the reviewer of the translator's translations as they go through our content and translate it into the new language, in this case PHP. So in this way, we have two sets of eyes on everything. That's how we bootstrapped expertise in PHP and almost a dozen other different languages.
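The selection step Parker describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the episode does not say how reviews were recorded or scored, so the `Review` record, the numeric `score`, the note counts, and all names and sample values below are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Review:
    reviewer: str    # who wrote the review (hypothetical identifier)
    translator: str  # whose translation was reviewed
    score: int       # hypothetical 1-5 rating of style and correctness
    notes: list      # individual comments the reviewer left


def pick_translator(reviews):
    """Return the translator whose work got the best average rating."""
    scores = {}
    for review in reviews:
        scores.setdefault(review.translator, []).append(review.score)
    return max(scores, key=lambda name: mean(scores[name]))


def pick_reviewer(reviews):
    """Return the reviewer who left the most notes overall."""
    note_counts = {}
    for review in reviews:
        note_counts[review.reviewer] = (
            note_counts.get(review.reviewer, 0) + len(review.notes)
        )
    return max(note_counts, key=note_counts.get)


# Made-up sample data: two translators, two reviewers.
reviews = [
    Review("r1", "t1", 3, ["naming"]),
    Review("r1", "t2", 5, []),
    Review("r2", "t1", 2, ["naming", "loop style", "missing docblock"]),
    Review("r2", "t2", 4, ["brace placement"]),
]

translator = pick_translator(reviews)
reviewer = pick_reviewer(reviews)
```

Counting notes is a crude stand-in for "most thorough"; in the episode the team made that call by reading the printed reviews themselves.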
Julián: Yeah. What was the first programming language you supported or had content in your platform?
Parker: A good question. It was Python, and that's mostly because Python was my most comfortable language, but this is... It's a good question because early on, there is this struggle where I... The code was Python, but I thought... because I had an eye towards this thing of like I want this to be accessible to everybody, so I was like, "Well, why don't I do this? Why don't I say it's pseudocode, but it happens to be runnable Python?" So if you know Python, it's going to be correct and you can run it.
Parker: So I would do these things where if there was something that was sort of Python, a specific Python idiom that would be a little opaque if you weren't a Python person, I wouldn't use that idiom to try to make it as readable to everyone as possible. So the upshot was for people who didn't know Python, the content was a little more accessible, but to people who did know Python, they were a little confused. They were like, "Why are you doing this? Why are you writing like C code in Python? You should be using a list comprehension here," or whatever the kind of idiom was that would be more Pythonic, and so we realized that we really couldn't have it both ways and we had to support each language individually.
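As a small illustration of the tension Parker describes, here is the same task written twice in Python: once in the deliberately language-neutral, C-like style, and once with the list comprehension a Python reader would expect. The function names and the task are made up for this example and are not taken from Interview Cake's content.

```python
def squares_c_style(numbers):
    # Explicit index and manual loop: readable to newcomers from other
    # languages, but it looks odd to Python programmers.
    result = []
    i = 0
    while i < len(numbers):
        result.append(numbers[i] * numbers[i])
        i += 1
    return result


def squares_pythonic(numbers):
    # The idiomatic version: a list comprehension.
    return [n * n for n in numbers]
```

Both functions return the same list; only the style differs.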
Parker: Right. Totally. Yeah.
Julián: So that's pretty much the approach you follow here. To be able not just to translate the problem, but also to have the code look like an idiomatic use of that programming language?
Parker: That's right. Yeah. Yeah, and you bring up a great point. Sometimes the code is going to look very different in these different languages, and so take list comprehensions, for example. It's a tool in Python. I believe Ruby has it as well. I'm sure a few different more scripting languages do, and it's very powerful, and it allows you to compress what would in Java or C be a loop into just one line. So it's idiomatic and it's powerful, but it also... If you have that one line list comprehension in Python, there's a specific kind of pedagogical moment there where it's important to teach, especially if the candidate is a little bit new to thinking about data structures and algorithms.
Parker: It's important to explain to them that under the hood, there is a loop here. Right? So this one-liner. You don't see a loop, but there's O(n) time or O(n) work that's happening here, and that's a little bit of education that you have to do when the language is set to Python that you don't have to do when the language is set to Java because you can see the work right there in Java.
Parker: So this leads me to something that is a little bit... was a big kind of monkey wrench when we went to do this translation project, which is that not only does the code have to change, but the educational content around the code, in a lot of cases, also has to change when you move from Python to Java. So not only did we translate the code samples for each write-up and each question, but we also go through the whole write-up, and we add or remove whole sentences and whole paragraphs so that we're teaching as closely as possible to that specific language.
Julián: Like that filter, it's going to be doing a loop and doing like certain operations?
Parker: Yeah. Well, we do. We do have some people. This isn't something that we necessarily encourage, but it can be interesting. We have some people who say that they do each question in two different languages because they're trying to get stronger at one language or they're interested to see the differences between the two. That's certainly something that you can do.
Parker: That example of list comprehension I think is a good one. Again, if you're writing code in Java or C, it tends to be pretty explicit, and so you can really kind of see what's going on under the hood, what the... kind of exactly what the time and space costs are in your code. Whereas if you're using one of these higher level more scripty languages, a lot can be buried in a one-liner.
Parker: So we do spend a lot of time in those languages, specifically teaching people how to understand that in that one-liner, there's this underlying stuff, so we do, and we do sometimes actually go through the process of saying like, "Okay, this. This one-liner, you could rewrite it as this loop." So now, you can use the one-liner when you're writing code, but in your head, there should always be like a part of your mind that sees that loop under the hood so that you're really getting a full picture of what the efficiency and the time and space costs are of your code.
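To make the "loop under the hood" point concrete, here is a minimal Python sketch: a filtering one-liner, the explicit loop it is equivalent to, and a small counting wrapper that shows the one-liner still visits every element. The helper `counting` and the function names are hypothetical and exist only for this illustration.

```python
def evens_one_liner(numbers):
    # One line, but it still walks the whole input: O(n) time.
    return [n for n in numbers if n % 2 == 0]


def evens_explicit(numbers):
    # The same work with the loop written out.
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n)
    return result


def counting(items, counter):
    # Hypothetical helper: yields each item and counts how many were pulled.
    for item in items:
        counter["steps"] += 1
        yield item


counter = {"steps": 0}
evens = evens_one_liner(counting(range(10), counter))
# counter["steps"] now equals the input length, even though no loop
# appears in the one-liner's source.
```

The counter ends at the input length, which is the O(n) work the comprehension performs.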
Julián: Ah, interesting. So you were telling me that you started with six people that were doing the translation?
Julián: And other six people that started rating the translation?
Julián: What happened next? How did you start to scale up this process to support 10 languages?
Parker: Yeah, so we would do that process. It wasn't always six and six, but something like that. We would do that process for each language, and after... This was a very brilliant former member of the team. His name is Noah. He ran this process a couple of times himself, and he would kind of make up a spreadsheet, and he really kind of got the process dialed in. Once we got... There's a whole... One of the, in my opinion, kind of most fun steps was once we had the six and six, the six translations and the six reviews, we would print them all out and sit down in a conference room, and mark them up, and figure out which person we wanted to hire from, from each cohort.
Parker: So once that process was really dialed in, Noah had our virtual assistant manage the process. So she was able to set up the job listings, and open the contracts with all the candidates, and deliver them the task. Basically, the deliverable then for Noah and I was just the PDF of all of the code samples, and we were able to kind of take it from there.
Parker: So once we have our two people for each language, we had a whole system where we set them up on Asana, and we would give them their first thing to translate so that they got comfortable with our stack, and they could get the development server running, and all that stuff. Then, we gave them their next like five things to translate, and then we would keep a close eye on the first ones to make sure that they were paying attention to the details, and following our conventions, and stuff like that. Then, eventually, we would just kind of let them loose and say, "Okay. Now, go. Go translate all the rest of the stuff."
Julián: Nice, and what other tools did you use to achieve this coordination of work?
Parker: It was tricky, and Noah was the one who was really running this process. He handed it off to me when he moved on to his next opportunity, and I have to admit. I didn't do as good of a job keeping things organized because it was a lot of people and a lot of details, but it was a mix of Asana, spreadsheets, and GitHub bugs, and GitHub tasks, and also, the GitHub Wiki, we used a lot for... early on for a new language, really cementing what the conventions would be.
Julián: How do you test the solutions? How are you making sure those are going to be valid?
Parker: Yeah. Yeah, it's a good question. Yeah. There's a lot of tech behind the hood that makes all this stuff work. Yeah, lots to talk about. So for that specific question, each code sample under the hood right next to the code sample in each of the 10 languages, you can put a test for that code sample. So if it's a full solution to a problem, you can write a bunch of test cases that a solution will run through.
Parker: If it's something that's more like just a... indexing into an array and setting a variable to something just like a little two-line thing, we'll still actually run that code and confirm that it runs, and in the case of like Java, like compiles without error. So yeah, we have a lot of tech that allows us to actually run each of our code samples and confirm that they're outputting the right answer.
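The idea of keeping a test next to each code sample can be sketched roughly as follows. This is not Interview Cake's actual tooling, which the episode does not describe in detail: `run_sample`, the sample sources, and the test cases are all hypothetical, and a real system would need sandboxing plus per-language runners and compilers rather than Python's `exec`.

```python
def run_sample(source, entry_point=None, cases=()):
    """Run a Python code sample; optionally check it against test cases.

    Raises if the sample fails to run at all. Returns True when there is
    nothing further to check or when every case passes.
    """
    namespace = {}
    exec(source, namespace)  # a two-line snippet only needs to run cleanly
    if entry_point is None:
        return True
    func = namespace[entry_point]
    return all(func(*args) == expected for args, expected in cases)


# A full solution with test cases stored alongside it (made-up example).
solution = '''
def highest_product_of_two(numbers):
    ordered = sorted(numbers)
    return max(ordered[0] * ordered[1], ordered[-1] * ordered[-2])
'''
cases = [
    (([1, 2, 3],), 6),
    (([-10, -10, 1, 3],), 100),
]

# A tiny snippet: indexing into a list and setting a variable.
snippet = '''
items = [10, 20, 30]
first = items[0]
'''

solution_ok = run_sample(solution, "highest_product_of_two", cases)
snippet_ok = run_sample(snippet)
```

A snippet with an error, such as indexing past the end of the list, would raise inside `run_sample`, which is the "confirm that it runs" check Parker mentions.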
Julián: Nice, and how do you maintain the content you currently have? Do you need to go through the same process of finding people out there, or do you already have an established team of experts in every language to take care of the maintenance?
Parker: Yeah. This is something that's actually been a bit tricky for us. This process was very expensive, much more expensive than we thought it would be, and we did it kind of all at once. We went through and translated the content into as many languages as we could, and then each of those teams kind of finished translating all the content. So we closed the contracts because we didn't need help with translation anymore.
Parker: For a while, we were focused on a few things other than content. We are building some new features and stuff like that. More recently, we've picked up some more content stuff, and one of the challenges is going to be re-translating that content because I'm sure some people that we've worked with that have done great work, we'll be able to work with again. I'm sure some others are not going to have bandwidth anymore because they're busy with other projects or maybe not be on the platform anymore so... The Upwork platform that is.
Parker: We did also find that there were a couple of people who after they finished their work, this is also part of why we closed the contracts, there are a couple people who were sort of bad actors who continued to bill even though there really wasn't any work left to be done. That was also sort of on us for kind of not keeping a closer eye on what was going on.
Parker: The bottom line for us is that one thing that we've learned is this process we think so far is kind of best done as like spin up, get all the work done, spin down, and so now we have some new content that's kind of in the pipeline, and we're waiting to kind of finish that, and then we'll go through, spin up the translation team, do all the translations, and then spin that team down and release the content.
Parker: It's added quite a bit of friction, but that's sort of the process that we've settled into. This is something that we talk about a lot at Interview Cake because we could have just done... We could have just written pseudocode. Right? We could have said, "Yeah. This is the easiest way to make our content accessible to as many people as possible is we'll just..." It's not real code. It's just pseudocode. Right?
Parker: We could have said like, "Well, it's Python." Yeah, like not everyone is going to fully understand it, but it's like pretty readable, so it's good enough. Right? But for us, it's always worth spending the extra time and money to make our stuff even just a little bit more accessible to people because that's our focus is to be the most accessible, most clear platform in our space. So to us, it's worth it to spend the time and money.
Julián: So if you... You say that you need to do this again. Compared with the first time you did this translation, what will you do differently this time? What will you change, or what will you improve?
Parker: So I think the main thing is we looked back at how often the reviewer suggested changes for different languages, and there were a couple where the reviewer wouldn't often find anything, and so we started to experiment a little bit towards the end with not having that second pair of eyes. Then, there were some other cases where the reviewer was suggesting a lot of edits, and so we experimented with just making them the translator instead.
Parker: So that's something that we might experiment with in the future is having fewer people per language, especially in cases where we're able to work with someone we've worked with in the past who has shown like a lot of attention to detail. We might be able to kind of trust them to be the one set of eyes on a given language.
Julián: Did any of the people you hired temporarily to do the translation and reviewing work stay on your team? Did you hire them full-time, or did they just remain temporarily for that specific work?
Parker: Yeah, everyone stayed just on a contract. This was all through Upwork, so they were people from all over the world, and actually, Upwork I think does not allow you to like start working directly with people or they have an embargo for a year or something.
Julián: Okay, got it.
Parker: But it was really cool to be working with people from all over the world. We actually used to have a big map on the wall with pins with like where each person was located, and what their name was, and stuff. It's pretty cool to know that your website is being built by people from all over.
Julián: Nice, so you were like managing also different time zones?
Parker: Yeah. The work was not particularly synchronous, so the time zones didn't tend to matter all that much. Although, Noah would get on a call with just about everybody, so like scheduling that, you have to be conscious of time zones.
Julián: What was the hardest part of the whole project?
Parker: I think the hardest part was just the management overhead, doing all the languages at once. We didn't quite do all of them at once, but at one point, we had ramped up to like... We had a small team for I think like six or seven different languages at the same time. Maybe more, and that's just a lot of people for one person to manage, so that was the biggest thing, and there are so many parts to that management.
Parker: I mean, there's seeing progress. Right? Like knowing where each team is, where they're stuck, if they're still moving forward. Also, keeping track of how people are billing, if they're still working, if they're over-billing, which unfortunately was the case in just a couple of cases. There are a lot of things to kind of keep an eye on.
Parker: One thing I would do differently is build a little bit more internal tooling to give us better visibility into what work has been done recently, what work still needs to be done next, what the timeline is of what's happened so that we could see which teams were cruising and which ones were getting stuck.
Julián: Any advice for a company that wants to crowdsource expertise in different platforms? What do they need to take care of when hiring external people to their teams? How can they make sure the project is going to be successful?
Parker: Yeah. We're big fans of this process of hiring several people to do a first task, and then opening an ongoing contract with the person who does the best work. I've used this in my personal life for hiring a piano teacher, for hiring an Arabic teacher. Even for finding a therapist, I did like five different first consultations, first sessions, and of course, you always pay for their time, but it's remarkable how much... Every time I run this kind of process, there tends to be one person who is just far and away a better fit for what I'm looking for.
Parker: It's so much better to be working with that person right away rather than kind of starting with the first person you talked to, and then taking weeks, and ending up in this place where you're like, "Uh, they're... It's kind of okay. I don't know. It's like maybe someone would be a better fit, but I don't know if I want to look." Right? Running this process from the beginning of working with a bunch of people and seeing who really works best. I can't say enough about it.
Julián: It may be a little bit more expensive, but at least you know you have the right person to do the job.
Julián: That's nice. Okay. This is very interesting, what you are building and the approach you follow here. Wishing you the best for the next iteration of the content translation.
Parker: Thank you.
Julián: It seems that you have nailed down the process that works for you, and I hope the listeners out there who are also planning to bring in external expertise to work on projects at their companies can learn something by following the process you used. I think it's a very good recommendation to test different people and approaches to see what is going to be the best fit for the specific project you are working on.
Julián: Any other recommendation to the listeners before saying the goodbyes?
Parker: Yeah. Yeah. I mean, I think exactly as you said. Wherever possible, see if you can work with a few people and see who's the best fit. The thing that was unique about this project is we weren't able to evaluate ourselves who was actually a good fit because we don't know what great PHP code looks like, so we were also able to hire people to make that evaluation. So it's some tricky second order stuff, and again, it involves paying more people just to find who's the right fit. But if you can afford it and if you can figure out a clever way to make it happen... It worked for us, so.
Julián: Okay. Parker, thank you very much for telling us about this project. I think it's a huge challenge, and the approach you followed was very interesting. I hope all of our listeners are going to learn how to crowdsource expertise out there as well.
Julián: This was Code[ish]. My name is Julián Duque, and we had Parker Phinney from Interview Cake with us today. Thank you, Parker, for all your contributions and for your time.
Parker: Thanks, Julián.
Julián: Hear us on the next episode of Code[ish]. Thank you very much. Bye-bye.
A podcast brought to you by the developer advocate team at Heroku, exploring code, technology, tools, tips, and the life of the developer.