8. Sharing Data with Dataclips
Hosted by Chris Castle, with guest Becky Jaimes.
Heroku Dataclips has been around since 2012, and it's still a reliable part of the Heroku Postgres ecosystem. Dataclips lets you quickly, easily, and safely access your database, allowing you to share the results with others to see. Dataclips makes it easier for anyone in your organization to get the data that they need, without connecting to a production server and memorizing the correct SQL incantation.
Recently, Dataclips was updated with newer features and functionality. Join us as we talk about what's changed and what's reliably staying the same.
Chris Castle introduces Becky Jaimes, the product manager for Heroku Dataclips. Dataclips is a feature that allows you to save and version SQL queries against a Postgres database. You can export these queries to JSON or CSV, share them online, or simply access them through an HTTP API.
The two discuss the history of Dataclips, namely what its original needs were and how it has evolved. It was originally targeted towards individuals who simply wanted a quick way to get access to rows from a database, and now it's grown to be a useful source of information for BizOps teams, engineers, and product managers.
The newest release of Dataclips came out of a very long beta program involving customers of various sizes. Some customers had a few dozen queries they accessed frequently, while others had thousands of them. The team at Heroku wanted to create a one-size-fits-all solution, while at the same time reducing technical debt which had been a hinderance to shipping quickly. Becky shares some of the features that are now available.
Links from this episode
Chris Castle: Welcome back to Code[ish]. This is developer advocate Chris Castle. Today, I've got with me here, Becky. Becky, can you introduce yourself?
Becky Jaimes: Hi, my name is Becky Jaimes. I am the product manager at Heroku Data.
Chris Castle: We're going to be talking about data as you might have guessed from what Becky focuses on, and specifically Dataclips. Becky, can you tell us a little bit about what dataclips are?
Becky Jaimes: Dataclips is UI to run queries against your Postgres database against your Heroku Postgres database. It offers a couple of extra features that we're really excited about in the new release such as charting. You can explore the schema and things like that, but it is meant to be a way that you can collaborate with others when you're exploring data out of your Heroku Postgres.
Becky Jaimes: The other thing that for me when I started using these type of tools was that before these, when I was using mySQL, I would have all these text files saved in my dashboard in my desktop. The real query is this one. Don't look here. Look this one. Then also what happens if your computer's gone? At one point, my computer was stolen, and I was like, "Oh my God. All my queries are gone." You have to start from scratch.
Chris Castle: Your text file of all your SQL queries.
Becky Jaimes: No, seriously. Because when we were talking about 2012, we're talking about a long, long, long time ago. You were able to get all your queries in one place, and with just a link share it with somebody else. Then it also saved a version of it. I wasn't afraid to redo or change something into the query thinking, "Oh I'm going to screw this query up. Like, I have to do it all over again," because you can just go back to the versions.
Chris Castle: Definitely. We were just talking about the origin story of Dataclips, how it first started. You mentioned 2012 or '13, so five or six years ago, the first version of Dataclips was released.
Becky Jaimes: Exactly. If you imagine back there is mode analytics didn't exist even then. Periscope was just coming into surface, and Luca was just launching. Data was living in the dark ages. We did have all this great tools to share code and collaborate with code. Even with text documents, Google Docs have been around forever, but for data, we didn't really have anything like that back in the day. We had this Heroku Postgres offering, and it was a great way for people to start tinkering with the data and trying to share it with colleagues and getting more productive with it.
Chris Castle: Maybe not a lot of things, but many things that Heroku built to scratch our own itch, and you were mentioning how we use them internally a lot.
Becky Jaimes: Yes. It's really interesting because Heroku is the largest Dataclips customer actually. All teams from different parts of the company use it for different use cases. The BizOps teams use it. Our engineers use it. Even our product managers will use it. Even the marketing team uses it. Everybody has a different use case. We check every single one. We have customers that only the engineering team uses Dataclips. We have a couple of other customers that only the product people uses Dataclips, but at Heroku, each single team, we check every single box.
Becky Jaimes: The only thing that I see that at Heroku we didn't use as much as a lot of our customers do is use Dataclips to create dashboards, and display it in TVs and things like that. Maybe it's because we don't have TVs anymore hanging around the office like we used to.
Chris Castle: That gets back to the inspiration for Dataclips and the problem we were trying to solve when it was first created. It was like it's never going to be a Tableau writer mode or any of these deeper data analytics tools, but we just wanted a quick way to jump in, run a query, and be able to collaborate or share that with others. You were mentioning some other services that this was following in the path of.
Becky Jaimes: It's funny you mentioned Tableau because Tableau has also always been around forever, but the things is that with Tableau it's a commitment. It's a thing that you have to install on your computer. You run all this. You have to connect it properly. Do all the things properly. This is a pretty expensive license too. It's just a commitment. It's a real commitment to use Tableau. Back in the day, a lot of people collaborated their code using GitHub Gists. That was one of the big inspirations for Dataclips. Back then, we did have these type of things for code, but we didn't have a way that you can draft a quick query, share it with someone, and move on.
Becky Jaimes: Not necessarily to build huge reports and dashboards and things like that, but the whole inspiration was just to build something that you can create that eventually because since its inception, you're able to explore the data. You're seeing, I don't know, all kinds of different ... I don't know at what point JSON export came into the picture. I think it was released with the original version, but ever since we released the export function, people also use it as some sort of a light ETL API point that people can hit and populate different tables or excel formats and things like that.
Chris Castle: That makes sense. I don't think I've used the XML format, but I've used the JSON and the CSV format. They're generic enough that they make it easy to import into Excel or Google Spreadsheets or something like that, but then you also mentioned, I think, what you just alluded to there is we have a customer that effectively uses dataclips as a JSON API endpoint.
Becky Jaimes: To drive a dashboard, yes. That's correct.
Chris Castle: For an internal use, not for their customers, but more for their internal analytics or internal use.
Becky Jaimes: They build all these queries. They run them and then they export the query with one of these methods. I think, they used Google Sheets and create some dashboards, and they display them around the office. Now that we're talking about this without going too far, before we released the Schema Explorer, not having a schema was one of the biggest constraints for people to get in as they're running queries quickly. You had to get a debrief from someone who's like, "This is how the whole thing is organized." Now that we're talking about this, I remember internally at Heroku, the BizOps team used to have a query that populated a spreadsheet, and the spreadsheet was our schema directory.
Chris Castle: I remember that for the giant Heroku data warehouse.
Becky Jaimes: Yes. Yes, for a data warehouse, and it was all populated from ... It was all export. It was a CSV export that was populating it. It was pretty cool. Now that we talk about it, I remember that.
Chris Castle: I remember when I first started at Heroku finding that to understand all the different tables and columns in that giant data warehouse. I also used it until there was schema exploration and this new, I don't know if it's called IntelliSense or if that's a trademark term that we can't use, but until then, I still always had to run PSQL, go into the database, and then-
Becky Jaimes: Check the schema.
Chris Castle: ... run like backslash-d or something like that to see the Schema for the table, and then come back to the web interface and technical query.
Becky Jaimes: What is so cool about this is that it's always fresh. The results were always updated, because every time you opened it, it will repopulate itself with whatever was the latest information. That was one of the other things that yes, back in the day you will run this queries. You would explore the data into some text file or into a Google Spreadsheet or a Microsoft Excel or whatever it is, but then the problem is that the data immediately became stale. The moment you did paste, I guess the moment you did copy, the data became stale. That was what it was-
Chris Castle: It was that connection to the source.
Becky Jaimes: I remember when I first opened a spreadsheet with our internal schema explorer. It was so cool. It took a little bit to load, and that's when I understood what was happening. I was like, "Oh, wait a minute. This is taking a little bit to load because it's actually refreshing the entire thing." This is something that we also took care of in the new release of Dataclips. Things are much faster and the data's always fresh, so that lag that I used to have when I used to open that thing shouldn't happen with the new one. People shouldn't expect that with the new one.
Chris Castle: That SQL query that you just mentioned that the BizOps, internal Heroku BizOps team created that effectively queries the schema of a big database, Dataclips is now effectively doing that internally.
Becky Jaimes: Yes.
Chris Castle: Then showing that in the web UI in a nice way and keeping it up to date for you.
Becky Jaimes: Exactly. It's really cool too because for example, you can just type the name of a row, and it'll show you all the different tables that have that. For me, it's been very useful to be able to connect and know what tables to join and whatnot.
Chris Castle: I don't know any developer that doesn't enjoy or appreciate having some completion, whether it's CLI completion or table name completion or column name completion for Dataclips. I never used a big IDE, or I didn't start my programming career using a big IDE that had the completion built in. When I started finding tools that that gave me this completion and IntelliSense, I was like-
Becky Jaimes: You're like what?
Chris Castle: "This makes me faster. It makes me more efficient." It doesn't require the same cognitive load to remember, "What was the name of that column or that variable? Did it have an underscore, or was it one word with camel case?" Yes, I'm pretty excited about that.
Becky Jaimes: And also something really interesting to note is that we released Dataclips a long time ago in 2012. The truth is that it really hasn't changed that much because the use case is still the same. We are not trying to replace. We're not Moat Analytics or Periscope. The reason that the product has not changed is because the use case is the same. People come in here. Do some quick exploration. They want to come back to the same query that is something that is just interesting for them and maybe not for the entire organization. They come, they run it, and then somebody else comes in, and it's like, "Oh, somebody ran that query. That's interesting. I wanted that query too."
Becky Jaimes: They're able to find that, and I've been able to run so many queries so much faster through Dataclips because I'm like, "Oh okay, let's try to find, I don't know." When I first started, it's like, "Let's try to find who are our Postgres customers that do A, B or C." In the search, you're able just to put Postgres and whatever thing that you were looking for. Sure enough, someone in the past five years have run a very similar query. The only thing that I always had to be a little bit cautious was that schema's changed. It's like, "Oh, they used to hit an old table," but you would know right away that you were querying something you were not supposed to be querying because maybe the table doesn't exist anymore.
Becky Jaimes: It'll tell you, "Hey, that column is not there anymore because the table doesn't exist." Then you're like, "Oh," but you have a good understanding of pretty much how everything gets to go, and you're able to query really fast.
Chris Castle: You talked about data exploration. Is that the most common use case that you've found as you've been looking at behaviors and how customers use Dataclips?
Becky Jaimes: We're about to release the new version. Maybe by the time this airs, we will already have the new version of Dataclips live for everybody, but we ran a very long beta program with some customers. We selected customers with different use cases, some that had thousands and thousands of queries and others that had, I don't know, 40 queries. We were trying to find people from different buckets, right? For these customers, I was surprised this time that this customer had about 2,500 queries.
Becky Jaimes: I say, "Okay, we're going to migrate all your queries. Everything should be ready by tomorrow." He's like, "No, no, no, no, no. Don't migrate any of my queries." I was like, "Why? What's going on? You don't want to carry on your old work?" He said, "No. All the queries that we run and they you see in there are basically exploration. You know, like, we run a quick query. We find the result. We're like, "Oh, okay, cool. We're in the right track." And then they take that idea, and they, I don't know, continue their work somewhere else. They're able to go back to Periscope or Moat Analytics, and actually do a real analysis, but they use this as the entry point.
Becky Jaimes: When the person's sitting there before they go into a meeting and say, "Okay, let's propose this project or whatever," they're able to run this query. That's one of the use cases. A lot of people use it for data exploration. Then we also have customers in the other extreme. It was customers that had about, I don't know, let's say 40 queries. These are customers that have been using Dataclips for years. We've had this client that was using it for the past, I dunno, five years, almost since the inception of Dataclips, and they had the same queries.
Becky Jaimes: I was like, "Oh, that's so interesting because if you look at their logs, they hit Dataclips every single day, but they only have the same 40 queries." These are people that found these really cool queries of this metrics that they want to track. Then they export that into dashboards, and they display that in TV somewhere. This is the thing that they log in every day and then they see. That's the reason that they don't need that many more queries because it's the same 40 queries, and they are just trying to track usage or behavior or something like that.
Chris Castle: It's like they've decided these are the important queries for the business, or the important data that they want everybody in the company to be looking at every day.
Becky Jaimes: Then there's another interesting use case also. Internally, this is a team internally at Salesforce is the mergers and acquisitions team. The mergers and acquisitions team have this series of queries. They want to use Dataclips as a mechanism to export data out of Salesforce into a way that their analysts can actually do something useful with it.
Chris Castle: It's like the seed or the first step to an ETL process.
Becky Jaimes: Exactly. Exactly. Through Heroku Connect, they put the data into the Postgres instance. Then any analyst can run SQL into this data that is actually changing as they speak. They can run this queries as the things are happening.
Chris Castle: That's cool. I guess you could write code. We could write code that stitches these things together, or we could use a purpose built ETL tool to connect different data sources, but this lets them do it in a lighter weight or easier way that is still still reliable. It's still secure. It's still safe.
Becky Jaimes: Yes. That's the key, easy, and secure and safe. You don't have to put in this huge infrastructure behind that because you could just ... Connect is literally point and click. You go there. I've seen your demo. You go there. You point and click, and then just like that, your data magically appears in a Heroku Postgres database. Then you just run the SQL query on it. It seriously doesn't get any easier than that. Once you remove that layer of complexity of like, "Okay, I have all this data here," because you can get really hacky. I'm pretty sure that we have a ton of Salesforce customers doing hacky things to get their data out of there so that they can run a SQL query, but it doesn't really get any easier than using Heroku Postgres.
Becky Jaimes: Heroku Postgres allows you to extract data from the Salesforce organization, your Salesforce data. Through Heroku Connect, you can put your data into Heroku Postgres database that you can query. Internally, Salesforce teams, the merger and acquisitions team, they have all this data in Salesforce, all this data from all these companies they're acquiring. Then using Heroku Connect, they take the data into Heroku Postgres, and then they run the same set of queries for all the mergers and acquisitions that they do. What they do is they just work that query.
Becky Jaimes: They create a new version of that query, and then they just change it, and they're able to track in the exact same way really fast queries for whatever company that they're interested in acquiring or that they already acquired and things like that. It is a different use case, but I also think it was worth bringing the cool way to basically run SQL in your Salesforce data.
Chris Castle: It makes it very easy to use Heroku Connect to get the data in Postgres. Then that makes it accessible, and anyone who knows SQL or some variant of SQL can then query it, and then Dataclips ...
Becky Jaimes: And share.
Chris Castle: Exactly. Dataclips lets you very easily query that and share it.
Becky Jaimes: With the new release of Dataclips, when you share it, you have the option of share ... Before when you would share, you could see everything. You could see the query, and then you can see their results. Now in the new version of Dataclips, you can choose to only share the results. It will just show you a nice page with the table of the results, and you can either see a chart. It's a really simple chart, which is also a really romantic and cool story. When we were working on the release of Dataclips, the motivation behind the new release of Dataclips was to update the technology behind it to make it faster, because as I said, the use case really hasn't changed, so we didn't really need to add a bunch of stuff because there's always Tableau or Looker or Periscope or Moat.
Becky Jaimes: The use case is still the same, but something that was really cool is like, "Okay, we're upgrading all this technology." That summer, we had an intern. She came, and actually, it was the intern who released the charting function of Dataclips. The reason I say so romantic is because where in the world can you as an intern go to a company, work for a summer, work in the feature that you're really, really excited about, and then see it on the real product when it goes GA right away?
Chris Castle: That's super cool. When I had been an intern, when I was starting my career, it feels like you're always working on like, "What is the shape of that button, or some function behind the scenes for that button?" Then you never know if it's going to go live or not.
Becky Jaimes: How many coffees do I need to bring this morning?
Chris Castle: Totally.
Becky Jaimes: She was awesome. The team was very sad to see her go. I'm pretty sure that she's going to go out to do great things, because she just came and designed the charting feature, worked on it. Of course, she paired with our engineers, but it was her doing. I was out on maternity leave actually when this happened, and because maternity leave was really long. It was six months, and the internship is three months. I never actually got to work with her, but I come back from maternity leave, and they're like, "Yeah. So the intern build a charting function."
Becky Jaimes: I'm like, "What?"
Chris Castle: That's so cool. For this new Dataclips release for new dataclips, what were some of the pain points that you wanted to solve with the new Dataclips?
Becky Jaimes: Again, the main motivation when we rewrote Dataclips was to update the technology so that we would be able to push security updates and things like that fast, that it wasn't a huge burden on the team because the code was so old and obsolete. We want to update all the infrastructure behind it. Then with that, I mean, we have a laundry list of things that customers want, because the more you give ... It's like data. The more you know about the data, the more questions you're going to have.
Becky Jaimes: The same applies to this product. The more features you release, the more features people want, but we had this huge laundry list of features, but the things that people were more excited about was sharing. We did a massive update to sharing. I mentioned already that you can share just the results, and you can also share the whole query and the whole thing with somebody else.
Chris Castle: Whereas before, it was always the query plus the data below it.
Becky Jaimes: Not only that, the person that you were sharing with had to have a Heroku account.
Chris Castle: Oh, okay.
Becky Jaimes: Before, somebody had to had a Heroku account, because when you click on the link, they had to be able to login into the actual dashboard and platform and see it. Now with the new one, when you create a public URL of your clip, you just send that to somebody else, and they would be able just to see the results in a nice page, in a nice website. Okay. That's one of the updates. We also talk already about Schema Explorer and autocompletion. Then the other thing that we're really, really excited about it is that results are always fresh. Before, you had to wait, and because ...
Becky Jaimes: For example, if you're exporting things to Google because of the way Google Sheets cache things, sometimes to update the results would take a couple hours, but now, that shouldn't be the case anymore, because now, basically, the results are run much, much more frequent. You shouldn't have that type of problems anymore. Of course, we released a ton of other things, like for example, the charting feature, a couple of other things that you'll see as you start logging in, but those three were the ones that we were most excited about.
Chris Castle: That's cool. I didn't know about the faster updating. I'm excited about that, because part of my role as a developer advocate is building and giving demos. I'm often giving a demo that maybe includes Dataclips, and doing it over and over again. Sometimes, I switch to Dataclips, and it's a realtime demo. I want the data to be updated just to show the people there. Sometimes, it would update and, and sometimes I would hit refresh a few times and then talk through my demo and kind of be like, "Oh, we'll come back to that."
Chris Castle: I'm personally excited about that.
Becky Jaimes: This shouldn't happen anymore, and if it happens, let me know because this shouldn't be happening ever again.
Chris Castle: Great.
Becky Jaimes: Things are fresh and quick, so we're really, really, really excited. The way I like to talk about when I explained why we redid Dataclips is well actually, this was one of the engineers Maciej gave me this example. It's like, "Becky. It was like, in the United States, you have a roof with shingles. There's so many layers of shingles you can put into a roof to prevent leaks. When the first layer becomes old, instead of putting a new roof, you put another layer of shingles. Then once you get a leak, you put another layer of shingles."
Becky Jaimes: "There's so many layers of shingles you can put into the thing before it collapses." It was similar what was happening with Dataclips. It was this very, very old infrastructure that we keep patching it to make sure everything was good enough, but it gets to the point that to do a security update or anything, it will take us forever.
Chris Castle: I can understand. Great. It's more work for them that they just more repetitive and not really fun work I'm sure for them to make sure that, I don't know, any packages with with recent vulnerabilities are patched or updated.
Becky Jaimes: It was painful.
Chris Castle: I guess I just have one more question to finish up. What is your favorite new feature of new Dataclips?
Becky Jaimes: Oh man, this is the hardest question ever, because it's like, "What is the favorite feature of your child?"
Chris Castle: Which child do you love most?
Becky Jaimes: Yes. Yes, exactly. For me, the whole thing, I love that it's so simple. It addresses one problem, and it does it very well, which is just a place to run a simple exploration query. That's it. Then you can share it. You can get the public link or get the private link. Share it with another thing. Oh, I do have something we haven't talked about that I actually love personally. We've talked a lot about Dataclips as a collaboration tool.
Becky Jaimes: I remember when I was new in Heroku, I was a little bit shy to run my queries, because I didn't want anybody to see basically the errors I was making as I was figuring things out. Because I'm a really proud person, I did run some queries first in the command line. Like in the CLI, I will log into my follower and run my queries there just to make sure I wasn't going to break anything in the actual Dataclips, and I didn't want anybody to see my mess. Now with the new Dataclips, you can work on a query privately.
Becky Jaimes: Then once it's ready, you can share it. You wouldn't have to have that step anymore. You can run on this query. Run the results. It's like, "Oh, that was ugly. That was definitely not what I wanted to do. I keep fixing it until it's perfect." Then once it's ready, you can share it and make it available for other people to see.. That feature, I actually really, really, really love.
Chris Castle: I like that too. I didn't know that that was a new feature or unintentional feature. Also speaks to that, that use case that you mentioned earlier of the one customer that had thousands of Dataclips queries, because dataclips just by default would ask for a name, and then save the query for you. But now it sounds like you don't need to do that or you can just keep it private. When it's pulling the correct data in a nice, formatted nice way, then you can share it with the rest of your team, and you can look awesome.
Becky Jaimes: Exactly, or show how awesome you are.
Chris Castle: Yes. Totally. Great. Well, thanks for joining me, Becky. I'm excited to play more and more with Dataclips. I'm going to actually go and update one of my demos that uses dataclips to show off all the new features. We'll be showing that at the upcoming Trailhead, Salesforce's TrailheaDX event in May.
Becky Jaimes: I'm very excited.
Chris Castle: Excited to get to use the new stuff. Thanks very much everyone for joining us for another episode of the Code[ish] podcast. Checkout heroku.com/podcasts for more.
Becky Jaimes: Bye guys.
A podcast brought to you by the developer advocate team at Heroku, exploring code, technology, tools, tips, and the life of the developer.
Developer Advocate, Heroku
Chris thrives on simplicity and helping others. He writes code, prototypes hardware, and smiles at strangers, helping developers build more and better
More episodes from Code[ish]
Anupam Dagar and Joe Kutner
The GitHub Student Developer Pack is a collection of free offers and discounts from dozens of tech companies including Heroku, SendGrid, Sentry, and TravisCI. Anupam Dagar is a final-year undergraduate student at Indian Institute of... →
Tanmai Gopal and Owen Ou
GraphQL is a querying language with the aim of increasing the productivity of frontend and backend developers. It can make working with React easier, be used as an API for third-party clients, and allow for feature-rich applications to... →
Parker Phinney and Julián Duque
Coding problems can be the hardest part an interview. Whether you're standing in front of a blank whiteboard or typing on your laptop as someone watches over video chat, the process can be nerve wracking for even the most skilled coder.... →