65. Scaling Tech for Teachers
Hosted by Sandy Lai, with guests Ben Small and Mitchell Peabody.
Panorama Education is a platform that helps educators use data to support students' needs, and helps communities build great schools. Rather than a one-size-fits-all SaaS model, Panorama tailors its offerings to match the level of digital architecture that a school district has in place. Ben Small, a software engineer, and Mitch Peabody, an engineering manager, join Sandy Lai in conversation to discuss how they achieve their goal of improving student success.
Sandy Lai is a customer solutions architect at Heroku, and she is interviewing two employees at Panorama Education: Ben Small, a software engineer, and Mitch Peabody, an engineering manager. Panorama is a platform designed to help educators, teachers, and principals understand their students and their community by offering feedback surveys. As different schools have different levels of technical expertise in their districts, Panorama tailors its product to meet those needs. Different schools have different systems in place--sometimes as many as two or three different platforms in the same district--and Panorama built technology to pull that disparate data into one location to provide holistic (and individual) views into the results.
The team at Panorama has found that schools largely rely on paper, as they're extremely cautious about the data being shared about the kids in their classrooms. Security and privacy are top priorities for everyone at Panorama, from the CEO on down. They established a security working group that meets once a week and commits to making sure that security best practices are taught and followed across the company. They also offer annual training for all employees on how to protect customer data.
Panorama is currently tackling challenges around scaling, both as its customer base grows and as the business itself hires more employees. There is a certain seasonality to their work: the spring and fall months tend to be the most active on the platform. There's also the issue of timing, where hundreds of thousands of users log in at around the same time. To help offset this, Panorama rolls feature changes out across time zones. That is, when the schools on America's East Coast finish around 3pm, new updates are deployed for that region, and the process continues westward. This way, they're able to ensure reliability without disrupting the end-of-day rush in time zones further west.
Sandy: Welcome to Code[ish]. I'm Sandy Lai, a customer solutions architect at Heroku. My team helps customers plan, scale, and run their app successfully on Heroku. Today we are going to talk about security, scaling, and some unique challenges to bringing better tech to education. Joining us are Ben Small and Mitch Peabody from Panorama Education, a powerful platform that supports student success. Could you tell us a bit about yourselves?
Ben: Absolutely. My name is Ben Small. I am a software engineer here at Panorama Education. I've been with Panorama for about two and a half years at this point and I work on our support and security teams.
Mitch: Hi, I'm Mitch Peabody. I'm an engineering manager at Panorama Education as well. I've been here three and a half years now. I help run our infrastructure squad and I'm also on the security team.
Sandy: How about a little bit more about Panorama Education?
Sandy: That's amazing. I have to say, the software looks amazing, and I wish something like this had existed back when I used to be a school teacher, so I'm very excited to be talking with you today. As we're talking about adopting new technology, that can be pretty tough in any industry. How do you prepare schools to adopt your software successfully?
Ben: Many schools still do lots of things manually. They keep track of data on paper. There is not enough digital penetration into the school system. There may be a computer lab, but an individual student might see that computer lab once a week maybe. We've had to do a lot to figure out how to blend sort of old school paper needs with the capabilities that a cloud platform can present for schools.
Ben: We do that pretty well, and we make sure to tailor our offerings to these schools based on the level of digital architecture that they have in place. For the survey example, if a school has a computer for each student, it's very easy for us to say, "Oh yeah, each student should take a survey on their laptop," and for a school that maybe has one computer cart that they share amongst the entire school, we can say, "We understand that this information is still important to you, so let us work with you to generate paper artifacts, paper surveys, through which students can still give that valuable feedback in a way that supports the school's needs."
Sandy: It sounds like a very interesting problem. And you must work with schools located in various places throughout the country. Could you tell us a bit more about how you're integrating your software with different school systems?
Mitch: Yeah, mainly for our student success offering, we encounter different types of SIS--school information systems, or student information systems--that vary largely by region. Schools that do have a certain amount of technology to track the progress of their students have adopted many varying platforms. Sometimes, even within the same school district or the same school, they'll use two or three different platforms, so we have built technology to pull that data into our system, into our platform, in order to provide holistic and individual views of the students to the schools.
Sandy: Sounds like there's a lot of customization that you need to do depending on who you're working with.
Mitch: Yeah. Our team spends quite a bit of time figuring things out. If we encounter a new platform, we have to do some exploration and figure out what data is contained within the platform, how to interpret it, and also, I think more basically, how to actually extract the data. What we try to work towards is a faster integration time.
Sandy: Okay. So I guess you come up with playbooks and that's so that you can reuse them for the next time you encounter the same sort of systems.
Mitch: Yeah, I would say playbooks to start, but then we've really tried to start automating and putting a lot of the configuration power for these different data systems into the hands of nontechnical users at our company.
Sandy: That's great. You did mention looking into how to work with their data, and I imagine that schools would be pretty concerned about things like data privacy and security, which makes a lot of sense. How do you address these with and for your customers?
Ben: Schools are rightfully extremely cautious about what data they're sharing and with whom they're sharing that data. We're talking about kids. Especially over the last few years, with the rise of ransomware, schools have become a lot more wise to the dangers that are out there, and a lot more savvy about knowing and asking the right questions: who is going to have access to this data? What are they allowed to do with it? Who are they allowed to share it with?
Ben: One thing that I think is really wonderful and encouraging is that schools are talking to each other about this a lot. There are some consortiums that have been formed out there between essentially the IT leaders of schools, to talk to each other about, "Who do you work with? Who do you not work with? What sorts of restrictions do you put on those companies that you decide to trust?"
Ben: It's extremely important, because we talk about data privacy and data security as a society a lot, but often it's in the context of adults or what we do on our smartphones. This isn't just, "Oh, a student took a quiz and he failed that quiz and he feels really bad about failing that quiz." The consequences of all of that school data in aggregate being compromised are extremely drastic.
Ben: I consider us pretty fortunate. Security and privacy are top of mind for our leadership team and our CEO, and across the company we work to address these questions, these concerns, and these issues. We've created a working group of security-minded professionals from across Panorama, so it's not just a group of engineers figuring out how to secure the data. It's a group of engineers sitting with our outreach team and sitting with our client success team, talking about the whole process, from the very start to the very end: how do we effectively protect and defend the data that we collect?
Sandy: Tell us more about this security working group. Where did it come from? How did it come about?
Mitch: When I started at Panorama, it coincided with the start of our Student Success Platform, which is where we started pulling in a lot of sensitive student data. We recognized early on that we needed to step up our security game. So myself, a couple of other engineers, our VP of engineering, and our director of operations started the security working group. We started meeting weekly. We initially started by building a list of all the things we thought we could improve and the places where we needed to pay attention. From that, we figured out which changes and processes we could implement that would have the highest leverage.
Mitch: I went into it as an engineer thinking, "Oh, we're going to do cool penetration testing and we're going to do algorithms and stuff." It wound up being the case that the highest-leverage activities we could undertake were developing very clear and concrete security policies for internal use at the company, so that our employees understood good security practices and what they should and should not be doing. We also implemented an annual training program and an onboarding program.
Mitch: Every Panorama employee, when they first start at the company, goes through a security onboarding within the first three to four days. It's a training of, here are the things that you need to know about how to be secure with your work and how to treat and approach privacy concerns with our clients, with schools. We also repeat this training annually for all employees. In fact, we just had our annual retraining earlier this week.
Sandy: Wow. Yeah. It sounds like you've taken a really holistic approach, attacking it from a bunch of different levels to make sure you're covered on all your bases. That's fantastic. Can you tell us about some other unique challenges that your company might face in trying to bring this technology into education?
Ben: Absolutely. One aspect of offering surveys to students so that the students can make sure their voice gets heard, is that we need to make sure that all students can make their voice be heard. This means if a student isn't fluent in English, we have a responsibility to make sure that student has the ability to make their voice heard regardless. Our language support is quite impressive. We support upwards of a hundred languages, I believe.
Ben: Across the platform, it's something that our clients request of us, because a principal knows that if they're collecting feedback on how their school is doing and they're missing an entire population of students who can't respond, then they're missing an extremely important voice and an extremely important aspect of diversity in their responses. Beyond that, our platform needs to be accessible in lots of different ways as well--fully compliant and functional with screen readers and other accessibility tools. When we purport to help elevate student voice, we need to make sure that we can support every single student's voice.
Sandy: What about things like scaling? Is that a challenge for your company?
Mitch: Yes, I would say absolutely, from a few different dimensions. So because we work with schools, we have seasonality effects. Particularly with surveys, where these are events where a lot of students are taking a survey at the same time. And they tend to coincide with the start of the school year and also towards the end of the school year. So our team gets pretty busy around the August through, let's say October, November timeframe. And then there's a little bit of a lull. Then it starts ramping up again about this time actually. And it'll continue ramping up until we hit about May, June and then they'll taper off. That's like one aspect of our scaling.
Mitch: Another aspect that I think we've started feeling more acutely, is that for our system that pulls in data from the SIS systems, the integration pipeline, that has some unique scaling concerns in that it processes a lot of data every single day. We use a methodology called Event Sourcing, which has placed a certain amount of computational strain on our system that we didn't have with our surveys platform.
Mitch: We found, as we've been ramping up this platform over the past three years, that it's starting to be the majority driver of our cost increases. But the other dimension that we're encountering is scaling our engineering team specifically, and Panorama as a whole. When I started, we were at around the 60-employee mark with 12-ish engineers, and now we are approaching 40 engineers and over 160 people at Panorama, I believe.
Sandy: Quite the growth.
Mitch: Yeah, so it's rapid growth. It's not like adding 1,000 employees, but it's definitely introducing changes to the way that we have to organize and communicate, particularly within engineering. Now it's very possible--and I've seen this with other engineers--for knowledge to be locked up in people's heads, just due to moving quickly and maybe not taking the care to document things. How do you make sure that that knowledge gets transferred to all the engineers? How do you make your hiring processes scale? How do you make your training processes scale? All sorts of fun challenges.
Sandy: Yeah, I mean you bring up a great point. When people think of scaling, you usually think of just the technical side of things, but there's definitely a people aspect too, when you're thinking about scaling and all that that entails. I do feel like a lot of our listeners are probably very interested in maybe the technical side of that scaling. Would you mind telling us a bit more about what you've been doing on that level, to address the various issues that you've encountered?
Ben: One specific thing that we do is proactive load testing to prepare for anticipated usage spikes. I think Mitch mentioned the seasonality of our work. We can actually get more granular than that. One thing that we're working on this week, is a school district has said, "We have 100,000 students. And we want all those students to go on to the website at exactly the same time and take the survey."
Ben: To the engineering team, that sounds like, "Okay, we're going to have 100,000 hits at exactly the same time. Let's make sure our servers don't fall over." We have systems in place to simulate that load: we can get a script that essentially mimics a single workflow, duplicate that across dozens of different machines in a big pool, and make it all crash into our server at the same time to say, "What sort of performance metrics are we seeing? Do we need to scale out our server pool? Or can the infrastructure that we normally run day to day handle it properly?"
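The approach Ben describes--one scripted workflow duplicated across many concurrent workers, with performance metrics collected at the end--can be sketched roughly like this. This is a minimal illustration, not Panorama's actual tooling: `simulated_workflow` is a hypothetical stand-in for a real survey session, which would issue HTTP requests instead of sleeping.

```python
# Minimal load-simulation sketch: run one workflow many times
# concurrently and report latency percentiles.
import time
import random
from concurrent.futures import ThreadPoolExecutor

def simulated_workflow() -> float:
    """Mimic one student's survey session; return its latency in seconds."""
    start = time.perf_counter()
    time.sleep(random.uniform(0.001, 0.005))  # stand-in for network I/O
    return time.perf_counter() - start

def run_load_test(concurrency: int, total_sessions: int) -> dict:
    """Fire total_sessions workflows with up to `concurrency` in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(lambda _: simulated_workflow(),
                                    range(total_sessions)))
    return {
        "sessions": total_sessions,
        "p50": latencies[len(latencies) // 2],
        "p95": latencies[int(len(latencies) * 0.95)],
    }

result = run_load_test(concurrency=10, total_sessions=100)
```

A real harness would distribute these workers across machines, as Ben mentions, so that the load generator itself is not the bottleneck.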
Ben: The other time that we do proactive load testing is often when we're releasing new features. If there's some new feature that we expect to be more computationally difficult, we can do a similar analysis. Also, by using backend instrumentation, we can look at the performance profile of how the feature is expected to run, times the number of people expected to use it, times some fudge factor to make sure we're actually safe. Then we respond appropriately by saying, "Oh, we need to up the power of our database, or we need to increase the power of our servers, or the number of servers that we're running."
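The estimate Ben outlines is straightforward arithmetic: per-request cost (from instrumentation) times expected users times a safety factor. The numbers below are purely hypothetical; this just shows the shape of the calculation.

```python
# Back-of-the-envelope capacity estimate:
# cost per request x expected requests x fudge factor,
# spread over the peak window, gives the work rate to provision for.

def required_capacity(cost_per_request_ms: float,
                      expected_users: int,
                      requests_per_user: int,
                      window_seconds: float,
                      fudge_factor: float = 2.0) -> float:
    """Milliseconds of server work per second of wall-clock time
    that the fleet must absorb during the peak window."""
    total_work_ms = cost_per_request_ms * expected_users * requests_per_user
    return (total_work_ms / window_seconds) * fudge_factor

# e.g. a 50 ms endpoint, 100,000 students, 3 requests each,
# spread over a 10-minute survey window, with a 2x safety margin:
load = required_capacity(50, 100_000, 3, 600)
servers = load / 1000  # if each server can do ~1000 ms of work per second
```

With these hypothetical numbers the estimate comes out to about 50 servers, which is exactly the kind of result that prompts the "do we scale out the pool?" question.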
Sandy: I think you bring up a really good point there with proactive load testing when you're releasing a new feature. I feel like for a lot of people, sure, it might come to mind that if they know there's going to be high usage, they'll go and load test. But I don't think as many people think of doing that when we're talking about just releasing a new feature. That's a great tip for people to take into account.
Ben: To be perfectly clear, we don't do it for every new feature that we release. We probably could, but it gets a little expensive at that point with the server time and all the simulations. But we have a reasonably good idea of which features add negligible complexity and which features are really quite heavy. When we are dealing with those heavy features, we try to be a little bit more intentional about making sure that we are providing a stable service to our clients.
Sandy: How about your data? You've mentioned data a few times. I imagine you must be working with tons of data. How are you addressing scaling, in terms of that?
Mitch: Yeah, when it comes to our databases, we can look at it from two sides, the write side and the read side, which I don't think comes as a surprise to anyone. From the read side, as Ben was alluding to, when we release a new feature, we'll generally have a good sense of whether or not it's going to be slow and whether we should be more proactive about our load testing. Usually, a good solid indicator is how much it's going to hit the database with queries. When slowness shows up, one of the things we look for is what queries are being performed when that page is being loaded. We run some analysis on the queries, make sure that we have indexes in the right places, and, since we use Postgres, make sure that our autovacuum settings are tuned appropriately.
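The "make sure we have indexes in the right place" check Mitch mentions usually means reading the query plan before and after adding an index. Here's a small illustration using SQLite's `EXPLAIN QUERY PLAN` in place of Postgres's `EXPLAIN` (the mechanics are analogous); the table and index names are hypothetical.

```python
# Compare a query's plan before and after adding an index.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses (student_id TEXT, survey_id TEXT, answer TEXT)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [(f"s{i}", "fall-2019", "agree") for i in range(1000)])

query = "SELECT answer FROM responses WHERE student_id = 's42'"

# Before the index: the plan's detail column shows a full table SCAN.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

conn.execute(
    "CREATE INDEX idx_responses_student ON responses (student_id)")

# After: the plan switches to SEARCH ... USING INDEX.
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
```

In Postgres the equivalent workflow is `EXPLAIN ANALYZE` on the slow query, looking for sequential scans over large tables.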
Mitch: Then on the write side, one of the things that we've been doing lately relates to what I mentioned earlier: our ingest pipeline is one of the drivers of costs, because one of the things it does is go through a ton of data files and write a bunch of rows to our database every single day. Until recently, that was largely done row by row. So for each data file, we compute what's known as a diff, or a delta, of what's changed since the previous day, and use that to insert or update rows in the database.
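The day-over-day diff Mitch describes can be sketched as a comparison of two snapshots keyed by ID. This is a simplified illustration, not Panorama's pipeline; the field names are hypothetical.

```python
# Compare today's snapshot of an SIS export against yesterday's and
# emit only the rows that need to be inserted or updated.

def compute_delta(yesterday: dict, today: dict):
    """Each snapshot maps a student ID to that student's row (a dict).
    Returns (inserts, updates): rows new today, and rows that changed."""
    inserts = {sid: row for sid, row in today.items()
               if sid not in yesterday}
    updates = {sid: row for sid, row in today.items()
               if sid in yesterday and yesterday[sid] != row}
    return inserts, updates

yesterday = {"s1": {"grade": "B"}, "s2": {"grade": "A"}}
today = {"s1": {"grade": "B"}, "s2": {"grade": "A-"}, "s3": {"grade": "C"}}
inserts, updates = compute_delta(yesterday, today)
# s3 is new, s2 changed, s1 is untouched and produces no write at all
```

The payoff is that unchanged rows (usually the vast majority) generate no database writes.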
Mitch: One of my squads has been testing a batched update system. Rather than having a fleet of machines all hitting the database at the same time, each with a database connection that it's using to put data into the same database, they batch up the writes on local key-value stores. Then, after all that processing is done, they do a mass batch update from those individual servers into the database, which ultimately results in overall faster write times and less contention for database connections. That's actually work that's in progress right now, and we're seeing pretty significant improvements in our refresh times, going from several hours down to about an hour.
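The batching idea can be sketched as follows: accumulate rows locally, then flush them in one `executemany` round trip instead of one statement per row. SQLite stands in for Postgres here, and the table, class, and column names are all hypothetical--this only illustrates the pattern, not Panorama's implementation.

```python
# Batched writes: buffer rows locally, flush in bulk.
import sqlite3

class BatchedWriter:
    def __init__(self, conn, batch_size=500):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []  # local accumulation instead of per-row writes

    def add(self, student_id, grade):
        self.buffer.append((student_id, grade))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # One round trip for the whole batch instead of one per row.
        self.conn.executemany(
            "INSERT OR REPLACE INTO grades (student_id, grade) "
            "VALUES (?, ?)", self.buffer)
        self.conn.commit()
        self.buffer.clear()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grades (student_id TEXT PRIMARY KEY, grade TEXT)")
writer = BatchedWriter(conn)
for i in range(1000):
    writer.add(f"s{i}", "A")
writer.flush()  # flush whatever remains below batch_size
```

In Postgres the same idea is often pushed further with `COPY` or multi-row `INSERT ... ON CONFLICT`, which is presumably closer to what a production pipeline would use.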
Ben: The other piece that we're able to take into account, because of the specifics of our industry, is that schools and teachers do some work at the end of the day, close things down, and go to bed, and then when the school day starts up the next day, they expect everything to look refreshed and correct. Because we have so many different clients across so many different time zones, we can actually use that to our advantage and kick off the processing needs of our Eastern time zone clients first, and work our way west across the country. Instead of scheduling all the jobs at the same time, we spread it out so that each group has the best opportunity to be ready by the time the data is needed. So Boston and New York are going to get their stuff done first, and Hawaii will get it done a few hours later. But to the folks in those time zones, it's the same experience.
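The east-to-west stagger Ben describes falls out naturally if each region's job is scheduled for the same local hour: converted to UTC, the kickoff times spread out across the evening. A small sketch, with an illustrative region list and a hypothetical 6pm local kickoff:

```python
# Schedule each region's nightly refresh for the same local hour;
# in UTC, the jobs are automatically staggered east to west.
from datetime import datetime, time
from zoneinfo import ZoneInfo

REGIONS = ["America/New_York", "America/Chicago", "America/Denver",
           "America/Los_Angeles", "Pacific/Honolulu"]

def refresh_schedule(run_date, local_hour=18):
    """Return (region, kickoff-in-UTC) pairs, ordered east to west."""
    schedule = []
    for tz in REGIONS:
        local = datetime.combine(run_date, time(local_hour),
                                 tzinfo=ZoneInfo(tz))
        schedule.append((tz, local.astimezone(ZoneInfo("UTC"))))
    return schedule

sched = refresh_schedule(datetime(2020, 3, 10).date())
# New York's refresh starts hours before Honolulu's, in UTC terms,
# but every school sees fresh data by its own local morning.
```

Because the stagger comes from the time zones themselves, no explicit delay logic is needed; each region's jobs just run on its own clock.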
Sandy: Well, what's your monitoring strategy? You mentioned looking at metrics, of course, when you're doing load testing and other things like that, but just in general, what does your monitoring strategy entail?
Mitch: I would say that we have a pretty heterogeneous approach right now. We use Heroku, which has quite a bit of monitoring already built in, and we use a number of third-party add-ons that allow us to look at log traces and basic machine-level metrics--load, the number of connections for our web apps, what status codes we're seeing. We also use PagerDuty: if we see critical errors that are impacting the ability of our users to actually access the platform, it'll alert an engineer and say, there's something to look at here, this is concerning, and we'll be in a reactive mode.
Mitch: Where we're at right now is trying to become a little bit more proactive. Rather than waiting for problems to crop up, we create a view of our system that allows us to say, "Oh, I see X, Y, and Z metrics are behaving this way at this moment. This is a signal that I know portends maybe something more serious occurring down the line, so I'm going to get in front of it." The development of that monitoring capability is still ongoing, and I think Ben's squad has put some thought into it as well.
Ben: Yeah, on the engineering team, we have the most direct communication with clients. Being able to go to a client before they notice a problem is valuable, and being able to fix a problem before we have to go to a client at all is even better. Being able to go to a client and say, "We know that these users experienced issues, we've already identified the problem and we've already fixed it, have a wonderful day," is a much better experience from their perspective than them having to call us and say, "Hey, I'm trying to use your website and nothing's working. What's going on?" Getting ahead of that curve--being more proactive, detecting problems before they start or as they're starting, as opposed to when downstream users notice the ramifications--is something that we're definitely investing in and trying to improve right now.
Sandy: Well, you've talked a lot about scaling, security, and everything else. Do either of you have any more advice you can give to people with similar scalability or security needs?
Ben: One thing that I've found really helpful, and that I would recommend for anyone regardless of the maturity of your security programs and your security needs, is to, first of all, have a written-down plan for what happens when you need to figure out whether something went wrong, and to practice that plan. We run security and privacy drills all the time at Panorama. We do it not because we're expecting something to go wrong, but because if and when something does go wrong, we want to be able to address it with a level head, and not be freaking out and not knowing what to do. Having that experience under your belt really helps make sure you respond to situations effectively, quickly, and appropriately when they do arise.
Mitch: And I would add to that: practice for outages too, not just security drills. As you practice them, make sure that you take the time at the end to figure out what should have happened, what actually happened, and what you could do better the next time. Rather than just going through the motions, you actually use it as a learning opportunity and use it to improve your processes.
Ben: The one thing I'll add to that, is when you're thinking about what you'll do better next time, don't think of it in terms of what you'll do better next time, think of it in terms of how you can improve your documentation and how you can improve your processes, so that it's easier for the person who does this the next time to fall into doing the right thing. Going into those meetings and saying, "Oh yeah, well I did this wrong, but don't worry, I'll do better next time." It's a great sentiment, but it's not addressing the core of the issue, which is that-
Mitch: The system was not correct. So stepping up a level, one of the things that I think we do really well as an engineering team at Panorama, is we don't place the blame or seek to place the blame on individual engineers. Our ethos is that, if someone was able to do something that impacted our systems to the extent that it brings them down or otherwise causes difficulty for our users, then it's a problem in our engineering processes and not any particular engineer. And so, bring this back down to the outage protocols and the security protocols that Ben's talking about, we're not seeking to have people say, "Oh I did this," or "I did that." It's more like, what can we do better so that when stuff really is going wrong and people are already at high levels of stress, we have something that anyone can refer to and very quickly figure out what they should be doing. And that's an ongoing process and we're always improving it.
Sandy: Yeah, I think that's super important to create that blameless culture, as you said. And of course documentation is, I love documentation. So hearing this makes me all very, very happy.
Mitch: It's something that as we started scaling our engineering team, and I talked about this a little bit earlier where, being able to transfer that knowledge without necessarily having the same people in the room, is extremely important. Because it's just not possible as you have 40 engineers for everyone to talk to everyone else at the same time whenever they need a bit of information, or to expect that one person is the storehouse of all knowledge about a particular aspect of the system.
Sandy: And of course it makes sense for an education tech company to be concerned about their employee education as well.
Ben: Absolutely. We preach growth mindset for students and for ourselves.
Sandy: Oh, awesome. Well, thank you so much for sharing some of your challenges and solutions with us. Listeners, you can learn more about Panorama Education at panoramaed.com. Thank you, Ben and Mitch, for joining us today. It has been wonderful.
Ben: Thank you so much.
Mitch: Thank you.
A podcast brought to you by the developer advocate team at Heroku, exploring code, technology, tools, tips, and the life of the developer.
Customer Solutions Architect, Heroku
Sandy is a Customer Solutions Architect at Heroku, helping Enterprise customers successfully run their apps on the platform.
Software Engineer, Panorama Education
Ben has worked in both software engineering and education, and currently focuses on client support, security, and privacy for Panorama Education.
Engineering Manager, Panorama Education
Mitch has been programming for 25 years and has a PhD in Computer Science from MIT. He helps run the Infrastructure Squad and Security Working Group.