Testing in Production: A Conversation with Talia Nassi

[interview]
Summary:

Talia Nassi, developer advocate at Split Software, chats with TechWell community manager Owen Gotimer about the fears, myths, and benefits of testing in production and how to get your stakeholders on board. Continue the conversation with Talia (@Talia Nassi) and Owen (@owen) on the TechWell Hub (hub.techwell.com)!

Owen Gotimer

One thing that you're passionate about is testing in production, and I know you're a big advocate in that space. I know there are a lot of people who, even today, start to freak out a little bit when they hear "testing in production." They're nervous, and they're scared of what that means. As an advocate for it, what are some things that you talk to people about when they're a little bit nervous or scared about testing in production?

Talia Nassi

Yeah, so first, I address the risks. There are risks to this if you don't do it correctly: it can affect your real end users, it can affect your reporting and business decisions, and it can affect your third parties. But if you do it with feature flags, and you do it the right way, it really is super safe. What I tell people who are scared is that you can start out with smaller features, things like color changes or the placement of a button, just super small changes. Then slowly, once you gain trust in the feature flagging system, you can increase the complexity of the features. So I would say just start small and then build it up. And then also just trust the feature flagging system. That's what feature flags are there for. They allow you to do so many things.

Owen Gotimer

For someone who is less in tune with testing in production and maybe feature flags and the functionality that they offer, do you want to give a basic explanation of what a feature flag is and how it can be used to help test in production?

Talia Nassi

Yeah, absolutely. So feature flags allow you to basically hide, enable, or disable a feature at runtime. What that means is you can separate your code deployment from your feature release. Say I'm a developer and I push up code for a feature, and only the developers and your internal teammates are targeted inside of that feature flag. That means only the people who are targeted in that feature flag can see that feature, and no one else in the outside world, like your real end users, will see anything related to it. In the feature flag UI, you'll be able to say, "I only want these developers and this product person to be able to see this new feature, and I only want this test engineer to see this feature." Then you basically put their IDs or their email addresses or whatever inside of the feature flagging system. And then inside of your code, you say: if this user is inside of the feature flag, give them this treatment, whatever the new feature is, and if they're not, give them the default. Because the outside world is not targeted in your feature flag and the default rule would be off at that point, the outside world won't see anything related to this new feature. So it makes it really great for testing in production, because you can release new features and target only your team. While only your team is targeted, you can test in production, open bugs, and work with your designer to make sure that everything looks good. If there are any problems, only you guys will see them; the outside world won't, because they're not targeted. And then once you've tested and you're confident, you can turn the feature flag on already knowing that your feature's working. So it's really a win-win situation. You get to test in production and raise the confidence in your systems. And then once you release the feature to production, your users will have a bug-free feature. So it's really great.
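
To make the targeting logic Talia describes concrete, here is a minimal Python sketch. It is not Split's SDK or any real flagging library: the FeatureFlag class, the render_checkout function, the flag name, and the email addresses are all hypothetical, and a real system would keep the target list in a remote service rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical in-memory flag; real flagging systems store this remotely."""
    name: str
    targeted_users: set[str] = field(default_factory=set)
    default_treatment: str = "off"  # what everyone outside the target list gets

    def treatment_for(self, user_id: str) -> str:
        # Targeted users always get the new feature; everyone else gets the default.
        if user_id in self.targeted_users:
            return "on"
        return self.default_treatment


def render_checkout(flag: FeatureFlag, user_id: str) -> str:
    # The deployed code contains both paths; the flag decides which one runs.
    if flag.treatment_for(user_id) == "on":
        return "new checkout"
    return "current checkout"


# Code is deployed, but only the internal team is targeted.
flag = FeatureFlag(
    "new_checkout",
    targeted_users={"dev@example.com", "tester@example.com"},
)
print(render_checkout(flag, "dev@example.com"))       # new checkout
print(render_checkout(flag, "customer@example.com"))  # current checkout

# Once the team has tested in production, release to everyone.
flag.default_treatment = "on"
print(render_checkout(flag, "customer@example.com"))  # new checkout
```

The point of the sketch is the separation of deployment from release: the new code path ships to production, but flipping the default is a separate, later decision.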

Owen Gotimer

Yeah, and hearing you talk through it, it sounds like something everyone should be doing. But obviously, there are still a lot of people who aren't doing that. What do you think is getting in the way of people taking advantage of a system and a testing model that seems like it's going to deliver the best possible outcome to end users?

Talia Nassi

I think a lot of it has to do with just the way that things have been done for so long. The norm is to have a staging environment, deploy your code to staging, test it in staging, and then deploy to production. It's just what companies have been doing for so long, and testing in production is such an innovative thing that I think people are scared. It's something new, and I think that does scare people.

Owen Gotimer

Yeah, absolutely. I think that fear is still there. And I like that you brought up "it's the way we've always done things," which obviously is a phrase that can be very trying and very troublesome for companies who are looking to move forward. They're always looking in their rearview mirror. But you mentioned having a staging environment, testing in staging, and then deploying to production. When you do testing in production, do you do any testing before you get to the production stage?

Talia Nassi

Yeah, so it depends on what the feature is. Sometimes you're testing things that are dependent on specific data or on privacy-related things that you just cannot test in production. You can use a staging environment for things like that. So things that are GDPR related or things that can't be in production, those I would say just test in staging or test in a different place. Also, you can do things like spinning up a containerized environment per build, like using Docker to spin up your microservices. That's also an option for running your tests if you can't run them in production. But again, those are super expensive, and they're just not as good as testing in production.

Owen Gotimer

You mentioned the importance of potentially testing in staging if you have things that need to be GDPR compliant or meet other compliance requirements, or if you need different types of data. Are there ways, though, that you can test in production with some alias data or dummy data that will give you the same effect as if you were to do that in staging?

Talia Nassi

Yeah. So what you do is basically create test users in production. What I like to do is have a Boolean in the back end, something like is_test_user or is_not_real or whatever your team decides. I like is_test_user. I set that to true for the test users that I have in production. Then the data that gets created with these users while the test is running is only clickable by the test users. What that means is, I'll have a test user, I'll create some data while I'm running the test, and then in the teardown of that test, I make sure to delete that data with this test user. If anything fails in that process, I'll get an alert that says, you know, something went wrong, you need to go check your data and see what went wrong. So I like to have test users in production. But you need to make sure that you get alerted if something happens in the setup or teardown, so that if something goes wrong and the data doesn't get deleted, you're aware.
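
The test-user pattern above can be sketched roughly as follows. Everything here is an assumption made for illustration: the User and FakeOrderStore classes stand in for a real user model and production database, the alert argument is any callable that reaches your alerting tool, and orders are just an example of data a test might create.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class User:
    user_id: str
    is_test_user: bool = False  # the back-end Boolean described above


class FakeOrderStore:
    """Hypothetical in-memory stand-in for a production data store."""

    def __init__(self) -> None:
        self.orders: dict[int, str] = {}  # order id -> owning user id
        self._next_id = 1

    def create(self, user: User) -> int:
        order_id = self._next_id
        self._next_id += 1
        self.orders[order_id] = user.user_id
        return order_id

    def delete_all_for(self, user: User) -> None:
        # Guard: bulk cleanup is only ever allowed for flagged test users.
        if not user.is_test_user:
            raise PermissionError("refusing to bulk-delete data for a real user")
        for order_id in [o for o, owner in self.orders.items() if owner == user.user_id]:
            del self.orders[order_id]


def run_production_test(store: FakeOrderStore, user: User,
                        alert: Callable[[str], None]) -> bool:
    """Create data, check it, and always attempt teardown; alert on any failure."""
    passed = False
    try:
        order_id = store.create(user)                     # setup
        passed = store.orders.get(order_id) == user.user_id  # the actual check
        if not passed:
            alert("production test failed: order not found after creation")
    except Exception as exc:
        alert(f"production test errored: {exc}")
    finally:
        try:
            store.delete_all_for(user)                    # teardown
        except Exception as exc:
            alert(f"teardown failed, test data may remain: {exc}")
    return passed


alerts: list[str] = []
store = FakeOrderStore()
ok = run_production_test(store, User("qa-bot", is_test_user=True), alerts.append)
print(ok, store.orders, alerts)  # True {} []
```

Putting the teardown in a finally block means cleanup is attempted even when the check itself fails, and a failed cleanup raises its own alert so leftover test data does not go unnoticed.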

Owen Gotimer

Yeah, absolutely. And I think getting the notifications if something goes wrong is super important. Something that you and I have talked about before is alerting tools and how they can be integrated into your testing in production strategy. Why are alerting tools so important to making sure that you're successfully testing in production?

Talia Nassi

Alerting tools are important to successfully test in production because you need to know when something goes wrong, quite frankly, and that should be set up in every part of the test. Like I said, it should be set up for the test setup, the actual test, and the teardown. What's really important is that you set up an alerting tool that can be integrated with your job scheduler. Because if you have, let's say, Jenkins or a cron job run your tests every, I don't know, 10 minutes (that's a lot), or every hour or so, and then a test fails, you need to be immediately alerted. Another thing that I recommend here is to make sure that everyone on your team is on call for those alerts. Something I talk about when I talk about testing in production is that the whole team owns product quality, not just the tester, so it's really important that everyone is on call when something goes wrong. There should be a rotation where you and your product owner and the developers are all on call, because you are all responsible for the quality of this product.
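
As a rough illustration of wiring a scheduled test to an alert and a whole-team rotation, here is a small Python sketch. The rotation list, the weekly rotation rule, and the notify callable are assumptions for the example; in practice a scheduler such as Jenkins or cron would invoke the check on an interval, and notify would call whatever alerting tool the team uses.

```python
from datetime import date
from typing import Callable

# Hypothetical rotation: the whole team, not just the tester, takes a turn.
ON_CALL_ROTATION = ["tester", "developer", "product owner"]


def on_call_for(day: date, rotation: list[str] = ON_CALL_ROTATION) -> str:
    """Pick who is on call by rotating through the team once per ISO week."""
    week_number = day.isocalendar()[1]
    return rotation[week_number % len(rotation)]


def run_scheduled_check(test: Callable[[], None],
                        notify: Callable[[str, str], None],
                        today: date) -> bool:
    """Run one production test; on failure, alert whoever is on call right away."""
    try:
        test()
    except Exception as exc:  # includes AssertionError from a failed check
        message = f"Production test {test.__name__} failed: {exc!r}"
        notify(on_call_for(today), message)
        return False
    return True


def checkout_flow_test() -> None:
    # Stand-in for a real UI end-to-end test; fails on purpose for the demo.
    raise AssertionError("checkout button not found")


def print_alert(person: str, message: str) -> None:
    print(f"ALERT to {person}: {message}")


run_scheduled_check(checkout_flow_test, print_alert, date(2024, 1, 1))
```

Because the alert is sent from inside the function the scheduler calls, a failing run notifies the current on-call person immediately rather than waiting for someone to read the job log.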

Owen Gotimer

Whole-team quality is something that I think has been getting more and more traction over the last however many years. We used to have silos: these were the developers, these were the business people, these were the testers. We'd kind of throw it over the wall to the testers, and they were responsible for the quality. Now we've taken a step back and said, "Hey, the organization should be responsible as a team for the quality of the software." How difficult is it to shift that mindset? We talked about "this is the way we've always done things" being why people may be afraid to test in production. Are there also challenges around getting people to understand that quality should be a full-team and full-organization expectation?

Talia Nassi

Yes, there are challenges with it, but like anything, like bringing up any new idea, I think it works as long as you bring up why it's important and why you need to have everyone in the rotation. Having the entire team on call is one example of the entire team owning quality. Another would be having every team member go to every sprint planning and every sprint retro, just so they can be involved in every step. I think entire-team involvement is really key here, because then everyone's aware of every step of the process, and it leads to no surprises, which is what we want.

Owen Gotimer

Yeah, absolutely. And I think having those conversations early and often with the entire team helps ease the tension, helps knock down those silos, and helps people better understand how important it is that the entire team is responsible for testing and for quality. Another thing that I was thinking about is the fears behind testing in production and how you might encourage people to get started. I think a lot of people listening to this who work in tactical positions, practitioners in testing and development on agile teams, might think this is a great idea. Some of the fear might be coming down from business owners, from stakeholders, and from the business itself. How do you recommend talking to the business about the value of testing in production from a business perspective? What kind of value should they expect? A lot of executives care about the dollars at the end of the day.

Talia Nassi

Yeah, this is great. So there's a chart online about when you catch a defect: the earlier you catch it, the less money it costs to fix. When you're testing in production, you catch the defect before you release the feature to production, and that, number one, saves you money. The other thing is, you need to become confident that your features work in production. I don't care if my feature works in staging. Staging gives me a little bit of confidence, but that doesn't mean my feature is going to work in production. And as the business owner, I want to know that I'm going to release a feature and it's going to work perfectly. So this is going to give your users a really great user experience, which will increase your business with them, and it'll raise their respect level for you, because you're not giving them (can I curse in this?) shitty software. You can edit that out if I'm not allowed to.

Owen Gotimer

I think it's great. It's real.

Talia Nassi

Yeah, you're delivering high-quality software, and that is going to increase your business no matter what. At the end of the day, you're increasing developer confidence because they know that their features are working. You're increasing tester confidence because they're testing in production and their tests are going to pass. And then once you turn on the feature flag and everyone can see it, the users have a great user experience. So this is just going to increase your business flow, because everyone is confident. Everyone knows that your features are working, and you know that if there's a bug, you can fix it before your users see it. So from a business perspective, it really is only going to do good.

Owen Gotimer

Yeah, absolutely. I think another source of fear around testing in production is a lack of understanding of what it is. Sometimes when people hear "testing in production," they think of Chaos Monkey from Netflix or something along those lines. How do chaos engineering and testing in production differ?

Talia Nassi

Yeah, so the way I think of chaos engineering is you're basically doing things on purpose in production to break your system. And the way I think of testing in production is as a way of rolling out your features so that you know they're working before your users see them. So they're two separate things. But testing in production is also not a replacement for all of your testing. You can still test in staging, and you can still write unit tests in your code pipeline. This doesn't mean that you shouldn't test anywhere else.

Owen Gotimer

How important do you think those other types of testing are? You mentioned unit testing. Obviously, if you're looking at the traditional testing pyramid, unit testing is a huge portion of it. How much of a role does having good unit tests play in your ability to test in production?

Talia Nassi

So I think they're all a little bit dependent on each other. The tests that I'm advocating to run in production are UI end-to-end tests. There shouldn't be, you know, 3,000 UI tests running in production; there should be a lot fewer than that. In terms of unit tests, if a unit test fails, it's just as important as if a test in production fails. The difference is that when the tests in production fail, they're telling you that this user flow is not working in production. Whereas if a unit test fails, it might be that that specific component doesn't have an end-to-end test for it. So it just depends on your specific setup.

Owen Gotimer

Yes, that totally makes sense. And I think that with testing in production, the user-facing side of it is obviously so important, as is the backend side, because we want to make sure things are working properly on the backend. So, say someone listens to this podcast, and they're like, "Yeah, testing in production is great." They go and tell their boss, but they still have these naysayers who are saying that this is never going to work. Staging will never fully represent production. There's really no way to do it. How do you counter those naysayers and get them to better understand the value of testing in production?

Talia Nassi

Basically, I would just say I don't care if my features work in staging; I do care if they work in production. And the only way to know if my code is working in production, if my features are working in production, is if I test it in production. You know, there are always going to be people who say that it doesn't work and you can't test in production and blah, blah, blah, but they're not my target audience. They're either going to get on board or they're going to die off. This is such an innovative thing, and it really just proves the value of your software. It can really only do good. So if you do it correctly and safely, it can be super beneficial.
