R in Production, DevOps, and the Importance of Empathetic Witnesses

Kelly O'Briant
Nov 3, 2020


TL;DR: eRum 2020 talks are available on YouTube!

Thanks to the hard work of an impressive and dedicated group of organizers, eRum transitioned to a virtual conference this year.

At the time (last June), I had recently stepped into a new role at RStudio, product manager for the Connect team. I was grappling with how to take my solutions engineering identity and apply it to product management. The rambling narrative I put together touches on all the things I’d been thinking about since joining RStudio: R in Production, Analytic Administrators, and DevOps Philosophy. But most of all, creating this talk helped me identify the biggest thread I’ve pulled through from Sol-Eng to PM — the importance of being an empathetic witness.

This is the long-form text version of my talk, which can be watched on the eRum YouTube channel.

What is Solutions Engineering?

At RStudio, Solutions Engineers wear a bunch of different hats. A big part of my work involved helping data scientists and administrators at other organizations get unblocked.

We have a growing user base in a large number of data-heavy industries who all have slightly different goals and challenges. When I talk to users, I get to ask a lot of really interesting questions, then I help synthesize the responses in such a way that we can provide tooling, advice, and solutions.

I interact with organizations in all stages of “R in Production” maturity. Some teams are still trying to legitimize R and have it be recognized as a supported part of their official software stack. Some teams are grappling with their IT groups, working through resistance as they try to up-skill. Other teams are fully bought in — they’re ready to integrate R across many parts of the organization, but face unique and interesting challenges which stem from an expanding domain.

R Admins

At the heart of all this activity is often one person. We like to refer to them as the R Admin, but a more general term might be “Analytic Administrator”.

The R Admin is someone who is invested in continually improving analytic infrastructure, advocates for best practices in data product deployments, and acts to adopt DevOps philosophies in their organization.

This person is trying to unblock their team, and they’re often taking on the burden of doing this alone.

Times I’ve felt alone

There have been a couple of times throughout my career when I’ve felt entirely alone, trying to advocate for tools and best practices.

Before RStudio, I worked for a large tech company, but in a role where I had very little access. And yet I had dreams of making a real impact there. I created some changes I was proud of, but each tiny success made me want more. Eventually I turned into a huge menace — testing any boundary I could find. The infrastructure I wanted for my team was available, and yet I wasn’t allowed to have it. I wasn’t trusted with it. And that drove me crazy.

So I decided to never be in that position again and I joined a tiny startup. I thought there I would have all the access and influence I wanted. It was so small my ideas couldn’t be ignored! And I was wrong again. I never had time to complete last week’s daily work, let alone improve today’s. It was exciting at times, and utterly soul crushing at others.

The Solutions Engineering team at RStudio is where I found my community.

It sounds cheesy — but it’s true. This is what I say when people ask me what it’s like to work at RStudio. As a solutions engineer I got to help people just like me navigate the ins and outs of making an impact at their organizations through the use of software built for data science. And it’s been amazing. But also sometimes frustrating. Because I know how hard it is for those R Admins who feel like there’s no clear path to accomplishing what needs to be done.

The Unicorn Project

I read a novel last year that really captured the visceral feeling I’m talking about. In this scene, the main character, Maxine, a developer, is trying to improve how development work gets done on her project in a crucial, critical way. But everything she tries fails, progress seems impossible, and she’s at her wits’ end.

I think what’s being described here is trauma. I found this particular flavor of work-trauma unfortunately relatable. But my time at RStudio has given me space to reflect on ways to combat it. I like this quote from Benjamin Hardy:

If you’re going to create a powerful future, you’ll experience failures, heartaches, bad days, and pain along the way. You need a team of empathetic witnesses. You need people to encourage you to keep going — to encourage your work when others don’t understand.

As a solutions engineer, that’s what I’ve tried to do for other R Admins, folks on the R in Production journey. If all I’ve accomplished over the last two years is to offer a few people the encouragement they needed in a tough time to keep going, that’s pretty cool. But today I want to offer you some things to consider along the way as well.

The Five Ideals

I didn’t just bring up that Unicorn Project book as a random aside. I want to actually talk about it a little and what it could potentially mean for us in the data science community. I mentioned that it’s a novel, which is kind of silly. But it’s also a really cathartic story. My colleague Alex and I like to joke that it’s like a trashy romance novel for developers, but without the romance.

I don’t think I’ll ruin the book by telling you — the main character ends up finding a team of empathetic witnesses. Together they do great things, and discover these five ideals that help them achieve the kind of digital transformation their organization needs so desperately.

Gene Kim on Twitter

I’ve found that most R users don’t spend a lot of time thinking about DevOps. But the more I learn about it as a philosophy and discipline, the more I’m compelled to socialize it. So I want to introduce you to “The First Ideal”, and try to convince you that all these concepts may be handy when navigating your R in Production goals.

The First Ideal

The First Ideal, Locality and Simplicity, is all about the extent to which a team or individual can independently develop, test, and deploy value to customers. In an ideal world, any person or team can work in their own area and get everything they need done. In a non-ideal scenario, the opposite is true: a person has to coordinate with many teams and people to get any one thing accomplished.

Could data scientists do this? Is this how we think about running R in production?

Challenges for the R User

Answering these questions involves defining the challenges that data scientists face, which I tend to classify into two buckets: Organizational, and Technical.

Organizational challenges tend to be communication and bureaucracy-based. Do you have the type of organizational culture that will enable you to make R a legitimate part of your tech stack?

Technical challenges involve identifying the shortcomings of your development process. What needs to change or evolve in order to satisfy your definition of production?

Analytic Administrators lead the way on solving organizational & technical challenges.

Here’s how. First, they define the problem: data scientists aren’t trusted to independently develop, test, and deploy value to customers and stakeholders by putting data products into production.

But maybe we should take even one more step back — Can we define Production?

Using this definition of production, I argue that data scientists are often in the position of “putting R into production”. But you should decide for yourself, and watch Joe Cheng’s 2019 Keynote here.

For a data scientist to live by the First Ideal of locality and simplicity, they need to show that there are acceptable answers to all of these concerns:

What patterns can data scientists use to increase the reliability of their own work?

One of the real gems out of Joe’s 2019 Keynote address was this very simple performance workflow for increasing confidence in a production application.

Many prototype applications fall over in production because they contain performance anti-patterns. Two R packages, shinyloadtest and profvis, can be used independently by a data scientist to test, debug, and hopefully demonstrate a “Keep it Snappy” experience for users.
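To make the profiling half of that workflow a little more concrete, here is a minimal sketch of what hunting for an anti-pattern with profvis might look like. The functions `slow_total()` and `fast_total()` are hypothetical examples I made up for illustration; they are not from Joe’s keynote. Load testing with shinyloadtest needs a deployed app to record against, so it is not shown here.

```r
# Hypothetical example: two ways to compute the same summary.
# slow_total() grows a vector inside a loop, a common performance anti-pattern.
slow_total <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) {
    out <- c(out, sqrt(i))  # copies the whole vector on every iteration
  }
  sum(out)
}

# fast_total() does the same work with vectorised code.
fast_total <- function(n) {
  sum(sqrt(seq_len(n)))
}

# Profile only in an interactive session where profvis is installed,
# because the result is an interactive flame graph.
if (interactive() && requireNamespace("profvis", quietly = TRUE)) {
  print(profvis::profvis(slow_total(3e4)))
}
```

In the flame graph, the time should pile up on the `c(out, sqrt(i))` line, which is the hint to reach for something like `fast_total()` instead.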

“Keep it Correct” involves the use of tools like shinytest, version control, and change management strategies.

In my experience, “Keep it Safe” and “Keep it Up” involve upfront communication and coordination with IT to achieve Locality and Simplicity.

In order to act independently, Analytic Administrators need to build trust with IT groups.

Developing trust is challenging! First you need to find someone with the capacity to have empathy for you. Then you need to have even more empathy for them!

Analytic Administrators can work through these trust development challenges using the following DevOps principles:

  • Make work visible
  • Define shared goals
  • Experimentation
  • Iteration (continual improvement)

I recommend starting with some of my favorite shared goals:

I already mentioned some of the open source tools and R packages you can use for testing data products, so let’s talk about sandboxes and how they can be used to shorten the distance between development and production.

I like to write about my own experiences creating sandbox learning environments for free. Two of my favorite open source tools for doing this are Vagrant and VirtualBox — which make it really easy to play around with pre-built virtual machines.

Another fun trick for reducing the risk of deploying breaking changes is to decouple deployment from release. Deployment on demand and thoughtful release strategies allow more control (and hopefully more success) over the delivery of features to end users.
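One common way to separate the two is a feature flag: the new code ships with the deployment but stays dormant until someone flips a switch. The sketch below is one possible approach in base R, not a prescribed method. The helper `feature_enabled()`, the `FEATURE_` environment variable prefix, and `summarise_sales()` are all hypothetical names chosen for illustration.

```r
# Hypothetical flag lookup: reads an environment variable, so the flag can be
# flipped on the server without redeploying any code.
feature_enabled <- function(flag, default = FALSE) {
  value <- Sys.getenv(paste0("FEATURE_", toupper(flag)), unset = NA)
  if (is.na(value)) {
    return(default)
  }
  tolower(value) %in% c("1", "true", "yes", "on")
}

# Both code paths are deployed; only the flag decides which one is released.
summarise_sales <- function(sales) {
  if (feature_enabled("median_summary")) {
    list(method = "median", value = stats::median(sales))
  } else {
    list(method = "mean", value = mean(sales))
  }
}
```

With `FEATURE_MEDIAN_SUMMARY` unset, users keep getting the mean. Setting it to `true` releases the median version, and unsetting it again rolls the release back without touching the deployment.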

Some (or all) of these practices might not work for you or your organization. Experimentation and iteration are crucial components of this process. My goal as a solutions engineer has always been to share my own experiences and thoughts, but never to tell someone exactly what to do. Your situation is unique, and navigating through your own challenges will be difficult, but I think it’s worth it.

I’ve seen many organizations achieve Locality and Simplicity to deliver value through independent data science. Those teams never stop iterating, experimenting and improving. It brings me joy to witness.

---

There are so many cool things to learn and incorporate into data science production work from DevOps. I’ve only had time to cover a handful here.

Here is my list of favorite books on DevOps: More Books!

If you too are interested in any of these topics, please join the R Admins community — become an empathetic witness for someone else.
