
Inside Look: Consumer Reports’ AI-Powered Chatbot

By Steven Melendez |  October 11, 2024

In June, Consumer Reports debuted AskCR, an AI-powered chatbot that can answer subscribers’ questions based on the nonprofit organization’s research into a wide array of consumer products. VP of Innovation Ben Moskowitz and Ginny Fahs, Director for Product R&D in the Consumer Reports Innovation Lab, spoke about the process of developing AskCR; its ongoing beta rollout to subscribers; and how Consumer Reports ensures the chatbot upholds its nine-decade-long track record for accuracy and trustworthiness.

Our Roles

Ben Moskowitz: I’m the VP of innovation, and that means I work with our strategy team to think through how the enterprise needs to adapt and evolve.

Ginny Fahs: I work on the Innovation Team at CR. I lead product innovation — product research and development — so I’ve been scoping and delivering the AI-powered experience of AskCR.

The Project

Consumer Reports VP of Innovation Ben Moskowitz

BM: So at the highest level, we’re in the information business. We’re a nonprofit that’s been advising consumers going back to the 1930s on what to buy—what’s safe to buy, what’s the right product for me. We’ve got 6 million members who turn to Consumer Reports magazine and its online counterpart to decide what to buy and get information.

But there’s a lot of pressure on that model, because there’s a lot of information out there, and we are dependent on search and other upstream information gateways to get people to our front door. And one of the concerns with AI is that when people go searching for things, they’re not going to get those 10 blue links anymore. It’s going to be more of a conversation.

So both as a defensive matter and also to innovate and have an experience that meets modern expectations, we thought it was very important to figure out how to give an AI experience to our members, so you can come talk to us and not just search through our database. It’s like having a friendly expert who you can ask a question in plain language.

GF: From a product perspective, AskCR is meant to help you decide what to buy, and deciding what to buy has gotten harder and harder with all the fake reviews and pay-to-play content. Something special we offer at Consumer Reports is that we have testers in lab coats in Yonkers testing products every day. AskCR lets us deliver that objectivity directly, using only our trusted information.

How It Got Green-Lit

BM: We’re really fortunate in that we have an established team and capabilities — there’s a commitment already in place. This started with a cross-functional strategy assessment. I know it sounds like a big, fancy set of words, but it began following the launch of ChatGPT and a lot of people going “wow!”

With a 30-person internal working group, we looked at all the ways we engage, how we serve consumers, what our work was internally, and what we thought AI could do to improve either the products or the way we work. Coming out of that was a pretty strong conviction we needed to be investing in AI product experiences but also AI competency.

So we did a number of things before we took the big leap into launching AskCR. Across some of our websites, there are lots of places where we take our content and summarize it, and we thought that was a low-risk way to dip our toes in the water, because it matched well with the capabilities of the models, and we could learn by shipping features.

The first thing we shipped was summarizing our product reviews to little snippets. We also worked to get access to ChatGPT Enterprise and similar tools for our staff, just to build skills and knowledge and build the dialogue internally about the technology, its pros and cons, what it could do for us, and what are the risks. And through that, we started honing a pretty clear idea of a product we could launch, which actually is a product that we resurrected.

Ginny Fahs, Director for Product R&D, Consumer Reports Innovation Lab

GF: AskCR is very much inspired by a product that CR offered a few years ago. In that version of AskCR, we had trained a number of individuals in call centers on CR content, and our members would be able to ask a question to somebody specially trained on CR’s knowledge and have a conversation about various criteria for what they wanted to buy and whether this model or that model stacked up.

The experience was wildly popular. Members loved it, with some of the highest satisfaction from any experience on our site. But ultimately we had to sunset it, because it was just too expensive to have actual humans paid 24/7 to answer questions. It just wasn’t financially sustainable.

But we knew the experience was something our members really enjoyed and appreciated, and then the explosion of ChatGPT onto the scene and all the attention to conversational AI made it seem like an opportune moment to revisit that concept and see if we could deliver it in a different way.

The Three Biggest Challenges

BM: One, in the context of our organization, was balancing the speed of building something new and independently with the kind of integration that enables something to be fully embedded in all the enterprise represents. This is probably a common pattern in a lot of medium and bigger organizations where there’s a cost to doing things, and things go slower than they would in a startup or a smaller team. You want the benefits of moving fast, but you don’t want to move so fast that you separate from the mothership. A lot of cross-functional work was required across every dimension of the product, from the data pipelines to the IT choices to the product choices to how it connects with the existing experience and, legally speaking, the terms of service. In every dimension of execution you can imagine, there was a balance between moving fast and trying to build some momentum and building within an existing structure.

The second challenge is that the capability building is as important as the deliverable you get through a process like this. So finding the right vendors, because we did lean on some vendor support, and bringing in the right full-time staff. We did a bunch of recruiting to support this project, and we probably will continue. And really ensuring that everyone knows it’s not just what we’re doing, it’s how we’re doing it. It’s the lessons we’re learning, it’s the muscles we’re building, and the culture we’re modeling as we do this work. Ginny and I both come from a software engineering background, so we’re trying to bring the culture of software engineering into the organization. I would roughly call it the capability building challenge, but it’s also the thing that’s most rewarding to me.

And third is really measuring ROI and knowing how to measure different kinds of ROI over different horizons. We’re hoping it adds value right away to our members, and anytime we provide more value to our members that should benefit the business. Medium term, we think there’s lots of interesting new product opportunities, maybe some new business models. And then longer term, if we don’t really master these technologies, we might not be in business, right?

We need to help all of our stakeholders to understand what ROI looks like and when to be impatient and when to be patient. And there’s some things we need to be really, really impatient about, right? And then there’s some things that are going to take a bit more time.

GF: What may not be apparent is just how much staff engagement was part of the process of bringing this to life. Because again, the vision is that AskCR is a friendly expert. It’s a way to feel like you have on speed dial the person sitting in the lab in Yonkers who spent the past 40 years testing infant car seats or doing laundry with different detergents or whatever it is. So we really needed to make sure that the voice of our 120 experts shone through the product.

In order to do that, we had individual experts spending time with our team, talking to AskCR, evaluating responses against what they would have said in answer to the question, helping us tweak and refine different aspects of the [retrieval augmented generation] system so that when we delivered an answer, it was imbued with everything that they know and think in addition to what’s present in our databases about a product. That staff engagement is something that may not be apparent on the surface, but that is part of what distinguishes this from talking to ChatGPT about what product to buy.
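The retrieval-augmented generation system mentioned above can be sketched in miniature: retrieve the most relevant test results for a question, then build a prompt that grounds the model’s answer in only that trusted content. This is a hypothetical illustration, not CR’s actual pipeline — the corpus, the keyword-overlap scoring, and the prompt format are all stand-ins (a production system would use embedding-based retrieval and a hosted language model).

```python
# Minimal RAG sketch. All names and data here are hypothetical stand-ins.

def retrieve(query, corpus, k=2):
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_terms & set(doc["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, docs):
    """Ground the model's answer in retrieved test data, not open-web text."""
    context = "\n".join(f"- {d['text']}" for d in docs)
    return (
        "Answer using ONLY the test results below.\n"
        f"Test results:\n{context}\n\n"
        f"Question: {query}"
    )

# Toy stand-in for an index of lab-test findings:
corpus = [
    {"id": "ts-101", "text": "Washer A scored excellent in spin cycle tests."},
    {"id": "ts-102", "text": "Car seat B passed all crash tests."},
    {"id": "ts-103", "text": "Detergent C removed stains better than rivals."},
]

query = "which washer has the best spin cycle"
docs = retrieve(query, corpus)          # ts-101 ranks first
prompt = build_prompt(query, docs)      # passed to the generation model
```

The key design point the interview describes is the grounding step: the prompt restricts the model to retrieved lab-test content, which is what the staff experts then evaluated and refined.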

A screenshot of the AskCR beta.

The Smartest Thing We Did to Set It Up for a Successful Launch

BM: The smartest thing we did was set an expectation that it will be rolling out continuously. If you don’t have exposure to the light of day, you’ll never know if what you have is good. At the same time we wanted to be responsible in how we introduced this for a lot of reasons, not least because people think of CR as a very trustworthy kind of broker. So I think it’s smart both to ensure we get a better product and to make sure we get the value we hope for from the investment.

We built the commitment that we were going to continue at minimum shipping an update every two weeks. So if you came back every two weeks, it’d be a steady current of improvement. And that took a lot of psychic energy. When you’re an organization that wants things to be perfect and, in fact, when you’re an organization that, for a living, takes things apart and kind of critiques them—that washer could have a better spin cycle and so on—if you put out something that’s not perfect, that’s really stressful.


We had to teach ourselves, and set expectations for others, that part of making it better is getting it through that process of being just okay, and that also gets to how we’ve chosen to roll it out. We didn’t just launch it. Very deliberately, we did a phased roll-out, where it’s clearly a beta. We initially had an internal beta, then it went to one percent of members, and we’re steadily growing the number of people so we can metabolize the feedback.

GF: My answer to the question would be involving our own experts in the development process. At the end of the day, the only way AskCR is going to be successful is if we are noticeably better than ChatGPT’s answers every time, and the way we do that is as much about our experts and what they’ve learned over the decades as it is the style of our delivery of the information.

A screenshot of the AskCR beta.

Metrics We’re Tracking

GF: We decided to empirically measure “good enough for us.” We have a blog post about our evaluation process for AskCR and what we call evaluation-driven development. Any change we make we run through an evaluation suite or a test suite. We see the answers that come back. We evaluate whether those answers have improved since last time, stayed the same, or gotten worse. And there’s a threshold that every new release has to pass in order to be released, and we want to see it getting better every time.


A lot of this is human review. It’s not entirely automated, because there are a lot of judgment calls. We found we really need human discernment to help us understand both how the quality is improving and how it’s not, so that we can very specifically design features around the places we know need work.
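The release process described above — run every change through an evaluation suite, compare against the previous release, and gate shipping on a quality bar — can be sketched as a simple release gate. This is a hypothetical illustration of the pattern, not CR’s actual tooling; the threshold value and the per-question scores (which in practice come from human reviewers or automated graders) are assumptions.

```python
# Hypothetical "evaluation-driven development" release gate.
# A candidate release ships only if its mean quality score clears
# an assumed threshold AND does not regress from the prior release.

THRESHOLD = 0.80  # assumed minimum mean quality score for release

def release_gate(new_scores, prev_scores, threshold=THRESHOLD):
    """Return True if the candidate passes the bar and hasn't regressed."""
    new_mean = sum(new_scores) / len(new_scores)
    prev_mean = sum(prev_scores) / len(prev_scores)
    return new_mean >= threshold and new_mean >= prev_mean

# Per-question quality scores (0-1) over a fixed evaluation suite:
previous = [0.78, 0.85, 0.80, 0.90]   # prior release
candidate = [0.84, 0.88, 0.82, 0.91]  # proposed release

ship = release_gate(candidate, previous)  # True: above bar, no regression
```

Keeping the question suite fixed across releases is what makes “improved since last time, stayed the same, or gotten worse” a meaningful comparison.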

And we care a lot about stickiness. If someone uses AskCR once, we want them to be using it again for their next purchase. And so we’re very carefully tracking repeat use because that habituation is part of how we prove to our members that this is getting better and better with time, and how we prove that what we’re building is valuable and creating a new type of relationship between the member and the organization.
