
Research
What is an "ideal" response to a prompt? How do you measure how close a response comes to that ideal? Success depended not only on whether a pipeline worked, but also on the strategy used to build it and the context provided to the user.
To focus the PoC, we tested assumptions about user expectations of AI capability, acceptable failure modes, and differences between less-technical users and power users. Engineers built a constrained MVP model, and we tested it with internal users across experience levels. We framed tasks as broad, multi-step goals drawn from real use cases in our User Support and Enablement Slack channels.


Key Refinements for Success
Research findings allowed us to prioritize four areas that significantly improved model training and output quality:
Allow users to manually switch between Build and Ask modes. Giving users explicit control increased their trust and confidence and helped offset moments when model output underperformed.


Outcomes
Both the internal and GA releases of the Copilot were major successes for users and the business alike. A majority of users adopted the feature within 2 weeks, and internal support staff and sales associates used it daily for 80% of their tasks.
