Introducing Coalesce's agentic pipeline-building partner
Coalesce Automation

Project Background

This case study covers the design of Coalesce's flagship AI Copilot, built from scratch. The new feature lets users build SQL pipelines with natural language prompts, opening the product to a much larger, less-technical market while dramatically accelerating the work of power users.

The Challenge

Coalesce already simplifies the process of building SQL pipelines through a node-based drag-and-drop interface. However, a handy UI on its own doesn't instill the mindset and strategy required to produce effective, efficient pipelines.

A copilot that translates natural language into complex actions, trained on Coalesce-specific best practices, can guide less-technical users through their first Coalesce pipelines while teaching them how to expand their skills and get more out of the tool.

Research

What is an "ideal" response to a prompt? How do you quantify how "ideal" it is? Success depended not only on whether a pipeline worked, but on the strategy used to build it and the context provided to the user.

To focus the PoC, we tested assumptions around user expectations of AI capability, acceptable failure modes, and differences between less-technical users and power users. Engineers built a constrained MVP model, which we tested with internal users across experience levels. Instructions were delivered as broad, multi-step goals based on real use cases from our User Support and Enablement Slack channels.


To quantify performance without over-indexing on raw metrics, participants assigned letter grades (A+ to F) to the Copilot's overall performance, specified the minimum grade they would consider acceptable for a GA release, and described the adjustments required to reach an A. These grades were paired with qualitative feedback and comparisons to other AI tools participants used in their daily work (e.g., ChatGPT, Gemini, Claude).
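
To make the grading concrete, here is a minimal sketch of how letter grades could be aggregated on a GPA-style numeric scale. The point mapping and all names below are illustrative assumptions, not Coalesce's actual tooling.

    # Hypothetical: average letter grades numerically, then map the result
    # back to the nearest letter. The 4.3-point scale is an assumption.
    GRADE_POINTS = {
        "A+": 4.3, "A": 4.0, "A-": 3.7,
        "B+": 3.3, "B": 3.0, "B-": 2.7,
        "C+": 2.3, "C": 2.0, "C-": 1.7,
        "D+": 1.3, "D": 1.0, "F": 0.0,
    }

    def mean_grade(grades: list[str]) -> float:
        """Average a list of letter grades as numeric points."""
        return sum(GRADE_POINTS[g] for g in grades) / len(grades)

    def to_letter(points: float) -> str:
        """Return the letter grade whose point value is closest."""
        return min(GRADE_POINTS, key=lambda g: abs(GRADE_POINTS[g] - points))

    # Example: one participant's grades across four test sessions.
    avg = mean_grade(["B", "B+", "A-", "B+"])
    print(f"{avg:.2f} -> {to_letter(avg)}")  # 3.33 -> B+

Averaging on a numeric scale like this keeps individual sessions comparable across releases without treating any single raw metric as the target.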

After iterative refinement, the same evaluation framework was applied with real customers in our Product Preview group to validate findings prior to general availability.

Key Refinements for Success

Research findings allowed us to prioritize four areas that significantly improved model training and output quality:
1. Allow users to manually switch between Build and Ask modes. Giving users explicit control increased trust and confidence, helping mitigate moments when model output underperformed.

2. Keep language non-technical, even when the underlying solution is complex. Users who want deeper technical detail reliably ask for it, while simpler explanations improve comprehension and adoption for everyone else.

3. Use explanations to accelerate learning. Informative reasoning helped less-technical users achieve higher-quality results within just a few prompts.

4. Optimize differently for power users. While less-technical users adopted faster, power users trusted the Copilot more when it minimized explanation and executed tasks using familiar best practices.

Outcomes

Both the internal and GA releases of the Copilot were major successes for users and the business alike. A majority of users adopted the feature within two weeks, with internal support staff and sales associates leveraging it daily for 80% of their tasks.


In post-release testing, the GA release ultimately received a B+, a grade derived from both qualitative interviews and quantitative analysis of user activity via the in-product thumbs-up/thumbs-down feedback feature.
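
As an illustration of how the thumbs-up/thumbs-down data could feed into such a grade, here is a hypothetical sketch; the cutoff table is an assumption for illustration, not the actual analysis.

    # Hypothetical: map the share of positive feedback onto the same
    # A+..F scale used in interviews. Cutoffs are illustrative.
    CUTOFFS = [
        (0.97, "A+"), (0.93, "A"), (0.90, "A-"),
        (0.87, "B+"), (0.83, "B"), (0.80, "B-"),
        (0.77, "C+"), (0.73, "C"), (0.70, "C-"),
        (0.60, "D"),
    ]

    def feedback_grade(thumbs_up: int, thumbs_down: int) -> str:
        """Convert thumbs-up/down counts into a letter grade."""
        total = thumbs_up + thumbs_down
        if total == 0:
            return "N/A"  # no feedback collected
        ratio = thumbs_up / total
        for cutoff, grade in CUTOFFS:
            if ratio >= cutoff:
                return grade
        return "F"

    print(feedback_grade(875, 125))  # 0.875 -> B+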

Interested in learning more about this project?