Re-architecture of Semantic Candidate Search
- Meghana Kumar
- Feb 21
- 6 min read
The Problem Statement
It was a classic startup scene when I first walked into Hollyhires' office – three founders, a humble desk, and a demo that worked just well enough to show potential. As an AI Engineering consultant, I was brought in to solve what had become their critical technical bottleneck: their candidate search system.
"We want recruiters to be able to find candidates as intuitively as having a conversation," the CEO explained during my first meeting. "Type in 'Python wizard who loves building scaling distributed systems from Seattle' and get exactly that person." The vision was compelling – using the latest in LLMs to understand the deep semantics of both job descriptions and candidate profiles.
But the current implementation was holding them back. Their MVP was essentially a patchwork solution: LinkedIn API calls stitched together with GPT, analyzing candidates one by one against job descriptions. For each search, the system would fetch candidates in small batches, run them through GPT for matching, and if it didn't find a good enough match, fetch another batch. This needle-in-a-haystack approach meant searches could take upwards of 30 minutes – an eternity in the fast-paced world of recruitment – and that was assuming the search returned useful results at all.
It was very clear that this would not scale. Or be usable.
The Ask
I had six weeks to completely re-architect their search infrastructure. The objectives were clear:
Own the data: Move away from real-time API calls to a proper data infrastructure
Make it fast: Bring search times down to seconds, not minutes
Keep it semantic: Maintain the deep understanding that made their vision special
Make it scale: Design for millions of profiles, not just demos
Data Infrastructure
The first and most critical pivot I suggested was moving from real-time LinkedIn API calls to owning our data infrastructure. We acquired a massive dataset of 200 million LinkedIn profiles, but raw data alone wasn't enough – we needed to transform it into a structured, queryable format that would serve as the foundation for our semantic search engine.
The Scale Challenge
Processing 200 million profiles isn't just a matter of running a simple script. Each profile was rich with information: work experience, skills, education, recommendations, location. We needed a distributed computing approach that could handle both the initial transformation and ongoing updates efficiently.
We designed our ETL (Extract, Transform, Load) pipeline using Dask, a flexible parallel computing library that would allow us to scale our data processing horizontally. Coiled provided the cloud infrastructure, automatically spinning up and down compute resources as needed. This combination gave us the perfect balance of power and cost-efficiency.
The pipeline had several key stages, sketched in code after this list:
Feature Extraction: Computing and storing derived features for our search engine
Transformation: Converting profiles into our custom data models, with schemas optimized for our specific use case.
Quality Assurance: Ensuring data consistency and completeness, checking that the relevant fields are present and in the correct format.
Data Ingestion: Loading the transformed data into our own infrastructure on Elasticsearch.
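To make this concrete, here is a minimal sketch of the pipeline on Dask and Coiled. The bucket paths, field names, and transform logic are illustrative assumptions, not the production code:

```python
import json

import coiled
import dask.bag as db

# Coiled spins the cluster up on demand and tears it down when the run finishes.
cluster = coiled.Cluster(name="profile-etl", n_workers=50)
client = cluster.get_client()

def transform(raw: dict) -> dict:
    """Convert a raw profile into the search schema (fields are illustrative)."""
    experience = raw.get("experience") or []
    return {
        "profile_id": raw.get("id"),
        "current_title": experience[0].get("title") if experience else None,
        "location": raw.get("location"),
        "years_experience": len(experience),  # placeholder for a real derived feature
        "experience_text": " ".join(e.get("description", "") for e in experience),
    }

def is_complete(profile: dict) -> bool:
    """Quality gate: drop profiles missing the fields search depends on."""
    return bool(profile["profile_id"] and profile["experience_text"])

profiles = (
    db.read_text("s3://raw-profiles/*.json.gz")  # hypothetical bucket
    .map(json.loads)
    .map(transform)
    .filter(is_complete)
)

# Land the cleaned records as Parquet, ready for embedding and Elasticsearch ingestion.
profiles.to_dataframe().to_parquet("s3://clean-profiles/", write_index=False)
```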
These models, sketched below, were designed to:
Support fast filtering operations (e.g., years of experience, current location)
Store pre-computed features for our ranking system
Maintain relationships between different aspects of a candidate's profile
Enable efficient updates as profiles evolved
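As an illustration of what such a model could look like as a Pydantic schema - the field names here are assumptions, not the production data model:

```python
from datetime import date
from typing import Optional

from pydantic import BaseModel

class Position(BaseModel):
    title: str
    company: str
    description: str = ""            # rich text, embedded for soft ranking
    start_date: Optional[date] = None
    end_date: Optional[date] = None  # None means current role

class Candidate(BaseModel):
    profile_id: str
    full_name: str
    location: Optional[str] = None            # fast hard-cut filter
    years_experience: float = 0.0             # pre-computed ranking feature
    skills: list[str] = []                    # exact-match filterable
    positions: list[Position] = []            # relationships between roles preserved
    embedding: Optional[list[float]] = None   # filled in by the embedding stage
    updated_at: Optional[date] = None         # supports incremental re-ingestion
```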
Vector Embeddings for Search
Building the Semantic Brain
After establishing our data foundation, the next challenge was creating a truly semantic search experience. The goal was ambitious: when a recruiter searches for a "full-stack developer with experience in scalable systems," we needed matches based on the intent of the query, not just keyword overlap.
Choosing Our Embedding Model
The choice of embedding model would make or break our search quality. We needed a model that could capture the subtle nuances in both job descriptions and candidate profiles. After extensive research and experimentation with various Sentence Transformer models from Hugging Face, we landed on the multilingual-e5-large-instruct model.
This wasn't a random choice. The model had demonstrated superior performance on the Massive Text Embedding Benchmark (MTEB) leaderboard, particularly excelling in retrieval and reranking tasks – exactly what we needed for candidate search. Its multilingual capabilities were an added bonus, allowing us to potentially expand to multiple markets in the future.
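Using it from the sentence-transformers library is straightforward, with one detail worth noting: the E5-instruct family expects search queries to carry a task instruction prefix, while documents are encoded as plain text. A minimal sketch (the task wording and example texts are illustrative):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large-instruct")

# Queries carry an instruction prefix; candidate documents are encoded as-is.
task = "Given a recruiter search query, retrieve candidate profiles that match it"
query = f"Instruct: {task}\nQuery: full-stack developer with experience in scalable systems"

query_vec = model.encode(query, normalize_embeddings=True)
doc_vec = model.encode(
    "Senior engineer: built and scaled distributed services on Kubernetes ...",
    normalize_embeddings=True,
)

similarity = float(query_vec @ doc_vec)  # cosine similarity, since both vectors are normalised
```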
Engineering the Vector Space
We made the critical decision to embed the entirety of a candidate's work experience and education history (a flattening sketch follows the list below). This meant our embeddings would capture:
Career progression and growth
Skills demonstrated across different roles
Educational background and its relevance to their career
Project experiences and achievements
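In practice that meant flattening each profile's history into a single passage before encoding it. A hedged sketch, reusing the illustrative field names from earlier:

```python
def profile_to_document(candidate: dict) -> str:
    """Flatten work history and education into one passage for embedding.
    The field names here are illustrative, not the production schema."""
    parts = []
    for pos in candidate.get("positions", []):
        parts.append(f"{pos['title']} at {pos['company']}: {pos.get('description', '')}")
    for edu in candidate.get("education", []):
        parts.append(f"{edu.get('degree', '')} in {edu.get('field', '')}, {edu.get('school', '')}")
    return "\n".join(parts)
```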
Elasticsearch
With our embedding strategy defined, we needed a scalable way to perform similarity searches across millions of vectors. Elasticsearch provided the perfect foundation (an index and query sketch follows this list):
Efficient vector storage and retrieval
Support for hybrid searching (combining semantic and keyword-based approaches)
Built-in support for scaling and sharding
Fast approximate nearest neighbor search
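Here is a minimal sketch with the Elasticsearch 8.x Python client - a dense_vector mapping plus an approximate kNN query. Index and field names are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index mapping: structured fields for filtering plus a dense vector for semantic search.
es.indices.create(
    index="candidates",
    mappings={
        "properties": {
            "profile_id": {"type": "keyword"},
            "location": {"type": "keyword"},
            "skills": {"type": "keyword"},
            "years_experience": {"type": "float"},
            "profile_vector": {
                "type": "dense_vector",
                "dims": 1024,              # output size of multilingual-e5-large-instruct
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

# Approximate nearest-neighbour search over the profile vectors.
results = es.search(
    index="candidates",
    knn={
        "field": "profile_vector",
        "query_vector": query_vec.tolist(),  # query embedding from the sketch above
        "k": 50,
        "num_candidates": 500,
    },
)
```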
The Art of Feature Engineering
While our vector embeddings provided a strong semantic foundation, we quickly realized that great candidate search needs both precision and nuance. Through close collaboration with recruiters and analyzing successful placements, we developed a two-stage feature system which I called Hard Cut, Soft Rank.
Hard Cut: Precise Filtering
Hard cuts represented our filterable, structured data - the binary or quantifiable aspects of a candidate's profile. Years of experience was a clear one; location preferences were another. We also structured technology requirements this way - if a role absolutely required Kubernetes experience, we could filter on it directly. These hard cuts included current job title, company size, education level, and specific certifications - all data points that could be exactly matched rather than semantically interpreted.
Soft Rank: Semantic Understanding
The soft ranking features were where the semantic magic happened. These were the rich text fields we converted into embeddings - the crux of a candidate's profile that needed contextual understanding. A candidate's work experience descriptions revealed how they approached technical challenges and what they prioritized in their work. Project highlights often contained subtle indicators of their technical depth and problem-solving approach. Even their educational background, beyond just the degree title, gave insights into their academic interests and specializations.
The beauty of this approach was that it mirrored how recruiters actually think - first filtering on non-negotiable requirements, then diving deep into understanding candidate experiences and capabilities. By structuring our features this way, we could efficiently narrow down the candidate pool before applying our more computationally intensive semantic matching.
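Mapped onto Elasticsearch, Hard Cut, Soft Rank becomes a single request: the structured requirements go into a filter clause, and kNN over the profile vectors ranks whatever survives. A sketch with illustrative field names and thresholds:

```python
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("intfloat/multilingual-e5-large-instruct")

task = "Given a recruiter search query, retrieve candidate profiles that match it"
query_text = "Python engineer who builds scalable distributed systems"
query_vector = model.encode(f"Instruct: {task}\nQuery: {query_text}", normalize_embeddings=True).tolist()

# Hard cut: non-negotiable, exactly matchable requirements.
hard_cuts = {
    "bool": {
        "filter": [
            {"term": {"skills": "kubernetes"}},           # must-have technology
            {"term": {"location": "Seattle"}},            # location requirement
            {"range": {"years_experience": {"gte": 5}}},  # minimum experience
        ]
    }
}

# Soft rank: semantic kNN over the filtered pool.
results = es.search(
    index="candidates",
    knn={
        "field": "profile_vector",
        "query_vector": query_vector,
        "k": 50,
        "num_candidates": 500,
        "filter": hard_cuts,  # the pool is narrowed before the semantic ranking
    },
)
```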

The Final Touch: Vetting
While our search system was now surfacing relevant candidates at scale, we added one final layer of intelligence: our Vetting Agent. This LLM-powered system acted as a discerning recruiter, carefully examining each candidate in our search results against the specific job requirements.
For each potential match, the Vetting Agent performed a detailed analysis, producing a confidence score out of 10 and what we playfully called "recsplanations" - concise, compelling explanations of why each candidate would be a great fit. Instead of recruiters having to piece together the narrative from scattered profile information, they'd see something like: "9/10 - Senior engineer with proven experience scaling distributed systems at Uber, led multiple high-impact projects using Kubernetes and microservices architecture, demonstrated history of mentoring junior engineers."
These recsplanations weren't just summaries - they were contextual insights that highlighted the most relevant aspects of a candidate's experience for each specific role. This final layer of intelligence helped recruiters quickly understand not just who matched, but why they matched, to create that little bit of product magic.
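At its core, a Vetting Agent like this boils down to one structured LLM call per shortlisted candidate. The model name, prompt, and JSON output format in this sketch are assumptions, not the production setup:

```python
import json

from openai import OpenAI

client = OpenAI()

def vet_candidate(job_description: str, candidate_summary: str) -> dict:
    """Score a candidate against a job and return a short 'recsplanation'."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a discerning technical recruiter. Score the candidate "
                    "against the job from 1 to 10 and explain the fit in one or two "
                    'sentences. Reply as JSON: {"score": int, "recsplanation": str}.'
                ),
            },
            {
                "role": "user",
                "content": f"Job description:\n{job_description}\n\nCandidate:\n{candidate_summary}",
            },
        ],
    )
    return json.loads(response.choices[0].message.content)
```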


30m to 7s
The transformation of the candidate search system was dramatic. What once took over 30 minutes now returned results in just 7 seconds. But speed wasn't the only improvement - we had built a truly intelligent search engine that understood the nuances of technical recruitment.
The impact was immediate and measurable. More importantly, the quality of matches improved significantly. The combination of our vector embeddings, carefully engineered features, and AI-powered vetting meant that recruiters were seeing better candidates faster.
For a young GenAI startup with ambitious goals, we had built something that didn't just work - it worked at scale. It's important to do things the right way, the slightly more deliberate way, instead of simply buying into the GenAI hype with a wrapper-around-GPT app.
The six-week journey from a simple GPT-powered prototype to a production-ready semantic search engine had been intense, but the results spoke for themselves.
Looking back at this challenging sprint, it was incredibly rewarding to see the fruits of my labour.