Introduction
As competition in the hiring market intensifies, it has become increasingly important for companies to find the right candidates quickly and efficiently. Traditional recruitment methods not only consume a lot of time and resources but are also prone to errors. The advancement of artificial intelligence (AI) offers new possibilities to address these challenges. Specifically, AI-based talent sourcing using natural language processing (NLP) models like SBERT (Sentence-BERT) is bringing innovative changes to the recruitment process.
SBERT can measure the similarity between resumes and job descriptions with high accuracy, helping companies surface the most suitable candidates for the positions they need to fill. In this article, we'll explore how SBERT can be used in the recruitment process through a simple example.
What is Natural Language Processing (NLP)?
Natural Language Processing (NLP) is a technology that allows computers to understand and process human language. NLP includes converting text or speech data into a format that machines can comprehend and analyzing the meaning of language to apply it in various applications. Examples of NLP applications include document classification, sentiment analysis, and machine translation. In talent sourcing, NLP plays a critical role by analyzing textual data such as resumes or job descriptions to identify suitable candidates.
What is Embedding?
Embedding is a technique for converting natural language data into numerical vectors. While human language is highly complex and varied, computers process numerical data much more effectively. To bridge this gap, text is transformed into dense, high-dimensional vectors that preserve its meaning. These vectors make it possible to compute semantic similarity, so that words and sentences with similar meanings end up positioned close to each other in the vector space.
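To make this concrete, here is a minimal sketch of embeddings in action, assuming the sentence-transformers library introduced later in this article (the model name all-MiniLM-L6-v2 is just an illustrative choice). Texts with related meanings receive vectors that score as more similar, using the cosine similarity measure explained in the next section.

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('all-MiniLM-L6-v2')  # illustrative model choice

sentences = ["I analyze data.", "I work with datasets.", "I enjoy hiking."]
vectors = model.encode(sentences)  # one vector per sentence

# Related sentences should score noticeably higher than unrelated ones.
print(cosine_similarity([vectors[0]], [vectors[1]])[0][0])  # relatively high
print(cosine_similarity([vectors[0]], [vectors[2]])[0][0])  # relatively low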
Cosine Similarity
Cosine similarity is a method for measuring the similarity between two vectors by determining how closely their directions align. The magnitude of the vectors is ignored, and only the angle between the vectors is considered. The closer the angle is to 0 degrees, the higher the similarity, and the closer it is to 90 degrees, the lower the similarity.
Cosine similarity is typically calculated using the following formula:

cosine_similarity(A, B) = (A · B) / (‖A‖ × ‖B‖)

• A and B are the two vectors being compared.
• A · B is the inner product of the two vectors.
• ‖A‖ and ‖B‖ represent the magnitudes (L2 norms) of the vectors A and B, respectively.
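Cosine similarity is also simple to compute directly. The following minimal sketch (using NumPy, which is an assumption here, not something the article's later code requires) mirrors the formula above:

import numpy as np

def cosine_sim(a, b):
    # Inner product divided by the product of the L2 norms, as in the formula.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a, larger magnitude
c = np.array([-3.0, 0.0, 1.0])  # orthogonal to a

print(cosine_sim(a, b))  # 1.0  (angle of 0 degrees; magnitude is ignored)
print(cosine_sim(a, c))  # 0.0  (angle of 90 degrees)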
SBERT (Sentence-BERT)
SBERT is a model based on BERT that generates sentence-level embeddings, allowing for faster and more efficient similarity calculations between sentences. While BERT excels at capturing semantics, comparing sentences with plain BERT requires feeding each sentence pair through the network together, and the number of pairs grows quickly with the number of sentences, which slows processing considerably.
In contrast, SBERT leverages the core architecture of BERT while converting sentence inputs into fixed-size vectors. This allows each sentence to be represented by a unique vector, and these vectors can be used to quickly and effectively calculate similarity between sentences. SBERT is optimized for generating sentence embeddings, making it ideal for tasks like text search and document classification, where sentence-level similarity needs to be calculated quickly.
These properties make SBERT a strong fit for similarity-based recommendation systems and for talent sourcing, where similarity between many sentence pairs must be calculated quickly and accurately.
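The "fixed-size vector" property is easy to verify: sentences of very different lengths map to vectors of identical dimensionality. A short sketch (the model name is an illustrative assumption; any SBERT-style model behaves the same way):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

short_vec = model.encode("Data analyst.")
long_vec = model.encode("I spent five years building dashboards, pipelines, and reports for a retail company.")

# Both sentences map to vectors of the same fixed size.
print(short_vec.shape, long_vec.shape)  # e.g. (384,) (384,)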
Code Example: Talent Sourcing with SBERT
Next, let's look at a simple Python code example that uses the SBERT model to calculate the similarity between a resume and a job description. This code uses the sentence-transformers library to load the SBERT model, convert the resume and job description into embedding vectors, and then calculate the cosine similarity between the two vectors.
Step 1: Load Required Libraries
from sentence_transformers import SentenceTransformer   # SBERT models
from sklearn.metrics.pairwise import cosine_similarity  # vector similarity
import torch                                             # backend for sentence-transformers
• SentenceTransformer is the class from the sentence-transformers library used to load and run SBERT models.
• cosine_similarity is used to calculate the similarity between two vectors.
• torch is the PyTorch library, which sentence-transformers uses under the hood.
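If these packages are not installed yet, they can typically be added with pip install sentence-transformers scikit-learn torch (the standard PyPI package names).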
Step 2: Load the Model
# Loading a plain BERT checkpoint; sentence-transformers adds a pooling layer on top automatically.
model = SentenceTransformer('bert-base-multilingual-cased')
• The bert-base-multilingual-cased model is a multilingual BERT model capable of processing many languages, including Korean. Because it is a general BERT checkpoint rather than one fine-tuned for sentence embeddings, sentence-transformers mean-pools its token outputs into a sentence vector; a dedicated multilingual SBERT model may yield sharper similarity scores in practice.
Step 3: Define the Job Description and Resumes
job_description_text = "This role involves performing data analysis and supporting business decisions by analyzing customer data."
resumes = [
"I have experience optimizing business performance through data analysis.",
"I analyzed customer data and developed marketing strategies that led to increased sales.",
"I have expertise in programming and data analysis and can interpret complex data."
]
Step 4: Generate Text Embeddings
job_description_embedding = model.encode(job_description_text)
resume_embeddings = model.encode(resumes)
• The model.encode() function converts text into embedding vectors.
• job_description_embedding is the embedding vector for the job description, while resume_embeddings contains the embedding vectors for the resumes.
Step 5: Calculate Cosine Similarity
cosine_similarities = [
    # cosine_similarity returns a 2D array, so [0][0] extracts the scalar score
    cosine_similarity([job_description_embedding], [resume_embedding])[0][0]
    for resume_embedding in resume_embeddings
]
• The cosine_similarity function calculates the similarity between two vectors.
• We compute the similarity between the job description and each resume, storing the results in the cosine_similarities list.
Step 6: Rank Resumes by Similarity
ranked_resumes = sorted(zip(cosine_similarities, resumes), reverse=True, key=lambda x: x[0])

for rank, (similarity, resume) in enumerate(ranked_resumes, start=1):
    print(f"Rank {rank}: Similarity {similarity:.4f} - {resume}")
• zip pairs each similarity score with its resume, and sorted orders the pairs in descending order of similarity.
• enumerate is used to print the rank and similarity score for each resume.
Sample Output
Rank 1: Similarity 0.8234 - I have experience optimizing business performance through data analysis.
Rank 2: Similarity 0.7625 - I analyzed customer data and developed marketing strategies that led to increased sales.
Rank 3: Similarity 0.6547 - I have expertise in programming and data analysis and can interpret complex data.
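As a side note, the loop in Step 5 calls cosine_similarity once per resume, but the function also accepts matrices, so all scores can be computed in a single call. The variant below is a minor refactor of the same code, reusing the variables defined in the earlier steps:

# One call computes the similarity of the job description to every resume.
scores = cosine_similarity([job_description_embedding], resume_embeddings)[0]

for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(f"Rank {rank}: Similarity {scores[idx]:.4f} - {resumes[idx]}")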
Conclusion
Using AI for talent sourcing offers much more refined and efficient results compared to traditional methods. With the help of natural language processing (NLP) technology, companies can perform customized talent searches, enabling them to find the right candidates quickly and accurately.
Experience AI-based talent sourcing solutions firsthand with TalentSeeker.
References
• Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084.
• Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
• Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems.
• Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.