LLM Evaluator (Model Response Analyst)

Job Overview

Location: Lagos
Job Type: Full Time
Date Posted: 9 hours ago

Job Description

Job Title: LLM Evaluator (Model Response Analyst)
Location: Remote (Worldwide)
Job Summary: We are seeking a detail-oriented and analytical LLM Evaluator to assess, analyze, and improve the performance of large language models (LLMs). In this role, you will evaluate AI-generated content for accuracy, coherence, factual reliability, bias, safety, and alignment with defined guidelines.

Responsibilities:
• Evaluate and rank model-generated text based on complex rubrics covering dimensions such as factuality, coherence, safety, instruction-following, and creativity.
• Review multiple model responses to the same prompt and determine which output a human would prefer, providing justifications for your choices.
• Provide clear, concise feedback to the modeling and training teams regarding recurring failure modes observed during evaluation sessions.
• Attempt to break the model by crafting prompts designed to elicit biased, harmful, or insecure outputs to help patch safety vulnerabilities.
• Collaborate with the quality assurance team to suggest improvements to evaluation guidelines when you encounter ambiguous or unclassifiable edge cases.
• Participate in regular cross-checking sessions with other evaluators to calibrate scoring standards and ensure inter-rater reliability across the global team.
• When a model underperforms, dig deeper than the surface score to hypothesize why the model made a specific error (e.g., training data vs. prompt misinterpretation).
• Identify and flag novel or unexpected model behaviors to the research team, contributing to a living library of unique model outputs and failure modes.

Requirements:
• Minimum of 5 years of professional experience in a relevant field such as Computational Linguistics, Data Analysis, Technical Writing, Quality Assurance (specifically for NLP/AI), or Cognitive Science.
• Bachelor’s degree in Computer Science or a related field.
• Deep understanding of how to craft prompts to elicit specific behaviors and test model limits.
• Ability to look at a text output and explain why it is good or bad based on logic, tone, factuality, and instruction adherence.
• Experience working with Reinforcement Learning from Human Feedback (RLHF) data collection.
• Proven experience monitoring and improving consistency among evaluation teams. Ability to analyze inter-annotator agreement (IAA) scores and conduct calibration sessions to align judgement.
• Experience sourcing, cleaning, and annotating datasets specifically for fine-tuning or evaluating LLMs. Understanding of data distribution and its impact on model performance.
• Familiarity with A/B testing concepts applied to AI. Ability to help design experiments to test if a new model version is truly better than the previous one.
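
For candidates unfamiliar with the IAA scores mentioned above, one common measure for two raters is Cohen's kappa. The sketch below is illustrative only: the posting does not say which agreement metric the team uses, and the function name, labels, and ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement, computed from each rater's own label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two evaluators on the same six model responses.
rater_1 = ["good", "bad", "good", "good", "bad", "good"]
rater_2 = ["good", "bad", "bad", "good", "bad", "good"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so a low value after a scoring round is a typical signal that a calibration session is needed.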

Method of Application:
Qualified candidates should send a copy of their CV and portfolio to with the job title as the subject of the email.

Salary:
$1,000,000 monthly

"Inspire Global Solutions"


At Inspire Global Solutions, we help you identify your requirements and pursue the career you have always dreamed of. A clear vision and the support of experienced professionals will give you a platform to build your professional career.

© 2018-2026 Inspire Global Solutions. All rights reserved.