Semantic Search Benchmarking via Synthetic Data Generation

A framework that uses Gemini AI to generate synthetic Q&A datasets for benchmarking semantic search models across diverse domains, removing the need to curate evaluation sets by hand.

Python

Gemini AI

Semantic Search

Synthetic Data

Benchmarking

Problem

Evaluating semantic search quality requires domain-specific (query, relevant document) pairs, which are expensive and slow to curate by hand.

Approach

Use Gemini AI to generate synthetic (query, answer, source document) triples from a given corpus, then score embedding models on those triples with standard retrieval metrics.
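The generation step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the prompt wording, the `Triple` dataclass, and the `generate_triples` helper are all hypothetical, and the model call is passed in as a plain `llm` callable (prompt in, text out) so that no particular Gemini client API is assumed.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Triple:
    query: str
    answer: str
    doc_id: str  # id of the source document the query was generated from


# Hypothetical prompt; the real framework's prompt is not shown in the source.
PROMPT = (
    "Read the passage and write one question it answers, with the answer.\n"
    'Reply with JSON only: {{"query": "...", "answer": "..."}}\n\n'
    "Passage:\n{passage}"
)


def generate_triples(corpus: dict[str, str], llm: Callable[[str], str]) -> list[Triple]:
    """Ask the model for one (query, answer) per document and keep its doc id."""
    triples = []
    for doc_id, passage in corpus.items():
        raw = llm(PROMPT.format(passage=passage))
        try:
            item = json.loads(raw)
            triples.append(Triple(item["query"], item["answer"], doc_id))
        except (json.JSONDecodeError, KeyError, TypeError):
            # Skip replies that are not the expected JSON object.
            continue
    return triples


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to exercise the sketch."""
    return json.dumps({"query": "What is covered?", "answer": "A stub answer."})


corpus = {"d1": "First passage.", "d2": "Second passage."}
triples = generate_triples(corpus, fake_llm)
```

Keeping the source `doc_id` on each triple is what later lets the evaluation treat that document as the relevant one for the generated query.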

Architecture

Corpus → Gemini AI query/answer generation → embedding model evaluation pipeline → retrieval metrics (MRR, Recall@K, NDCG).
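The three metrics at the end of the pipeline can be computed as in the sketch below. It assumes each synthetic query has exactly one relevant document (the source document it was generated from) and binary relevance; the function names and the `runs` structure are illustrative, not taken from the framework.

```python
import math


def reciprocal_rank(ranked: list[str], relevant: str) -> float:
    """1 / rank of the relevant document, or 0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id == relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: list[str], relevant: str, k: int) -> float:
    """With a single relevant document, Recall@K is 1 if it is in the top K."""
    return 1.0 if relevant in ranked[:k] else 0.0


def ndcg_at_k(ranked: list[str], relevant: str, k: int) -> float:
    """Binary gain, one relevant document: ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id == relevant:
            return 1.0 / math.log2(rank + 1)
    return 0.0


def evaluate(runs: list[tuple[list[str], str]], k: int = 10) -> dict[str, float]:
    """Average each metric over (ranked doc ids, relevant doc id) pairs."""
    if not runs:
        raise ValueError("runs must contain at least one query")
    n = len(runs)
    return {
        "MRR": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
        f"Recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        f"NDCG@{k}": sum(ndcg_at_k(r, rel, k) for r, rel in runs) / n,
    }


# Three toy queries whose relevant document is "d1" in each case.
runs = [
    (["d1", "d2", "d3"], "d1"),  # relevant at rank 1
    (["d2", "d1", "d3"], "d1"),  # relevant at rank 2
    (["d2", "d3", "d4"], "d1"),  # relevant not retrieved
]
scores = evaluate(runs, k=2)
```

Running `evaluate` once per embedding model on the same `runs` layout gives directly comparable scores, since every model is judged against the same synthetic queries.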

Results

The framework generates domain-specific benchmarks and compares multiple semantic search models side by side.

Lessons learned

Synthetic data from capable LLMs is a fast, scalable alternative to building eval sets by hand, and it is especially effective for domain-specific retrieval benchmarking.

Fed-BLEND — Federated Conformal Prediction for VLMs

Novel federated conformal prediction method that mitigates hallucinations in federated fine-tuned vision-language models via abstention.

GCTAF — Time-Series Forecasting & Flare Risk Classification

Attention-based forecasting of magnetic field trajectories combined with supervised contrastive learning for solar flare classification.