Building Model Context Protocol (MCP) Clients for Bioinformatics APIs
Published on January 20, 2026
The intersection of AI and bioinformatics is creating unprecedented opportunities for data-driven research. In this post, I'll share my experience building an MCP (Model Context Protocol) client that integrates with the NCBI Entrez API to retrieve structured gene and protein metadata—a project that bridges modern AI tooling with life sciences data.
What is Model Context Protocol (MCP)?
MCP is an emerging standard for enabling AI models to interact with external tools and data sources in a structured, reliable way. Unlike ad-hoc API integrations, MCP provides a standardized JSON-RPC interface that allows language models to query databases, retrieve files, and execute functions with proper error handling and rate limiting.
For bioinformatics applications, this means AI assistants can now reliably query genomic databases, retrieve protein sequences, and analyze gene expression data—all through a unified protocol.
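Concretely, MCP traffic is plain JSON-RPC 2.0. A minimal sketch of a `tools/call` request (the tool name and arguments here are illustrative, matching the example tool shown later, not any particular server's schema):

```python
import json

# A minimal MCP "tools/call" request over JSON-RPC 2.0.
# Tool name and arguments are illustrative, not a real server's schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_gene",
        "arguments": {"symbol": "TP53", "organism": "Homo sapiens"},
    },
}

payload = json.dumps(request)
```

Because every tool call has this shape, the server can validate, rate-limit, and log requests uniformly regardless of which database they hit.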
Architecture: NCBI Gene MCP Client
My implementation consists of three core components:
1. MCP Server (JSON-RPC Compliant)
The server exposes tools for gene symbol queries, flexible search, and ID-based retrieval. It respects NCBI's rate limits (3 requests/second without an API key, 10 with one) to ensure responsible API usage.
```python
# Example MCP tool definition
@mcp_server.tool()
async def search_gene(symbol: str, organism: str = "Homo sapiens"):
    """Search for gene information by symbol."""
    results = await entrez_client.search(
        db="gene",
        term=f"{symbol}[Gene Name] AND {organism}[Organism]",
    )
    return format_gene_results(results)
```
2. Dual Interface Design
The system provides both a REST API (FastAPI) for web applications and a CLI for command-line workflows. This flexibility allows researchers to integrate gene data retrieval into Jupyter notebooks, web dashboards, or automated pipelines.
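As a sketch of what the CLI side can look like (the command and option names here are hypothetical, mirroring the REST endpoint's parameters rather than reproducing the actual tool):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI mirroring the REST endpoint's parameters.
    parser = argparse.ArgumentParser(prog="gene-mcp")
    sub = parser.add_subparsers(dest="command", required=True)
    search = sub.add_parser("search", help="Search a gene by symbol")
    search.add_argument("symbol")
    search.add_argument("--organism", default="Homo sapiens")
    search.add_argument("--json", action="store_true", help="Emit raw JSON")
    return parser

# e.g. gene-mcp search BRCA1 --organism "Mus musculus"
args = build_parser().parse_args(["search", "BRCA1", "--organism", "Mus musculus"])
```

Keeping the CLI and REST layers as thin wrappers over the same business-logic functions means both interfaces stay in sync as new databases are added.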
3. Production Deployment
The entire system is deployed on Vercel with a Bootstrap frontend, featuring modular design and automated testing for maintainability. The architecture follows a clean separation of concerns:
- Data Layer: Entrez API client with caching and retry logic
- Business Logic: Gene/protein metadata parsing and normalization
- Presentation: REST endpoints and CLI commands
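The data layer's retry logic can be sketched as exponential backoff with jitter. This is a minimal stand-in, not the project's actual implementation; the policy (3 attempts, doubling delay) is an assumption:

```python
import asyncio
import random

async def fetch_with_retry(fetch, *, attempts=3, base_delay=0.5):
    """Retry an async fetch with exponential backoff and jitter.

    `fetch` is any zero-argument coroutine function. The policy
    (3 attempts, doubling delay) is an assumption for illustration.
    """
    for attempt in range(attempts):
        try:
            return await fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)

# Demo: a flaky fetch that fails twice, then succeeds.
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = asyncio.run(fetch_with_retry(flaky, base_delay=0.05))
```

Jitter matters when many clients retry simultaneously; without it, failed requests re-arrive in synchronized bursts.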
Why This Matters for Bioinformatics Research
Traditional bioinformatics workflows often involve manual data retrieval, format conversion, and integration steps. MCP-enabled tools can automate these processes while maintaining data provenance and reproducibility.
Key benefits include:
- Structured Data Access: Gene metadata returned in consistent JSON format
- AI Integration: LLMs can query genomic databases as part of research workflows
- Rate Limit Compliance: Built-in throttling prevents API abuse
- Extensibility: Easy to add new databases (UniProt, Ensembl, etc.)
Technical Challenges & Solutions
Handling NCBI's XML Responses
NCBI returns data in XML format, which requires careful parsing. I implemented custom parsers that extract relevant fields while handling missing data gracefully:
```python
def parse_gene_record(xml_record):
    """Extract structured data from an NCBI gene record."""
    return {
        "gene_id": safe_extract(xml_record, "Entrezgene_track-info/Gene-track/Gene-track_geneid"),
        "symbol": safe_extract(xml_record, "Entrezgene_gene/Gene-ref/Gene-ref_locus"),
        "description": safe_extract(xml_record, "Entrezgene_gene/Gene-ref/Gene-ref_desc"),
        "organism": safe_extract(xml_record, "Entrezgene_source/BioSource/BioSource_org/Org-ref/Org-ref_taxname"),
        "chromosome": safe_extract(xml_record, "Entrezgene_locus/Gene-commentary/Gene-commentary_label"),
    }
```
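The `safe_extract` helper above isn't shown; a minimal sketch of what it could look like using the standard library's `ElementTree` (the toy record below is far shallower than real Entrez XML):

```python
import xml.etree.ElementTree as ET

def safe_extract(record: ET.Element, path: str, default=None):
    """Return the text at an XPath-like path, or `default` if absent.

    A minimal stand-in for the helper referenced in the post;
    real Entrez XML is deeper and messier than this toy record.
    """
    node = record.find(path)
    if node is None or node.text is None:
        return default
    return node.text.strip()

# Toy record mimicking a fragment of an Entrezgene document.
xml = """
<Entrezgene>
  <Entrezgene_gene>
    <Gene-ref>
      <Gene-ref_locus>TP53</Gene-ref_locus>
    </Gene-ref>
  </Entrezgene_gene>
</Entrezgene>
"""
record = ET.fromstring(xml)
symbol = safe_extract(record, "Entrezgene_gene/Gene-ref/Gene-ref_locus")
missing = safe_extract(record, "Entrezgene_gene/Gene-ref/Gene-ref_desc", default="")
```

Returning a default instead of raising keeps one malformed field from discarding an otherwise usable record.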
Async Rate Limiting
To comply with NCBI's usage policies, I implemented an async rate limiter using Python's asyncio:
```python
import asyncio
import time

class RateLimiter:
    def __init__(self, requests_per_second: float):
        self.min_interval = 1.0 / requests_per_second
        self.last_request = 0.0
        self._lock = asyncio.Lock()

    async def acquire(self):
        async with self._lock:
            # time.monotonic() is immune to wall-clock adjustments
            elapsed = time.monotonic() - self.last_request
            if elapsed < self.min_interval:
                await asyncio.sleep(self.min_interval - elapsed)
            self.last_request = time.monotonic()
```
Future Directions
This project opens several exciting possibilities:
- Integration with RAG systems for literature-aware gene analysis
- Multi-database queries (combining NCBI, UniProt, and PDB data)
- Automated annotation pipelines for novel sequences
- Natural language interfaces for complex genomic queries
Conclusion
Building MCP clients for bioinformatics APIs represents a significant step toward making genomic data more accessible to AI-powered research tools. By combining structured protocols with domain-specific knowledge, we can create systems that accelerate scientific discovery while maintaining data quality and API compliance.
The code for this project demonstrates how modern software engineering practices—async programming, modular design, automated testing—can be applied to bioinformatics tool development. As AI continues to transform research workflows, such integrations will become increasingly valuable.