Building Model Context Protocol (MCP) Clients for Bioinformatics APIs
Published on January 20, 2026
The intersection of AI and bioinformatics is creating unprecedented opportunities for data-driven research. In this post, I'll share my experience building an MCP (Model Context Protocol) client that integrates with the NCBI Entrez API to retrieve structured gene and protein metadata—a project that bridges modern AI tooling with life sciences data.
What is Model Context Protocol (MCP)?
MCP is an emerging standard for enabling AI models to interact with external tools and data sources in a structured, reliable way. Unlike ad-hoc API integrations, MCP provides a standardized JSON-RPC interface that allows language models to query databases, retrieve files, and execute functions with proper error handling and rate limiting.
For bioinformatics applications, this means AI assistants can now reliably query genomic databases, retrieve protein sequences, and analyze gene expression data—all through a unified protocol.
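Concretely, MCP traffic is plain JSON-RPC 2.0. A minimal sketch of a `tools/call` request (the tool name and arguments here are illustrative, matching the example tool shown later, not any particular server's schema):

```python
import json

# A minimal MCP "tools/call" request over JSON-RPC 2.0.
# Tool name and arguments are illustrative, not a real server's schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_gene",
        "arguments": {"symbol": "TP53", "organism": "Homo sapiens"},
    },
}

payload = json.dumps(request)
```

Because every tool call has this shape, the server can validate, rate-limit, and log requests uniformly regardless of which database they hit.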
Architecture: NCBI Gene MCP Client
My implementation consists of three core components:
1. MCP Server (JSON-RPC Compliant)
The server exposes tools for gene symbol queries, flexible search, and ID-based retrieval. It respects NCBI's rate limits (3 requests/second without an API key, 10 with one) to ensure responsible API usage.
```python
# Example MCP tool definition
@mcp_server.tool()
async def search_gene(symbol: str, organism: str = "Homo sapiens"):
    """Search for gene information by symbol."""
    results = await entrez_client.search(
        db="gene",
        term=f"{symbol}[Gene Name] AND {organism}[Organism]",
    )
    return format_gene_results(results)
```
2. Dual Interface Design
The system provides both a REST API (FastAPI) for web applications and a CLI for command-line workflows. This flexibility allows researchers to integrate gene data retrieval into Jupyter notebooks, web dashboards, or automated pipelines.
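As a sketch of what the CLI side can look like (the command and option names here are hypothetical, mirroring the REST endpoint's parameters rather than reproducing the actual tool):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI mirroring the REST endpoint's parameters.
    parser = argparse.ArgumentParser(prog="gene-mcp")
    sub = parser.add_subparsers(dest="command", required=True)
    search = sub.add_parser("search", help="Search a gene by symbol")
    search.add_argument("symbol")
    search.add_argument("--organism", default="Homo sapiens")
    search.add_argument("--json", action="store_true", help="Emit raw JSON")
    return parser

# e.g. gene-mcp search BRCA1 --organism "Mus musculus"
args = build_parser().parse_args(["search", "BRCA1", "--organism", "Mus musculus"])
```

Keeping the CLI and REST layers as thin wrappers over the same business-logic functions means both interfaces stay in sync as new databases are added.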
3. Production Deployment
The entire system is deployed on Vercel with a Bootstrap frontend, featuring modular design and automated testing for maintainability. The architecture follows a clean separation of concerns:
- Data Layer: Entrez API client with caching and retry logic
- Business Logic: Gene/protein metadata parsing and normalization
- Presentation: REST endpoints and CLI commands
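The data layer's retry logic can be sketched as exponential backoff with jitter. This is a minimal stand-in, not the project's actual implementation; the policy (3 attempts, doubling delay) is an assumption:

```python
import asyncio
import random

async def fetch_with_retry(fetch, *, attempts=3, base_delay=0.5):
    """Retry an async fetch with exponential backoff and jitter.

    `fetch` is any zero-argument coroutine function. The policy
    (3 attempts, doubling delay) is an assumption for illustration.
    """
    for attempt in range(attempts):
        try:
            return await fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)

# Demo: a flaky fetch that fails twice, then succeeds.
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = asyncio.run(fetch_with_retry(flaky, base_delay=0.05))
```

Jitter matters when many clients retry simultaneously; without it, failed requests re-arrive in synchronized bursts.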
Why This Matters for Bioinformatics Research
Traditional bioinformatics workflows often involve manual data retrieval, format conversion, and integration steps. MCP-enabled tools can automate these processes while maintaining data provenance and reproducibility.
Key benefits include:
- Structured Data Access: Gene metadata returned in consistent JSON format
- AI Integration: LLMs can query genomic databases as part of research workflows
- Rate Limit Compliance: Built-in throttling prevents API abuse
- Extensibility: Easy to add new databases (UniProt, Ensembl, etc.)
Technical Challenges & Solutions
Handling NCBI's XML Responses
NCBI returns data in XML format, which requires careful parsing. I implemented custom parsers that extract relevant fields while handling missing data gracefully:
```python
def parse_gene_record(xml_record):
    """Extract structured data from an NCBI gene record."""
    return {
        "gene_id": safe_extract(xml_record, "Entrezgene_track-info/Gene-track/Gene-track_geneid"),
        "symbol": safe_extract(xml_record, "Entrezgene_gene/Gene-ref/Gene-ref_locus"),
        "description": safe_extract(xml_record, "Entrezgene_gene/Gene-ref/Gene-ref_desc"),
        "organism": safe_extract(xml_record, "Entrezgene_source/BioSource/BioSource_org/Org-ref/Org-ref_taxname"),
        "chromosome": safe_extract(xml_record, "Entrezgene_locus/Gene-commentary/Gene-commentary_label"),
    }
```
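The `safe_extract` helper above isn't shown; a minimal sketch of what it could look like using the standard library's `ElementTree` (the toy record below is far shallower than real Entrez XML):

```python
import xml.etree.ElementTree as ET

def safe_extract(record: ET.Element, path: str, default=None):
    """Return the text at an XPath-like path, or `default` if absent.

    A minimal stand-in for the helper referenced in the post;
    real Entrez XML is deeper and messier than this toy record.
    """
    node = record.find(path)
    if node is None or node.text is None:
        return default
    return node.text.strip()

# Toy record mimicking a fragment of an Entrezgene document.
xml = """
<Entrezgene>
  <Entrezgene_gene>
    <Gene-ref>
      <Gene-ref_locus>TP53</Gene-ref_locus>
    </Gene-ref>
  </Entrezgene_gene>
</Entrezgene>
"""
record = ET.fromstring(xml)
symbol = safe_extract(record, "Entrezgene_gene/Gene-ref/Gene-ref_locus")
missing = safe_extract(record, "Entrezgene_gene/Gene-ref/Gene-ref_desc", default="")
```

Returning a default instead of raising keeps one malformed field from discarding an otherwise usable record.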
Async Rate Limiting
To comply with NCBI's usage policies, I implemented an async rate limiter using Python's asyncio:
```python
import asyncio
import time

class RateLimiter:
    def __init__(self, requests_per_second: float):
        self.min_interval = 1.0 / requests_per_second
        self.last_request = 0.0
        self._lock = asyncio.Lock()

    async def acquire(self):
        async with self._lock:
            # time.monotonic() is immune to wall-clock adjustments
            elapsed = time.monotonic() - self.last_request
            if elapsed < self.min_interval:
                await asyncio.sleep(self.min_interval - elapsed)
            self.last_request = time.monotonic()
```
Future Directions
This project opens several exciting possibilities:
- Integration with RAG systems for literature-aware gene analysis
- Multi-database queries (combining NCBI, UniProt, and PDB data)
- Automated annotation pipelines for novel sequences
- Natural language interfaces for complex genomic queries
Conclusion
Building MCP clients for bioinformatics APIs represents a significant step toward making genomic data more accessible to AI-powered research tools. By combining structured protocols with domain-specific knowledge, we can create systems that accelerate scientific discovery while maintaining data quality and API compliance.
The code for this project demonstrates how modern software engineering practices—async programming, modular design, automated testing—can be applied to bioinformatics tool development. As AI continues to transform research workflows, such integrations will become increasingly valuable.