FIG. 00 — SYSTEM OVERVIEW

William Ronchetti

Senior backend & infrastructure engineer

I build secure, scalable AWS data platforms in Python — the plumbing that moves petabytes of research data from raw instrument output to something scientists can query, and increasingly the LLM/RAG systems layered on top of it.

Base: Alexandria, VA
Focus: IaC · data pipelines · LLM/RAG
Domain: Genomics & omics platforms
Compliance: FISMA-Moderate · HIPAA · GDPR
  • RAW SOURCES: vendors · CROs · PDFs
  • INGEST: Lambda · Glue · Batch
  • VALIDATE + MODEL: Parquet · Athena
  • STORE + INDEX: Postgres · Elastic · vectors
  • SERVE: ECS Fargate · Nginx · API
  • RAG / LLM: Bedrock · vector + text-to-SQL
  • Observability: Splunk HEC

The shape of most of my work: ingest → validate → store → serve, now with an LLM/RAG layer on top — instrumented end to end.
01

About

I'm a back-end and infrastructure engineer with 7+ years living where research data meets production infrastructure. The recurring theme of my career: taking systems that grew organically — the ones held together by tribal knowledge and a few brave shell scripts — and turning them into reproducible, audited, compliant platforms.

My strengths are infrastructure-as-code, API and database performance optimization, container-based deployments, and observability. I've delivered FISMA-Moderate, HIPAA, and GDPR-compliant systems that have passed multiple independent security audits, and I care as much about the engineers using the platform as the platform itself.

Lately that work has grown upward into LLM/RAG systems — a document-intelligence pipeline that reads thousands of scientific reports and answers questions over them with hybrid retrieval on AWS Bedrock, backed by a PHI de-identification layer so it can touch sensitive data safely.

  • 7+ years building data platforms
  • PB scale of data under management
  • 5 peer-reviewed publications
  • 267 citations across that work
02

Experience

  1. Data Product Engineer

    Novo Nordisk · contract · Remote

    Sept 2024 — Present
    • Lead backend engineer for the Omics Platform, an internal AWS system that manages petabytes of diverse omics data for dozens of research teams, covering both cataloguing and analysis.
    • Built an LLM-based document intelligence system that ingests thousands of scientific PDF reports (born-digital and scanned), extracts structured data, and serves a chat agent over a hybrid retrieval architecture combining vector search and text-to-SQL on AWS Bedrock.
    • Designed a two-zone PHI de-identification pipeline to meet HIPAA requirements — enabling LLM processing of sensitive documents with small-cell suppression and controlled re-identification.
    • Built a scalable ingestion pipeline on Lambda, Glue, and Batch that validates and transfers raw data and discovery metadata across several dozen vendors/CROs.
    • Designed a Parquet/Athena metadata model for tracking diverse omics datasets, cutting dataset onboarding and discovery time for research teams.
    • Review code and mentor a small team on software, cloud-engineering, and security best practices to ensure GDPR compliance.
    AWS Bedrock · RAG · text-to-SQL · Lambda · Glue · Batch · Athena · HIPAA
  2. Senior Software Engineer

    Harvard Medical School · Dept. of Biomedical Informatics · Remote / Boston, MA

    Aug 2019 — Present

    Senior since 2022 · half-time since Jan 2025 · previously Software Engineer

    • Lead back-end development in Python for the CGAP, 4DN, and SMaHT data platforms, supporting petabytes of raw data and associated metadata.
    • Refactored complex, organically grown AWS infrastructure into reproducible infrastructure-as-code (CloudFormation), achieving FISMA-Moderate compliance and passing multiple independent security audits — with workflows for promptly resolving vulnerabilities.
    • Migrated the web platform from Apache on Elastic Beanstalk to Nginx on ECS Fargate, cutting compute costs by 50% with configurable autoscaling in response to traffic.
    • Implemented observability with structured event logging via a Splunk HEC pipeline and container log shipping through FireLens / Fluent Bit.
    • Optimized Postgres and Elasticsearch query performance, resolving aggregation timeouts caused by large terms filters and making responses 50% faster.
    • Designed and implemented security controls (encryption, secrets management, access controls), and mentored junior engineers and domain experts on feature development and code review.
    CloudFormation · ECS Fargate · Nginx · PostgreSQL · Elasticsearch · Splunk · FISMA
  3. Associate in Research (part-time)

    Duke University · Dept. of Electrical & Computer Engineering · Durham, NC

    Feb 2019 — Present
    • Develop assignments for an advanced C++ course and improve introductory C programming assignments on Coursera; support students with C and Python coursework.
  4. Teaching Assistant (part-time)

    Cornell University · Ithaca, NY

    Aug 2016 — Dec 2018
    • Taught Operating Systems (CS 4410/4411) and System Security (CS 5430/5431) with practicum sections; built and improved C autograders and graded exams.
03

Selected Work

Co-author on 5 peer-reviewed publications, credited for data-platform and pipeline contributions; that work has 267 citations.

Stack

Languages

Python · SQL · C/C++

Backend & Data

PostgreSQL (RDS) · Elasticsearch / Lucene · Redis · DynamoDB · Amazon Athena · S3 · Parquet · Kafka · SQS · Pyramid · Flask · boto3 · SQLAlchemy · pytest

AI / LLM

AWS Bedrock · retrieval-augmented generation (RAG) · vector search · text-to-SQL · embeddings · PHI de-identification

Infrastructure & IaC

CloudFormation (Troposphere) · Terraform · VPC · EC2 · ECS (Fargate) · ECR · Lambda · Batch · Step Functions · API Gateway · Route 53 · Elastic Beanstalk · Docker · Nginx · Apache

Observability & CI/CD

CloudWatch · Splunk (HEC) · Fluent Bit / FireLens · Sentry · GitHub Actions · Travis CI

Security & Compliance

AWS KMS · Secrets Manager · Security Hub · ACM · FISMA-Moderate · HIPAA · GDPR

Scientific Pipelines

Nextflow · Seqera

04

Education

Cornell University · Ithaca, NY

  • M.Eng., Computer Science · GPA 3.69 · Dec 2018
  • B.A., Computer Science · GPA 3.43 · May 2018

Varsity Cross Country, Indoor & Outdoor Track · Association of Computer Science Undergraduates.

05

Off-Hours

Coaching

Assistant Cross-Country and Track & Field coach at Arlington Public Schools (Yorktown) — helping the next generation of distance runners train, race, and stick with the sport.

Volunteering

Volunteer with Virginia German Shepherd Rescue and the Lost Dog & Cat Rescue Foundation, supporting fostering and adoption.

Running

Varsity Cross Country and Indoor & Outdoor Track at Cornell. I'm a lifelong endurance athlete, and I bring the same patience and consistency to long-running systems work.