Principal Data Scientist

Sebastián García Vilches

I build ML systems that move financial metrics. I currently lead data science at MACH, Chile's largest digital bank, on work ranging from credit scoring to LLM-powered products, and I am pairing that technical depth with business strategy through an MBA at UC.

Santiago, Chile

About

Building ML systems that matter

I'm a data scientist with 7+ years of experience building and deploying machine learning systems in fintech and banking. I specialize in the full ML lifecycle — from problem framing and feature engineering to production deployment and monitoring.

At MACH, Chile's largest digital bank, I lead technical direction for the data science team, translating business strategy into ML systems that reach millions of users.

I studied Civil Industrial Engineering at Pontificia Universidad Católica de Chile (top 10% of class), and I'm currently pursuing an MBA at UC to complement my technical depth with strategic leadership skills.

7+ Years in Fintech
50+ Models in Production
4 Companies

Experience

Where I've worked

7+ years building data products at the intersection of machine learning and financial services.

Technical lead for the data science team at Chile's largest digital bank. End-to-end ownership from research to production across recommenders, NLP, computer vision, and credit scoring systems.

  • Set technical direction for the data science org: modeling standards, feature engineering best practices, and production ML guidelines across multiple squads
  • Partnered with product, risk, and engineering leaders to prioritize high-impact ML initiatives, aligning technical roadmaps with business strategy
  • Built a personalization recommendation model that lifted user engagement by 130% in incremental clicks
  • Designed an LLM-based text classification model for the customer service chatbot, improving satisfaction score from 10% to 60%
  • Developed a Computer Vision identity validation model, replacing a costly external provider and reducing operational expenses by ~10%
  • Led credit scoring system development, enabling access to financial services for over 150,000 customers
  • Led development of a behavioral model that enabled an 80% expansion in credit exposure for top-tier customers
  • Architected MACH's data lake migration to Apache Iceberg, reducing SQL query costs by 80% with full data versioning
  • Designed and institutionalized an org-wide MLOps monitoring framework for early detection of feature drift and model degradation
Python, AWS SageMaker, MLflow, HuggingFace, Apache Iceberg, Computer Vision, LLM, Credit Scoring, Spark, Airflow

Impact

Impact by the numbers

Key results from production ML systems — measured, deployed, and monitored at scale.

MACH
+130%
Engagement Uplift

Personalization recommendation model for mobile app home shortcuts.

MACH
10% → 60%
Chatbot Satisfaction

LLM-based text classification transformed the customer service experience.

MACH
80%
SQL Cost Reduction

Data lake migration to Apache Iceberg with full versioning and scalability.

MACH
80%
Credit Exposure Expansion

Behavioral model enabling significantly higher credit limits for top-tier clients.

MACH
~10%
Operational Cost Savings

Computer Vision identity validation replacing a costly third-party provider.

Santander
Report Automation

C-level financial report fully automated with near-zero calculation errors.

Skills

Tools of the trade

The technologies I use to build, deploy, and monitor ML systems at scale.

Languages

Python, SQL, R, Git

ML & AI

Scikit-Learn, HuggingFace, MLflow, Computer Vision, LLM / NLP, Recommendation Systems, Credit Risk Modeling, A/B Testing

Data Engineering

Apache Airflow, Apache Spark, Apache Iceberg, Feature Stores, ETL Pipelines

Cloud (AWS)

SageMaker, Glue, Lambda, Athena, S3, QuickSight

MLOps

Model Monitoring, Drift Detection, Model Governance, ETL Testing, Production ML

Projects

Side projects

Personal R&D — exploring the intersection of audio ML, LLM APIs, and developer tools.

Real-time Audio Transcription Pipeline

Active

A personal productivity tool that captures voice input, filters silence, transcribes speech in real time, and passes the transcript to Claude for analysis — all with a push-to-talk interface.

Pipeline

Microphone (PyAudio capture) → VAD (Silero silence filter) → Transcribe (Cohere multilingual) → Analyze (Claude API)
  • Real-time voice activity detection (VAD) using Silero to eliminate silence and reduce transcription costs
  • Spanish-first transcription via Cohere's multilingual API
  • LLM analysis layer (Claude) for summarization, data extraction, and compliance review
  • Push-to-talk GUI designed as a voice-first interface for Claude Code
Python, Silero VAD, Cohere, Claude API, PyAudio
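
To illustrate how the four stages above fit together, here is a minimal Python sketch. It is not the project's actual code. The `VoicePipeline` class, its field names, and the `energy_vad` function are hypothetical names chosen for this example, and `energy_vad` is a simple RMS-energy threshold standing in for Silero VAD. The transcription and analysis stages are passed in as plain callables, so no Cohere or Claude API calls are assumed here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# One chunk of audio samples, assumed normalized to [-1.0, 1.0].
Frame = List[float]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of a frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def energy_vad(frame: Frame, threshold: float = 0.02) -> bool:
    """Hypothetical stand-in for Silero VAD: speech if RMS exceeds threshold."""
    return rms(frame) > threshold


@dataclass
class VoicePipeline:
    """Capture -> VAD -> transcribe -> analyze, with pluggable stages."""
    is_speech: Callable[[Frame], bool]        # silence filter (VAD stage)
    transcribe: Callable[[List[Frame]], str]  # speech-to-text stage
    analyze: Callable[[str], str]             # LLM analysis stage

    def run(self, frames: Iterable[Frame]) -> str:
        # Drop silent frames first so the paid transcription stage
        # only receives audio that contains speech.
        voiced = [f for f in frames if self.is_speech(f)]
        if not voiced:
            return ""  # nothing to transcribe, skip both downstream calls
        transcript = self.transcribe(voiced)
        return self.analyze(transcript)
```

Filtering before transcription is what keeps cost down in this layout: silent frames never reach the transcription stage. In a real build, each callable would wrap the corresponding service client.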

Contact

Let's connect

Open to new opportunities, collaborations, or just a good conversation about ML systems.

I typically respond within 24-48 hours. For urgent matters, LinkedIn is the fastest way to reach me.