Case Study

Building a Hybrid AI Job Matcher with Local LLMs and Cloud-Based Task Processing

Built a hybrid AI-powered CV matcher that combined privacy-focused local LLM execution with cloud-based task processing. What began as a personal productivity tool turned into an exercise in designing reliable asynchronous workflows with AWS queues and distributed workers.

Build Product, Improve Reliability, Design Systems, Add Intelligence, Take Ownership, Simplify Complexity, Deliver Incrementally, Think in Systems, Continuous Improvement


Context

I wanted a practical way to tailor my CV and applications to different job postings without repeatedly rewriting the same information. Since privacy was important and I already had capable local hardware, I initially built the system around LLMs running directly on my own machine.

Problem

Running local LLMs worked well when my workstation was available, but the approach quickly exposed operational limitations:

Long-running AI tasks blocked the application flow.
Heavy local inference wasn't always practical while I was actively using the machine.
The system became unavailable whenever my computer was offline.
Failures were difficult to monitor and retry consistently.
Some workflows required asynchronous processing rather than immediate responses.

What started as a personal productivity tool gradually evolved into a distributed task-processing problem.

Constraints

Personal project with a limited infrastructure budget.
Strong preference for keeping sensitive profile data under my control.
Ability to switch between local and remote execution.
Need for reliable task execution and recovery.
Minimal operational overhead as a solo developer.

What I Did

I designed the platform using a hybrid execution model.

Local-first AI execution

Local LLMs handled most day-to-day CV analysis and job matching workflows.
Personal data remained on my own hardware whenever possible.
The architecture avoided unnecessary dependence on third-party AI APIs.
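
The local-first policy can be sketched as a small routing function that decides where each job runs. This is a minimal illustration, not the project's code: the `Job` fields, the `Target` values, the 120-second threshold, and the rule that privacy-sensitive jobs wait for the workstation instead of falling back to the cloud are all hypothetical assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"    # run on the workstation's local LLM
    REMOTE = "remote"  # enqueue for a cloud worker
    DEFER = "defer"    # hold until the workstation is back online


@dataclass
class Job:
    kind: str                     # hypothetical job type, e.g. "cv_match"
    contains_personal_data: bool  # True if the payload includes profile data
    estimated_seconds: float      # rough estimate of inference time


def choose_target(job: Job, workstation_online: bool, workstation_busy: bool,
                  max_local_seconds: float = 120.0) -> Target:
    """Decide where a job runs under a local-first policy (illustrative)."""
    if job.contains_personal_data:
        # Assumed rule: sensitive work never leaves local hardware; if the
        # workstation is offline, the job waits rather than going to the cloud.
        return Target.LOCAL if workstation_online else Target.DEFER
    if (workstation_online and not workstation_busy
            and job.estimated_seconds <= max_local_seconds):
        # Short, non-sensitive work runs locally while the machine is idle.
        return Target.LOCAL
    # Otherwise (machine offline or busy, or the job is long-running),
    # hand the job to the queue for cloud workers.
    return Target.REMOTE


if __name__ == "__main__":
    job = Job(kind="cv_match", contains_personal_data=True, estimated_seconds=45.0)
    print(choose_target(job, workstation_online=False, workstation_busy=False))
```

Keeping the decision in one pure function makes the policy easy to test and to change later without touching the queueing code.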

Queue-based task orchestration

As workloads became longer and more expensive to execute synchronously, I introduced asynchronous processing:

AI requests were converted into jobs.
Jobs were pushed into Amazon SQS queues.
Priority-based execution separated urgent and background workloads.
Dead-letter queues captured failed tasks for investigation.
Workers consumed queued tasks independently from the user-facing application.
Task states were tracked throughout their lifecycle.

Cloud deployment

To make the system available beyond my local environment:

Containerized services were deployed to AWS.
Amazon ECS handled worker execution.
Amazon ECR managed container images.
Amazon SQS coordinated distributed processing.
PostgreSQL stored application and task metadata.
Infrastructure was provisioned using Terraform.
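
The queue-based flow described earlier (jobs, priority separation, dead-letter queues, independent workers) can be illustrated with an in-memory stand-in for SQS. This is a sketch under stated assumptions, not the project's worker: the `InMemoryQueue` class, the queue names, the `task_id` message shape, and the receive limit are hypothetical. In a real deployment, messages would be fetched with boto3's `receive_message`, successful tasks removed with `delete_message`, and the move to the dead-letter queue performed by the queue's redrive policy rather than by application code. Priority is modeled as two separate queues polled in order, a common approach because a standard SQS queue has no per-message priority.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class Message:
    body: dict
    receive_count: int = 0  # SQS exposes this as ApproximateReceiveCount


@dataclass
class InMemoryQueue:
    """Hypothetical stand-in for an SQS queue with a redrive policy."""
    name: str
    max_receive_count: int = 3
    dead_letter: Optional["InMemoryQueue"] = None
    messages: Deque[Message] = field(default_factory=deque)

    def send(self, body: dict) -> None:
        self.messages.append(Message(body))

    def receive(self) -> Optional[Message]:
        while self.messages:
            msg = self.messages.popleft()
            if self.dead_letter is not None and msg.receive_count >= self.max_receive_count:
                # Mirrors the redrive policy: a message already received
                # max_receive_count times without being deleted is moved to
                # the dead-letter queue instead of being delivered again.
                self.dead_letter.messages.append(msg)
                continue
            msg.receive_count += 1
            return msg
        return None

    def release(self, msg: Message) -> None:
        # Stands in for the visibility timeout expiring on an undeleted message.
        self.messages.append(msg)


def poll_once(queues: List[InMemoryQueue], handler: Callable[[dict], None],
              status: Dict[str, str]) -> bool:
    """Handle at most one message, checking queues in priority order."""
    for queue in queues:  # e.g. [urgent, background]
        msg = queue.receive()
        if msg is None:
            continue
        task_id = msg.body["task_id"]
        status[task_id] = "running"
        try:
            handler(msg.body)
        except Exception:
            # Not deleted, so it becomes visible again and will be retried.
            status[task_id] = "retrying"
            queue.release(msg)
        else:
            # With SQS, this is where delete_message would be called.
            status[task_id] = "succeeded"
        return True
    return False
```

Because a failed handler simply leaves the message undeleted, retries and dead-lettering come from the queue semantics rather than from custom retry loops. A real system would also record dead-lettered tasks, for example by consuming the dead-letter queue.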

Trade-offs

The hybrid approach introduced additional architectural complexity compared to a simple request-response model.

However, it provided several advantages:

Privacy-sensitive workflows could remain local.
Resource-intensive jobs could be processed remotely.
Failures became observable and recoverable.
The application no longer depended on a single machine being online.
Future scaling could be achieved by increasing worker capacity rather than redesigning the system.
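
Making failures observable depends on tracking each task through explicit states. The sketch below assumes a hypothetical lifecycle (`queued`, `running`, `succeeded`, `failed`, `dead_lettered`) rather than the project's actual one; in the setup described above, such transitions could be persisted in PostgreSQL alongside the task metadata.

```python
from enum import Enum
from typing import Dict, List, Set


class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"                # an attempt failed; may be retried
    DEAD_LETTERED = "dead_lettered"  # retries exhausted; needs investigation


# Hypothetical lifecycle: the states each state is allowed to move to.
TRANSITIONS: Dict[TaskState, Set[TaskState]] = {
    TaskState.QUEUED: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.SUCCEEDED, TaskState.FAILED},
    TaskState.FAILED: {TaskState.QUEUED, TaskState.DEAD_LETTERED},
    TaskState.SUCCEEDED: set(),
    TaskState.DEAD_LETTERED: {TaskState.QUEUED},  # manual re-queue after a fix
}


def can_transition(current: TaskState, new: TaskState) -> bool:
    return new in TRANSITIONS[current]


class Task:
    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.state = TaskState.QUEUED
        # Every state change is kept; in a real system these could be rows
        # in a database table, giving a queryable failure history.
        self.history: List[TaskState] = [TaskState.QUEUED]

    def transition(self, new_state: TaskState) -> None:
        if not can_transition(self.state, new_state):
            raise ValueError(
                f"{self.task_id}: illegal transition "
                f"{self.state.value} -> {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)
```

Rejecting illegal transitions, such as a succeeded task being picked up again, turns silent inconsistencies into visible errors.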

Outcome

The project evolved from a personal CV assistant into a resilient AI processing platform capable of balancing privacy, cost, and reliability.

More importantly, it changed how I think about AI systems: LLM integration is rarely just a prompt engineering problem. In practice, it becomes an exercise in task orchestration, operational resilience, and designing around the realities of expensive and unpredictable workloads.

What I Learned

Building AI features taught me that integrating an LLM is often the easiest part of the problem. The real challenge begins once those features interact with actual usage patterns.

I learned that synchronous request-response flows break down quickly when workloads become expensive, unpredictable, or long-running. Treating AI operations as tasks rather than immediate responses led to a more resilient architecture with clearer ownership, retry strategies, and failure handling.

I also gained a deeper appreciation for balancing competing priorities. Keeping sensitive data local improved privacy and reduced external dependencies, while cloud-based workers provided availability and operational flexibility when local execution wasn't practical.

Most importantly, I learned that good system design is rarely about choosing a single "best" technology. It is about understanding constraints, acknowledging trade-offs, and building solutions that fit the realities of how people actually use the product.