Work

Python scraping pipeline and browser extension for smarter job searching

Job Finder — Automated Vacancy Scraper

Fully automated Python tool that scrapes vacancies from 6 sources every day, scores each one on 6 components, and exports the top 20 to Google Sheets. GitHub Actions triggers the run at 09:00.

Manual job hunting is chaos: checking 5–6 different platforms every day, wading through hundreds of irrelevant listings, and filtering by hand by stack, type, and level. This tool automates the entire process and delivers the best matches, with scores, to a Google Sheet once a day.

Highlights

  • Collects listings from 6 sources (RemoteOK, WeWorkRemotely, HackerNews, Adzuna, StepStone, XING) concurrently with async I/O; extracts the tech stack from free text via FlashText (200+ terms, linear in text length).
  • 6-component scoring per vacancy: stack match, TF-IDF profile similarity, remote type, stop words, contract type, entry threshold. SHA256-based deduplication with state tracked across runs.
  • Exports the top 20 daily to Google Sheets with color coding and a per-component score breakdown; auto-triggered at 09:00 via GitHub Actions CI.
  • 130 tests, >80% coverage; Python 3.11+, async/await, Pydantic v2, httpx, feedparser, BeautifulSoup4, scikit-learn (TF-IDF), gspread.
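
The SHA256 deduplication with cross-run state mentioned above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function names, the choice of fingerprint fields (title, company, URL), and the JSON state file are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def vacancy_key(title: str, company: str, url: str) -> str:
    """Return a stable SHA256 fingerprint for a vacancy.

    Fields are lowercased and whitespace-collapsed so that trivial
    formatting differences do not defeat deduplication.
    (The choice of fields is an assumption of this sketch.)
    """
    parts = [" ".join(field.lower().split()) for field in (title, company, url)]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def load_seen(path: Path) -> set[str]:
    """Load fingerprints recorded by earlier runs (empty set on first run)."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def filter_new(vacancies: list[dict], seen: set[str]) -> list[dict]:
    """Keep only vacancies not seen before; `seen` is updated in place."""
    fresh = []
    for vacancy in vacancies:
        key = vacancy_key(vacancy["title"], vacancy["company"], vacancy["url"])
        if key not in seen:
            seen.add(key)
            fresh.append(vacancy)
    return fresh


def save_seen(path: Path, seen: set[str]) -> None:
    """Persist fingerprints so the next run can skip known vacancies."""
    path.write_text(json.dumps(sorted(seen)), encoding="utf-8")
```

A run would call load_seen, pass each scraped batch through filter_new, and finish with save_seen; in a CI setting the state file would have to be stored somewhere that survives between runs.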

Impact

  • 6 job sources scraped concurrently (async) every day
  • 130 tests, >80% coverage on scoring and aggregation pipeline
  • Top-20 delivered to Google Sheets with color-coded scores
  • Zero manual filtering — full pipeline from source to export
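
To illustrate the concurrent collection step, here is a small sketch using asyncio.gather. The fetcher is a stand-in that sleeps instead of calling a real source, and the names fetch_source and collect_all are hypothetical; the real pipeline presumably uses httpx and feedparser for the actual requests.

```python
import asyncio


async def fetch_source(name: str, delay: float) -> list[dict]:
    """Stand-in for a real fetcher; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay)
    return [{"source": name, "title": f"{name} listing"}]


async def collect_all(sources: dict[str, float]) -> list[dict]:
    """Run all fetchers concurrently and merge their results.

    return_exceptions=True keeps one failing source from aborting the run.
    """
    results = await asyncio.gather(
        *(fetch_source(name, delay) for name, delay in sources.items()),
        return_exceptions=True,
    )
    vacancies: list[dict] = []
    for result in results:
        if isinstance(result, BaseException):
            continue  # a real pipeline would log the failure here
        vacancies.extend(result)
    return vacancies
```

Calling asyncio.run(collect_all({"RemoteOK": 0.02, "WeWorkRemotely": 0.01})) returns the merged listings in the order the sources were given, because gather preserves argument order regardless of which fetcher finishes first.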

Tech stack

Python 3.11, asyncio, Pydantic v2, httpx, BeautifulSoup4, feedparser, scikit-learn, FlashText, gspread, GitHub Actions

LinkedIn Job Assistant — Browser Extension

Chromium extension (Manifest V3) that annotates LinkedIn job cards directly in the browser with color codes and scores — no scraping, no requests to LinkedIn servers.

LinkedIn Jobs is the largest platform but has no smart filtering for your specific stack. Instead of opening every card by hand, reading the description, and deciding, you let the extension do that triage directly in the LinkedIn interface, without touching their servers.

Highlights

  • Reads the DOM of the already-loaded LinkedIn page (no scraping, no clicks), so it generates no additional traffic for LinkedIn's servers to observe; Chromium/Edge, Manifest V3, MutationObserver.
  • Adds colored badges directly on job cards: 🟢 87 | C#, .NET — score visible without opening the card; full score breakdown in opened vacancy with matched stack and red flags.
  • Two scoring modes: lightweight batch scoring for card listing (4 components), full scoring for opened vacancy (all 6 components via ScoreAggregator).
  • Local FastAPI server on localhost:8765 — the only network request goes to your own machine; fully reuses the scoring engine from Job Finder (same scorers, extractors, profile.yaml). Jest tests for JS, pytest for the server.
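
The two scoring modes can be pictured as one weighted aggregation run over either a subset or the full set of components. The sketch below is an assumption-heavy illustration: the component keys, the weights, the choice of which 4 components form the lightweight mode, and the badge thresholds are all hypothetical, and the real ScoreAggregator is not shown in this write-up.

```python
# Hypothetical weights (integers summing to 100); not the project's real values.
FULL_WEIGHTS = {
    "stack_match": 30,
    "profile_similarity": 20,
    "remote_type": 15,
    "stop_words": 15,
    "contract_type": 10,
    "entry_threshold": 10,
}
# Assumed subset for the lightweight card-listing mode.
LIGHT_COMPONENTS = ("stack_match", "remote_type", "stop_words", "contract_type")


def aggregate(scores: dict[str, float], components=None) -> int:
    """Weighted mean of component scores (each in 0..1), scaled to 0..100.

    Weights are renormalised over the chosen components, so the lightweight
    and full modes produce values on the same scale.
    """
    names = list(components) if components is not None else list(FULL_WEIGHTS)
    total_weight = sum(FULL_WEIGHTS[name] for name in names)
    weighted = sum(FULL_WEIGHTS[name] * scores[name] for name in names)
    return round(100 * weighted / total_weight)


def badge(score: int) -> str:
    """Map a 0..100 score to a traffic-light badge (thresholds are assumed)."""
    if score >= 75:
        return f"🟢 {score}"
    if score >= 50:
        return f"🟡 {score}"
    return f"🔴 {score}"
```

For card listings the server would call aggregate(scores, LIGHT_COMPONENTS); for an opened vacancy it would call aggregate(scores) with all six components. Renormalising the weights is one possible design choice that keeps both badges comparable.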

Impact

  • Zero LinkedIn server requests: the DOM-only approach adds no network traffic
  • Real-time badge scoring on all visible job cards in listing
  • Shared scoring engine with Job Finder — single profile.yaml, zero duplication
  • Full scoring breakdown per vacancy: 6 components, matched stack, red flags

Tech stack

JavaScript, Manifest V3, MutationObserver, FastAPI, uvicorn, Python, Jest, pytest