Job Finder — Automated Vacancy Scraper
A fully automated Python tool that scrapes vacancies from 6 sources every day, scores each one on 6 components, and exports the top 20 to Google Sheets. A GitHub Actions workflow triggers the run at 09:00.
Manual job hunting is chaotic: it means checking 5–6 different platforms every day, wading through hundreds of irrelevant listings, and filtering by hand on stack, type, and level. This tool automates the entire process and delivers the best matches, with their scores, to a Google Sheet once a day.
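As a rough illustration of the "score, then keep the best 20" step, here is a minimal sketch. It assumes each of the six components (stack match, TF-IDF profile similarity, remote type, stop words, contract type, entry threshold) is normalized to 0..1 and combined as a weighted sum. The weights, the `ScoredVacancy` class, and the `top_n` helper are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass
from heapq import nlargest

# Hypothetical weights: the component names follow the project description,
# but the numbers are illustrative assumptions.
WEIGHTS = {
    "stack_match": 0.30,
    "profile_similarity": 0.25,
    "remote_type": 0.15,
    "stop_words": 0.10,
    "contract_type": 0.10,
    "entry_threshold": 0.10,
}


@dataclass
class ScoredVacancy:
    title: str
    components: dict[str, float]  # each component assumed to lie in 0..1

    @property
    def total(self) -> float:
        # Weighted sum; a component that is missing counts as 0.
        return sum(w * self.components.get(name, 0.0) for name, w in WEIGHTS.items())


def top_n(vacancies: list[ScoredVacancy], n: int = 20) -> list[ScoredVacancy]:
    """Return the n highest-scoring vacancies, best first."""
    return nlargest(n, vacancies, key=lambda v: v.total)
```

Keeping the per-component values next to the total is what would make a per-component breakdown in the sheet possible, since every row can show why it ranked where it did.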
Highlights
- Collects listings from 6 sources (RemoteOK, WeWorkRemotely, HackerNews, Adzuna, StepStone, XING) concurrently using async I/O; extracts the tech stack from free text with FlashText (200+ terms, O(n) in the length of the text).
- Scores each vacancy on 6 components: stack match, TF-IDF profile similarity, remote type, stop words, contract type, and entry threshold. Deduplicates listings by SHA-256 hash and tracks state across runs.
- Exports the daily top 20 to Google Sheets with color coding and a per-component score breakdown; the run is triggered automatically at 09:00 by a GitHub Actions workflow.
- 130 tests, >80% coverage; Python 3.11+, async/await, Pydantic v2, httpx, feedparser, BeautifulSoup4, scikit-learn (TF-IDF), gspread.
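
The concurrent collection and the SHA-256 deduplication with cross-run state could look roughly like the sketch below. It is an assumption-laden illustration, not the project's implementation: `fetch_source` is a stand-in that returns canned data instead of calling a real API, the fingerprint fields (title, company, URL) are a guess at what identifies a listing, and the JSON state file is one simple way to remember hashes between runs.

```python
import asyncio
import hashlib
import json
from pathlib import Path


def fingerprint(title: str, company: str, url: str) -> str:
    """SHA-256 over normalized fields, so case or whitespace changes do not create new IDs."""
    normalized = "|".join(part.strip().lower() for part in (title, company, url))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


async def fetch_source(name: str) -> list[dict]:
    # Hypothetical stand-in for a real HTTP or RSS call; returns canned data
    # so that the sketch runs offline.
    await asyncio.sleep(0)
    return [{"title": "Python Developer", "company": "Acme",
             "url": f"https://example.com/{name}/1"}]


async def collect(sources: list[str]) -> list[dict]:
    # Run all fetchers concurrently; a failing source is skipped instead of
    # aborting the whole run.
    results = await asyncio.gather(
        *(fetch_source(s) for s in sources), return_exceptions=True
    )
    return [v for r in results if not isinstance(r, BaseException) for v in r]


def drop_seen(vacancies: list[dict], state_file: Path) -> list[dict]:
    """Keep vacancies not seen in this or any earlier run, then persist the state."""
    seen = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    fresh = []
    for v in vacancies:
        fp = fingerprint(v["title"], v["company"], v["url"])
        if fp not in seen:
            seen.add(fp)
            fresh.append(v)
    state_file.write_text(json.dumps(sorted(seen)))
    return fresh
```

A daily run would then call something like `drop_seen(asyncio.run(collect(sources)), Path("seen.json"))`, where the file name is again only a placeholder. In a scheduled CI job the state file has to be stored somewhere that survives between runs, otherwise every listing looks new each day.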
Impact
- ↑6 job sources scraped concurrently every day
- ↑130 tests, >80% coverage on the scoring and aggregation pipeline
- ↑Top 20 delivered to Google Sheets with color-coded scores
- ↑Zero manual filtering: the full pipeline runs from source to export