Junyu Li

SafeClick – Phishing Detection System

Deployed and live at https://safeclick.dev/, serving production traffic with deterministic validation and structured reports ready for human-in-the-loop review.

Demo

[Screenshot: SafeClick]

🏗️ System Architecture

graph TD
    A[User Input: URL / Tx Memo] --> B{Entry Layer / Router}
    B -->|Bypass/Cache| C[(Domain Reputation Cache)]
    B -->|New Scan| D[Multi-stage AI Pipeline]
    
    subgraph "AI Agent Pipeline"
    D --> E[Stage 1: Heuristic & LLM URL Analysis]
    E --> F[Stage 2: Sandbox & Evidence Collection]
    F --> G[Stage 3: Final Reasoning & Verdict]
    end
    
    G --> H[Deterministic Logic Layer]
    H --> I[Output: Structured JSON Report]
    C --> I
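
The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the deployed code: the stage functions, the `Report` fields, the verdict labels, and the in-memory `dict` standing in for the domain reputation cache are all hypothetical, and each stage body is a trivial placeholder.

```python
from dataclasses import dataclass
from typing import Dict
from urllib.parse import urlparse


@dataclass
class Report:
    domain: str
    verdict: str
    source: str  # "cache" or "pipeline"


def stage1_url_analysis(ctx: dict) -> dict:
    # Placeholder heuristic (assumption): flag punycode-looking hosts.
    ctx["signals"] = ["punycode"] if "xn--" in ctx["url"] else []
    return ctx


def stage2_collect_evidence(ctx: dict) -> dict:
    # Placeholder for sandboxing and evidence collection.
    ctx["evidence"] = {"signal_count": len(ctx["signals"])}
    return ctx


def stage3_reason(ctx: dict) -> dict:
    # Placeholder for the final reasoning step; it only proposes a verdict.
    count = ctx["evidence"]["signal_count"]
    ctx["proposed_verdict"] = "suspicious" if count else "benign"
    return ctx


def deterministic_layer(ctx: dict) -> str:
    # Only verdicts from a fixed set pass through; anything else degrades.
    allowed = {"benign", "suspicious", "malicious"}
    verdict = ctx.get("proposed_verdict")
    return verdict if verdict in allowed else "insufficient_evidence"


def scan(url: str, cache: Dict[str, str]) -> Report:
    domain = urlparse(url).hostname or ""
    if domain in cache:  # router: bypass the pipeline for known domains
        return Report(domain, cache[domain], "cache")
    ctx = {"url": url}
    for stage in (stage1_url_analysis, stage2_collect_evidence, stage3_reason):
        ctx = stage(ctx)
    verdict = deterministic_layer(ctx)
    cache[domain] = verdict
    return Report(domain, verdict, "pipeline")
```

Calling `scan` twice with the same URL and the same cache runs the pipeline once; the second call is answered from the cache.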

Problem

Approaches that rely only on traditional ML or only on an LLM tend to fail at phishing detection because signals are sparse, zero-day variants keep appearing, and external evidence is unreliable. Users need a security-focused system that operates under real-world constraints. SafeClick bridges the gap between unreliable AI outputs and mission-critical security by applying a zero-trust stance: LLM outputs are never trusted until they pass explicit validation boundaries and deterministic scoring.

Approach

  • Designed a multi-stage phishing detection pipeline for URLs with constrained LLM reasoning and deterministic risk scoring
  • Implemented deterministic security logic to ensure stable verdict behavior in non-deterministic model environments
  • Treated the LLM as a probabilistic component behind explicit validation boundaries, caching/deduplication, and fallback logic
  • Engineered for reliability under degraded dependencies with clear failure modes and cost control
  • Evolved a hackathon prototype toward a production-ready security service with system boundaries and observability
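
One way to picture the validation boundary and deterministic scoring described above is the sketch below. It uses only the standard library in place of Pydantic so that it runs without dependencies. The signal names, weights, thresholds, and the `Verdict` shape are assumptions made for illustration, not SafeClick's actual schema or scoring rules.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical signal weights; the real scoring table is not given in this README.
SIGNAL_WEIGHTS = {
    "credential_form": 40,
    "lookalike_domain": 35,
    "new_domain": 15,
    "url_shortener": 10,
}


@dataclass(frozen=True)
class Verdict:
    label: str
    score: int
    reason: str


FALLBACK = Verdict("insufficient_evidence", 0, "LLM output failed validation")


def validate_llm_output(raw: str) -> Optional[List[str]]:
    """Return a normalized signal list, or None if the output is not trusted."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict):
        return None
    signals = data.get("signals")
    if not isinstance(signals, list):
        return None
    if not all(isinstance(s, str) and s in SIGNAL_WEIGHTS for s in signals):
        return None
    # Sorting and deduplicating makes the result independent of model ordering.
    return sorted(set(signals))


def score(signals: List[str]) -> Verdict:
    total = min(100, sum(SIGNAL_WEIGHTS[s] for s in signals))
    if total >= 70:
        label = "malicious"
    elif total >= 30:
        label = "suspicious"
    else:
        label = "benign"
    return Verdict(label, total, ", ".join(signals) or "no signals")


def decide(raw_llm_output: str) -> Verdict:
    signals = validate_llm_output(raw_llm_output)
    return FALLBACK if signals is None else score(signals)
```

In this sketch the model only nominates signals from a closed vocabulary. The verdict itself is computed by plain arithmetic, so the same validated signals always yield the same result, and anything malformed falls back to `insufficient_evidence`.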

Web3 & Blockchain Security

The system treats transaction memo phishing as a first-class threat vector. Attackers send micro-transactions (e.g. 0.000001 tokens) with malicious URLs embedded in the memo field, targeting wallet users who click without verifying. SafeClick’s pipeline treats the memo as an input vector: extracted URLs are fed into the same AI-driven URL scanner and deterministic logic layer, so Web3 users get the same structured risk report whether they paste a URL or submit a transaction memo. This bridges traditional web phishing and on-chain social engineering in one pipeline.
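
As a rough illustration of treating the memo as an input vector, the sketch below pulls URLs out of a memo string so they can be passed to a URL scanner. The regular expression and the `extract_urls` name are assumptions. It only matches URLs with an explicit `http://` or `https://` scheme, so bare domains would need additional handling.

```python
import re
from typing import List

# Assumed pattern: scheme-prefixed URLs up to the next whitespace or quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
TRAILING_PUNCTUATION = ".,;:!?)]}"


def extract_urls(memo: str) -> List[str]:
    """Return unique URLs found in a transaction memo, in order of appearance."""
    urls: List[str] = []
    for match in URL_PATTERN.findall(memo):
        # Sentence punctuation glued to the end is usually not part of the URL.
        url = match.rstrip(TRAILING_PUNCTUATION)
        if url not in urls:
            urls.append(url)
    return urls
```

Each extracted URL can then go through the same scan path as a pasted URL, which is what lets both inputs share one report format.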

Engineering Highlights

| Feature | Implementation | Benefit |
| --- | --- | --- |
| Deduplication | Firestore Transaction-based ScanJobs | Prevented 40% redundant LLM calls |
| Caching | Two-tier: Bloom Filter + TTL Firestore | Reduced latency by ~200ms for known domains |
| Reliability | Pydantic Schema Validation | 0% output format error rate |
| Testing | 38+ Unit & Integration Tests | Ensured safe fail-over for insufficient evidence |
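
The two-tier caching idea can be sketched as below, under the assumption that the Bloom filter acts as a cheap in-memory pre-check that avoids a store read for domains that were never cached. A plain `dict` stands in for the TTL documents in Firestore, and the class names, sizes, and `store_reads` counter are hypothetical.

```python
import hashlib
import time
from typing import Callable, Dict, Iterator, Optional, Tuple


class BloomFilter:
    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class TwoTierCache:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.bloom = BloomFilter()
        self.store: Dict[str, Tuple[str, float]] = {}  # stand-in for Firestore
        self.store_reads = 0

    def put(self, domain: str, verdict: str) -> None:
        self.bloom.add(domain)
        self.store[domain] = (verdict, self.clock() + self.ttl)

    def get(self, domain: str) -> Optional[str]:
        if not self.bloom.might_contain(domain):
            return None  # definite miss: skip the store read entirely
        self.store_reads += 1
        entry = self.store.get(domain)
        if entry is None:
            return None  # Bloom filter false positive
        verdict, expires_at = entry
        if self.clock() >= expires_at:
            del self.store[domain]  # expired; caller should rescan
            return None
        return verdict
```

A Bloom filter can report false positives but never false negatives, so a negative answer safely skips the second tier. Expired entries stay in the filter, which only costs an occasional extra store read.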

Results

  • Deployed at safeclick.dev — serving production traffic with structured risk reports for URLs and transaction memos
  • Won 1st Place and Most Secure Project Award (Hofstra-Pensar Hackathon)
  • Reduced nondeterminism in decision outputs through deterministic scoring and validation boundaries; 0% output-format error rate in production
  • Improved resilience to partial failures via explicit fallbacks and pipeline-stage fault handling
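
Pipeline-stage fault handling of the kind mentioned above might look like the following sketch. The stage names, the context dictionary, and the rule that any failed stage forces an `insufficient_evidence` verdict are assumptions for illustration, not the service's documented behavior.

```python
from typing import Callable, Dict, List, Tuple

Stage = Callable[[Dict], Dict]


def run_pipeline(stages: List[Tuple[str, Stage]], ctx: Dict) -> Dict:
    """Run stages in order; a failing stage is recorded instead of crashing."""
    ctx = dict(ctx)
    failed: List[str] = []
    for name, stage in stages:
        try:
            # Pass a copy so a stage that fails midway leaves ctx untouched.
            ctx = stage(dict(ctx))
        except Exception:
            failed.append(name)
    ctx["failed_stages"] = failed
    if failed:
        # Assumed policy: never emit a confident verdict on partial evidence.
        ctx["verdict"] = "insufficient_evidence"
    return ctx


def url_analysis(ctx: Dict) -> Dict:
    # Hypothetical healthy stage.
    ctx["signals"] = ["lookalike_domain"]
    return ctx


def sandbox(ctx: Dict) -> Dict:
    # Hypothetical degraded dependency.
    raise TimeoutError("sandbox unavailable")


result = run_pipeline(
    [("url_analysis", url_analysis), ("sandbox", sandbox)],
    {"url": "https://login.example"},
)
```

Here `result` keeps the signals gathered before the failure and names the stage that failed, so a reviewer can see why the verdict was withheld.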

Tech Stack

Python, Pydantic, LLM orchestration, multi-stage URL analysis, caching, deduplication