
refactor(phishunt): use single open feed endpoint and fix related bugs (#6198) #6774

Open
maximerafaillac wants to merge 11 commits into master from feat/6198-phishunt-refactor-open-feed

@maximerafaillac

Member

Summary

Resolves #6198 — merge public/private feed logic into a single _process_feed method using the now-free feed.json endpoint.
Also resolves #6177 — the HTTP 403 caused by urllib missing a User-Agent is fixed by switching entirely to requests.
Also resolves #6593 — TLP:WHITE replaced with TLP:CLEAR.

Changes

Refactor

  • Replace dual-path _process_public_feed / _process_private_feed with a single _process_feed consuming https://phishunt.io/feed.json
  • Remove urllib, ssl, re imports — requests only
  • Remove API key branching in process_message
  • Deprecate PHISHUNT_API_KEY env var with a warning (same pattern as PHISHUNT_INTERVAL)

Bug fixes

  • TLP:WHITE → TLP:CLEAR: use pycti.MarkingDefinition.generate_id('TLP', 'TLP:CLEAR') for a valid OpenCTI marking ID
  • Duplicate STIX relationship ID: url→organization was using stix_domain.id in generate_id instead of stix_organization.id, causing the relationship to silently overwrite url→domain
  • STIX pattern injection: escape \\ and ' in URL values before embedding in STIX patterns
  • Invalid IPv4: validate the IP with ipaddress.IPv4Address before creating the observable. When it is invalid, skip the IP, domain→ip, and ip→location objects with a warning log; the URL and domain are still ingested.

Commits

Commit   Description
686f1fa  fix(phishunt): replace deprecated TLP:WHITE with TLP:CLEAR
26fbc36  refactor(phishunt): merge feeds into single _process_feed using feed.json
966e268  fix(phishunt): correct duplicate STIX ID on url→organization relationship
ea8dabf  fix(phishunt): escape backslashes and single quotes in STIX URL patterns
31a9565  fix(phishunt): validate IPv4 address before creating observable
cd5fc65  refactor(phishunt): deprecate PHISHUNT_API_KEY setting

Testing

  • All pre-commit hooks pass (black, flake8, isort)
  • Existing settings tests updated to remove api_key references
  • New test case added: deprecated api_key is accepted without error

Filigran-Automation added the "filigran team" label (Item from the Filigran team.) Jun 17, 2026
Filigran-Automation changed the title from "refactor(phishunt): use single open feed endpoint and fix related bugs" to "refactor(phishunt): use single open feed endpoint and fix related bugs (#6198)" Jun 17, 2026
@codecov

codecov Bot commented Jun 17, 2026


Codecov Report

❌ Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines                                 Patch %   Lines
...xternal-import/phishunt/src/connector/connector.py    96.66%    1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (1b9681c) and HEAD (b103d1f). Click for more details.

HEAD has 119 uploads less than BASE
Flag         BASE (1b9681c)   HEAD (b103d1f)
connectors   123              4
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6774       +/-   ##
===========================================
- Coverage   33.38%    0.32%   -33.07%     
===========================================
  Files        1994     1902       -92     
  Lines      122844   120238     -2606     
===========================================
- Hits        41013      390    -40623     
- Misses      81831   119848    +38017     
Files with missing lines                                 Coverage Δ
external-import/phishunt/src/connector/settings.py       87.09% <100.00%> (+0.43%) ⬆️
...xternal-import/phishunt/src/connector/connector.py    73.07% <96.66%> (+22.29%) ⬆️

... and 1140 files with indirect coverage changes


@github-actions


Thank you for your contribution, but we need you to sign your commits. Please see https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits

@maximerafaillac maximerafaillac force-pushed the feat/6198-phishunt-refactor-open-feed branch from 150948e to ffd8ecb Compare June 19, 2026 22:07
maximerafaillac and others added 9 commits June 22, 2026 10:42
stix2.TLP_WHITE references the deprecated TLP 1.0 marking definition ID
(marking-definition--613f2e26-...) which does not exist in current
OpenCTI instances, causing markings to silently fail on ingestion.

- Import MarkingDefinition from pycti
- Define module-level TLP_CLEAR constant using the deterministic
  pycti.MarkingDefinition.generate_id('TLP', 'TLP:CLEAR') ID
- Replace all stix2.TLP_WHITE references with TLP_CLEAR

Closes #6593

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…json

phishunt.io now provides a free, open JSON feed requiring no authentication.
The previous dual-path logic (public TXT feed via urllib / private JSON feed
via requests + API key) is replaced by a single method.

- Remove urllib, ssl, re imports (no longer needed)
- Remove self.phishunt_api_key from __init__
- Replace _process_public_feed + _process_private_feed with _process_feed
  consuming https://phishunt.io/feed.json
- All enrichment (domain, IP, country, organization) now applied uniformly
  to every entry, previously only available via the private feed path
- Remove API key branching in process_message

Closes #6198
Closes #6177

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ship

stix_relationship_organization_url was calling generate_id with
stix_domain.id as the target instead of stix_organization.id.
This caused it to generate the same ID as stix_relationship_url_domain,
making the url→organization relationship silently overwrite the
url→domain relationship in the STIX bundle.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
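
The collision described in this commit can be reproduced with a small sketch. It does not use pycti: `relationship_id` is a hypothetical stand-in that derives a UUIDv5 from the relationship type, source, and target, and the namespace, the `related-to` type, and the object IDs are placeholders chosen only for illustration.

```python
import json
import uuid

# Arbitrary namespace for this sketch only (an assumption, not pycti's).
NAMESPACE = uuid.NAMESPACE_URL


def relationship_id(relationship_type: str, source_ref: str, target_ref: str) -> str:
    """Hypothetical deterministic ID: identical inputs always give the same ID."""
    data = json.dumps(
        {"type": relationship_type, "source": source_ref, "target": target_ref},
        sort_keys=True,
    )
    return f"relationship--{uuid.uuid5(NAMESPACE, data)}"


# Placeholder object IDs, not real STIX IDs.
URL_ID = "url--example"
DOMAIN_ID = "domain-name--example"
ORG_ID = "identity--example"

url_to_domain = relationship_id("related-to", URL_ID, DOMAIN_ID)
# Bug: the organization relationship was hashed with the domain ID as target,
# so it received the same ID as the url→domain relationship.
buggy_url_to_org = relationship_id("related-to", URL_ID, DOMAIN_ID)
# Fix: hash with the organization ID, which yields a distinct ID.
fixed_url_to_org = relationship_id("related-to", URL_ID, ORG_ID)
```

Because two objects with the same ID in one bundle are treated as the same object, the buggy variant leaves only one of the two relationships after ingestion.
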
URLs containing backslashes or single quotes produced invalid STIX
patterns, causing Indicator creation to fail or generate malformed
pattern strings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
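
A minimal sketch of the escaping this commit describes, assuming the pattern is built with plain string formatting. The helper names `escape_stix_string` and `url_pattern` are hypothetical and may not match the connector's code.

```python
def escape_stix_string(value: str) -> str:
    """Escape a value for use inside a single-quoted STIX pattern string."""
    # Backslashes go first; otherwise the backslash added in front of each
    # quote would itself be doubled by the second replacement.
    return value.replace("\\", "\\\\").replace("'", "\\'")


def url_pattern(url: str) -> str:
    """Build a STIX comparison pattern on url:value with the URL escaped."""
    return f"[url:value = '{escape_stix_string(url)}']"
```

The order of the two replacements matters: swapping them would turn `'` into `\\'`, which closes the string literal early again.
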
The Phishunt API occasionally returns empty or invalid values in the ip
field (e.g. empty string, '-'). Passing these directly to stix2.IPv4Address
raised an exception and aborted processing of the entire bundle.

- Validate ip value with ipaddress.IPv4Address before object creation
- Skip IP, domain→ip and ip→location objects with a warning log on failure
- URL and domain observables are still ingested even when the IP is invalid

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
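
The skip-on-invalid behavior from this commit can be sketched with the standard library alone. The entry keys `url` and `domain`, the tuple-based output, and both function names are assumptions made for illustration; only the `ip` field and the use of `ipaddress.IPv4Address` come from the commit message.

```python
import ipaddress
import logging
from typing import Optional

logger = logging.getLogger("phishunt-sketch")


def parse_ipv4(value: object) -> Optional[str]:
    """Return the normalized IPv4 string, or None when the value is not valid."""
    if not isinstance(value, str):
        return None
    try:
        return str(ipaddress.IPv4Address(value.strip()))
    except ValueError:  # AddressValueError is a subclass of ValueError
        return None


def build_observables(entry: dict) -> list:
    """Always keep URL and domain; add the IP only when it validates."""
    objects = [("url", entry["url"]), ("domain-name", entry["domain"])]
    ip = parse_ipv4(entry.get("ip"))
    if ip is None:
        logger.warning("Skipping invalid IP value: %r", entry.get("ip"))
    else:
        objects.append(("ipv4-addr", ip))
    return objects
```

With an entry whose `ip` is `"-"` or empty, the function still returns the URL and domain items, so one bad field no longer aborts the whole bundle.
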
The phishunt.io feed is now fully public and authentication is no longer
required. PHISHUNT_API_KEY has no effect but is silently accepted to
avoid breaking existing deployments.

- Remove api_key field from PhishuntConfig
- Remove SecretStr import
- Add deprecation warning in migrate_deprecated_interval validator when
  api_key is present (same pattern as PHISHUNT_INTERVAL deprecation)
- Update tests: remove api_key from valid/invalid fixtures, add a
  dedicated test case verifying the deprecated key is accepted without error

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
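
The accept-but-warn handling of the deprecated key might look roughly like the following. This is a standard-library sketch only: the connector uses a validator on its settings model, whereas `migrate_deprecated_settings` here is a hypothetical function operating on a plain dict.

```python
import warnings


def migrate_deprecated_settings(raw: dict) -> dict:
    """Drop the deprecated api_key entry and warn instead of failing."""
    cleaned = dict(raw)  # copy so the caller's dict is not mutated
    if "api_key" in cleaned:
        cleaned.pop("api_key")
        warnings.warn(
            "PHISHUNT_API_KEY is deprecated and has no effect; "
            "the phishunt.io feed no longer requires authentication.",
            DeprecationWarning,
            stacklevel=2,
        )
    return cleaned
```

Existing deployments that still set the variable keep starting normally, and the warning gives operators a reason to remove it.
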
@maximerafaillac maximerafaillac force-pushed the feat/6198-phishunt-refactor-open-feed branch from 8481834 to 89d190b Compare June 22, 2026 08:42
maximerafaillac and others added 2 commits June 22, 2026 10:46
…d/_process_private_feed

These methods were replaced by a single _process_feed in the refactoring.
The equivalent coverage is provided by TestProcessFeed in test_main.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- VC312: add cleanup_inconsistent_bundle=True to send_stix2_bundle call
- VC402: replace entrypoint.sh with direct ENTRYPOINT in Dockerfile
- VC101: normalize OPENCTI_TOKEN value to ChangeMe in docker-compose.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@maximerafaillac maximerafaillac marked this pull request as ready for review June 22, 2026 09:07
Copilot AI review requested due to automatic review settings June 22, 2026 09:07

Copilot AI left a comment

Contributor

Pull request overview

This PR refactors the Phishunt external-import connector to consume the now-public https://phishunt.io/feed.json via requests, removes the legacy public/private feed split, and addresses several ingestion correctness issues (TLP marking, deterministic relationship IDs, STIX pattern escaping, and IPv4 validation). It also updates tests and container packaging accordingly.

Changes:

  • Consolidate feed handling into a single _process_feed() using requests.get(feed.json) and remove API-key branching.
  • Update ingested STIX objects/relationships: TLP:WHITE → TLP:CLEAR, fix relationship ID generation, escape URL values in STIX patterns, and skip invalid IPv4s.
  • Update config/settings migration and tests for deprecated PHISHUNT_API_KEY, and adjust Dockerfile entrypoint.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

File                                                             Description
external-import/phishunt/src/connector/connector.py              Implements unified feed ingestion, new TLP:CLEAR marking, relationship fixes, pattern escaping, IPv4 validation, and bundle sending.
external-import/phishunt/src/connector/settings.py               Removes api_key from config model and warns when legacy PHISHUNT_API_KEY is provided.
external-import/phishunt/tests/test_main.py                      Adds _process_feed unit tests (including HTTP error handling, invalid IP behavior, and indicator toggling).
external-import/phishunt/tests/tests_connector/test_settings.py  Updates settings validation cases and accepts legacy api_key input without failing validation.
external-import/phishunt/Dockerfile                              Changes runtime entrypoint to run python3 main.py from the connector workdir.
external-import/phishunt/docker-compose.yml                      Simplifies the sample OPENCTI_TOKEN placeholder.

Comment on lines 195 to 197
    bundle = self.helper.stix2_create_bundle(
        [self.stix_created_by] + bundle_objects
    )

Comment on lines +143 to 146
    self.helper.connector_logger.warning(
        "[Phishunt] Skipping invalid IP value.",
        {"ip": ip_value, "url": url_value},
    )

Comment on lines 205 to 208
      self.helper.connector_logger.error(
    -     "[Phishunt] Http error during private feed process.", {"error": err}
    +     "[Phishunt] HTTP error while fetching feed.",
    +     {"url": feed_url, "error": str(err)},
      )

- CONNECTOR_DURATION_PERIOD=P3D # ISO8601 format in String, start with 'P...' for Period

# Connector's custom execution parameters
- PHISHUNT_API_KEY= # Optional, if not provided, consume only https://phishunt.io/feed.txt

Member

You can remove this line as it is not used anymore

# Expose and entrypoint
COPY entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

Member

you can remove the file entrypoint.sh

