refactor(phishunt): use single open feed endpoint and fix related bugs (#6198) #6774
maximerafaillac wants to merge 11 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #6774 +/- ##
===========================================
- Coverage 33.38% 0.32% -33.07%
===========================================
Files 1994 1902 -92
Lines 122844 120238 -2606
===========================================
- Hits 41013 390 -40623
- Misses 81831 119848 +38017
Thank you for your contribution, but we need you to sign your commits. Please see https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits
150948e to ffd8ecb
stix2.TLP_WHITE references the deprecated TLP 1.0 marking definition ID
(marking-definition--613f2e26-...) which does not exist in current
OpenCTI instances, causing markings to silently fail on ingestion.
- Import MarkingDefinition from pycti
- Define module-level TLP_CLEAR constant using the deterministic
pycti.MarkingDefinition.generate_id('TLP', 'TLP:CLEAR') ID
- Replace all stix2.TLP_WHITE references with TLP_CLEAR
Closes #6593
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…json
phishunt.io now provides a free, open JSON feed requiring no authentication. The previous dual-path logic (public TXT feed via urllib / private JSON feed via requests + API key) is replaced by a single method.
- Remove urllib, ssl, re imports (no longer needed)
- Remove self.phishunt_api_key from __init__
- Replace _process_public_feed + _process_private_feed with _process_feed consuming https://phishunt.io/feed.json
- All enrichment (domain, IP, country, organization) is now applied uniformly to every entry; previously it was only available via the private feed path
- Remove API key branching in process_message
Closes #6198
Closes #6177
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ship
stix_relationship_organization_url was calling generate_id with stix_domain.id as the target instead of stix_organization.id. This caused it to generate the same ID as stix_relationship_url_domain, making the url→organization relationship silently overwrite the url→domain relationship in the STIX bundle.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
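The collision described above can be reproduced with a small standalone sketch. It assumes only that relationship IDs are derived deterministically from the relationship type, source, and target; the relationship_id helper, the "related-to" type, and the placeholder object IDs are hypothetical and do not reproduce pycti's actual generate_id algorithm.

```python
import uuid


def relationship_id(relationship_type: str, source_ref: str, target_ref: str) -> str:
    """Derive an ID purely from the inputs (illustrative only, not pycti's scheme)."""
    seed = f"{relationship_type}|{source_ref}|{target_ref}"
    return f"relationship--{uuid.uuid5(uuid.NAMESPACE_URL, seed)}"


# Placeholder STIX IDs, invented for this sketch.
url_id = "url--aaaa"
domain_id = "domain-name--bbbb"
organization_id = "identity--cccc"

url_domain = relationship_id("related-to", url_id, domain_id)

# Bug: the url→organization relationship was keyed on the domain ID.
buggy_url_org = relationship_id("related-to", url_id, domain_id)

# Fix: key it on the organization ID instead.
fixed_url_org = relationship_id("related-to", url_id, organization_id)

collides_before_fix = buggy_url_org == url_domain
collides_after_fix = fixed_url_org == url_domain
```

With identical inputs the two relationships share one ID, so whichever object is processed last would replace the other; distinct targets give distinct IDs.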
URLs containing backslashes or single quotes produced invalid STIX patterns, causing Indicator creation to fail or generate malformed pattern strings.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
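A minimal sketch of the escaping step, assuming the pattern is built as a single-quoted STIX string literal. The helper names escape_stix_pattern_value and url_pattern are hypothetical and may not match the connector's code.

```python
def escape_stix_pattern_value(value: str) -> str:
    """Escape backslashes and single quotes for a STIX pattern string literal."""
    # Backslashes go first; otherwise the backslash added in front of each
    # quote would itself be doubled by the second replacement.
    return value.replace("\\", "\\\\").replace("'", "\\'")


def url_pattern(url: str) -> str:
    """Build a URL comparison pattern around the escaped value."""
    return f"[url:value = '{escape_stix_pattern_value(url)}']"
```

The order of the two replacements matters: escaping quotes before backslashes would turn `'` into `\\'`, which closes the string literal early.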
The Phishunt API occasionally returns empty or invalid values in the ip field (e.g. empty string, '-'). Passing these directly to stix2.IPv4Address raised an exception and aborted processing of the entire bundle.
- Validate ip value with ipaddress.IPv4Address before object creation
- Skip IP, domain→ip and ip→location objects with a warning log on failure
- URL and domain observables are still ingested even when the IP is invalid
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
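The validation can be sketched with the standard library alone. The function names, the logger name, and the entry keys other than ip are assumptions for illustration; the object kinds are plain strings standing in for the real STIX objects.

```python
import ipaddress
import logging
from typing import List, Optional

logger = logging.getLogger("phishunt_sketch")  # hypothetical logger name


def valid_ipv4_or_none(raw: object) -> Optional[str]:
    """Return the normalized IPv4 string, or None when the value is unusable."""
    if not isinstance(raw, str):
        return None
    try:
        return str(ipaddress.IPv4Address(raw.strip()))
    except ValueError:  # AddressValueError is a subclass of ValueError
        return None


def planned_objects(entry: dict) -> List[str]:
    """List which object kinds would be created for one feed entry."""
    kinds = ["url", "domain-name"]  # always ingested
    ip = valid_ipv4_or_none(entry.get("ip"))
    if ip is None:
        logger.warning("Skipping invalid IP value: %r", entry.get("ip"))
        return kinds
    # IP-dependent objects are only added when the address parsed cleanly.
    return kinds + ["ipv4-addr", "domain->ip", "ip->location"]
```

Returning None instead of raising keeps one bad field from aborting the rest of the entry, which matches the behavior described in the commit message.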
The phishunt.io feed is now fully public and authentication is no longer required. PHISHUNT_API_KEY has no effect but is silently accepted to avoid breaking existing deployments.
- Remove api_key field from PhishuntConfig
- Remove SecretStr import
- Add deprecation warning in migrate_deprecated_interval validator when api_key is present (same pattern as PHISHUNT_INTERVAL deprecation)
- Update tests: remove api_key from valid/invalid fixtures, add a dedicated test case verifying the deprecated key is accepted without error
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
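One way to accept a retired setting without failing validation is to drop it from the raw input and emit a warning before the model sees it. The sketch below uses only the standard library; drop_deprecated_api_key and the create_indicators key in the usage are hypothetical, and the real connector does this inside its migrate_deprecated_interval validator rather than in a free function.

```python
import warnings


def drop_deprecated_api_key(raw: dict) -> dict:
    """Return a copy of the raw settings without api_key, warning if it was set."""
    cleaned = dict(raw)  # do not mutate the caller's mapping
    if "api_key" in cleaned:
        cleaned.pop("api_key")
        warnings.warn(
            "PHISHUNT_API_KEY is deprecated and has no effect: "
            "the phishunt.io feed no longer requires authentication.",
            DeprecationWarning,
            stacklevel=2,
        )
    return cleaned


# Usage: the legacy key is removed and every other setting passes through.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    settings = drop_deprecated_api_key({"api_key": "secret", "create_indicators": True})
```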
8481834 to 89d190b
…d/_process_private_feed
These methods were replaced by a single _process_feed in the refactoring. The equivalent coverage is provided by TestProcessFeed in test_main.py.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- VC312: add cleanup_inconsistent_bundle=True to send_stix2_bundle call
- VC402: replace entrypoint.sh with direct ENTRYPOINT in Dockerfile
- VC101: normalize OPENCTI_TOKEN value to ChangeMe in docker-compose.yml
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR refactors the Phishunt external-import connector to consume the now-public https://phishunt.io/feed.json via requests, removes the legacy public/private feed split, and addresses several ingestion correctness issues (TLP marking, deterministic relationship IDs, STIX pattern escaping, and IPv4 validation). It also updates tests and container packaging accordingly.
Changes:
- Consolidate feed handling into a single _process_feed() using requests.get(feed.json) and remove API-key branching.
- Update ingested STIX objects/relationships: TLP:WHITE → TLP:CLEAR, fix relationship ID generation, escape URL values in STIX patterns, and skip invalid IPv4s.
- Update config/settings migration and tests for deprecated PHISHUNT_API_KEY, and adjust Dockerfile entrypoint.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| external-import/phishunt/src/connector/connector.py | Implements unified feed ingestion, new TLP:CLEAR marking, relationship fixes, pattern escaping, IPv4 validation, and bundle sending. |
| external-import/phishunt/src/connector/settings.py | Removes api_key from config model and warns when legacy PHISHUNT_API_KEY is provided. |
| external-import/phishunt/tests/test_main.py | Adds _process_feed unit tests (including HTTP error handling, invalid IP behavior, and indicator toggling). |
| external-import/phishunt/tests/tests_connector/test_settings.py | Updates settings validation cases and accepts legacy api_key input without failing validation. |
| external-import/phishunt/Dockerfile | Changes runtime entrypoint to run python3 main.py from the connector workdir. |
| external-import/phishunt/docker-compose.yml | Simplifies the sample OPENCTI_TOKEN placeholder. |
bundle = self.helper.stix2_create_bundle(
    [self.stix_created_by] + bundle_objects
)

self.helper.connector_logger.warning(
    "[Phishunt] Skipping invalid IP value.",
    {"ip": ip_value, "url": url_value},
)

  self.helper.connector_logger.error(
-     "[Phishunt] Http error during private feed process.", {"error": err}
+     "[Phishunt] HTTP error while fetching feed.",
+     {"url": feed_url, "error": str(err)},
  )

- CONNECTOR_DURATION_PERIOD=P3D # ISO8601 format in String, start with 'P...' for Period

# Connector's custom execution parameters
- PHISHUNT_API_KEY= # Optional, if not provided, consume only https://phishunt.io/feed.txt
You can remove this line as it is not used anymore
# Expose and entrypoint
COPY entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
You can remove the entrypoint.sh file
Summary
Resolves #6198: merge public/private feed logic into a single _process_feed method using the now-free feed.json endpoint.
Also resolves #6177: the HTTP 403 caused by urllib missing a User-Agent is fixed by switching entirely to requests.
Also resolves #6593: TLP:WHITE replaced with TLP:CLEAR.
Changes
Refactor
- Replace _process_public_feed / _process_private_feed with a single _process_feed consuming https://phishunt.io/feed.json
- Remove urllib, ssl, re imports; requests only
- Remove API key branching in process_message
- Accept the deprecated PHISHUNT_API_KEY env var with a warning (same pattern as PHISHUNT_INTERVAL)
Bug fixes
- Use pycti.MarkingDefinition.generate_id('TLP', 'TLP:CLEAR') for a valid OpenCTI marking ID
- url→organization was using stix_domain.id in generate_id instead of stix_organization.id, causing the relationship to silently overwrite url→domain
- Escape \\ and ' in URL values before embedding in STIX patterns
- Validate the IP with ipaddress.IPv4Address before creating the observable; skip IP / domain→ip / ip→location objects with a warning log when invalid; URL and domain are still ingested
Commits
- 686f1fa
- 26fbc36
- 966e268
- ea8dabf
- 31a9565
- cd5fc65
Testing
- Tests updated to remove api_key references
- Added a test case verifying that a deprecated api_key is accepted without error