Skip to content

Improve generic observable type detection in InQuest analyzer #3280

@lakshita10341

Description

@lakshita10341

Description

The type_of_generic() method in the InQuest analyzer has a TODO comment at line 49 indicating that the validation should be more thorough:

https://github.com/intelowlproject/IntelOwl/blob/master/api_app/analyzers_manager/observable_analyzers/inquest.py#L45-L51

def type_of_generic(self):
    if re.match(r"^[\w\.\+\-]+\@[\w]+\.[a-z]{2,3}$", self.observable_name):
        type_ = "email"
    else:
        # TODO: This should be validated more thoroughly
        type_ = "filename"
    return type_

Current Issues

  1. Weak email regex: The pattern doesn't handle:

    • TLDs longer than 3 characters (.info, .museum, .technology)
    • Subdomains (user@sub.domain.com)
  2. No detection for other supported types: The InQuest API supports registry and xmpid types (as seen in line 83), but these are never detected automatically

  3. Everything defaults to filename: Any non-email observable is assumed to be a filename without validation

Proposed Solution

Improve the type_of_generic() method to:

  1. Use a more comprehensive email regex pattern
  2. Add detection for Windows registry keys (e.g., HKEY_LOCAL_MACHINE\...)
  3. Add detection for XMP IDs (UUID-like patterns)
  4. Add basic filename validation
  5. Log a warning for unrecognized patterns

Example Implementation

def type_of_generic(self):
    """Determine the type of a generic observable."""
    # Email pattern - more comprehensive
    email_pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    if re.match(email_pattern, self.observable_name):
        return "email"
    
    # Windows Registry key pattern
    registry_pattern = r"^(HKEY_|HK[A-Z]{2,})"
    if re.match(registry_pattern, self.observable_name, re.IGNORECASE):
        return "registry"
    
    # XMP ID pattern (UUID-like)
    xmpid_pattern = r"^[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}$"
    if re.match(xmpid_pattern, self.observable_name):
        return "xmpid"
    
    # Filename pattern - basic validation
    filename_pattern = r"^[\w\-. ]+\.[a-zA-Z0-9]{1,10}$"
    if re.match(filename_pattern, self.observable_name):
        return "filename"
    
    # Default with warning
    logger.warning(
        f"Could not determine type of generic observable: {self.observable_name}. "
        "Defaulting to 'filename'."
    )
    return "filename"

I am interested in working on this issue if approved. And tell me any required changes if needed.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions