Robust Semantic Data Model for Quicker Insights & Behavioural Analytics

Most organizations build their threat detection and investigation capabilities on a large number of vendor solutions. According to a 2020 Ponemon Institute report, organizations deploy on average more than 45 security solutions and technologies. This approach creates multiple vendor-specific data silos, which often means storing multiple copies of the same data with no correlation across silos.

No legacy security vendor is able to cost-effectively store and analyze all the data required to detect threats and facilitate post-hoc investigations and remediation in today’s heterogeneous environments. A modern approach applies the power of second-generation cloud vendors, providing the scalability required, and a robust semantic data model for quicker insights with behavioral analytics.

The Elysium Analytics Open Data Model (ODM) brings together all security-related telemetry (event, user, network, endpoint, cloud, etc.) into a unified taxonomy that can help detect and understand threats more effectively than before. Unified views are used to create analytic models with richer context of user and entity behaviors across a disparate set of data sources. Furthermore, the Elysium Analytics ODM lets downstream analytics share and reuse threat detection models, algorithms and analytics.

Essentially, the ODM provides a library of data source mappings and high-level abstractions (e.g., any new VM or container creation across the enterprise). It also describes the security telemetry data used to baseline users and entities in the event data. This baseline, combined with risk-based scoring, helps identify anomalous behaviors across different levels of abstraction of users and entities. This data engineering is accomplished through schemas, data structures, file formats, and configurations on the underlying Snowflake data platform, where security telemetry data is collected, stored and processed at scale.
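To make the idea of a baseline with risk-based scoring concrete, here is a minimal sketch. It is not the Elysium scoring method: the data, the function names, and the choice of a simple standard-deviation score are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical history: daily login counts per user (illustrative data only).
history = {
    "lsmith": [4, 5, 3, 4, 5, 4, 4],
    "jdoe": [20, 22, 19, 21, 20, 23, 21],
}


def build_baseline(counts):
    """Summarize past behavior as a mean and a population standard deviation."""
    return {"mean": mean(counts), "std": pstdev(counts)}


def risk_score(observed, baseline, floor=1.0):
    """Return how many standard deviations `observed` sits above the baseline.

    `floor` keeps the divisor away from zero for very stable entities.
    Values at or below the mean score 0.0.
    """
    spread = max(baseline["std"], floor)
    return max(0.0, (observed - baseline["mean"]) / spread)


baselines = {user: build_baseline(counts) for user, counts in history.items()}

# 20 logins in a day is ordinary for jdoe but far outside lsmith's usual range.
lsmith_score = risk_score(20, baselines["lsmith"])
jdoe_score = risk_score(20, baselines["jdoe"])
```

The same observed value scores very differently for the two users, which is the point of baselining each user and entity separately rather than applying one global threshold.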

Elysium defines relationships between the various security data types for joining log data with user, network and endpoint data in both relational and graph models.

Key features of the Elysium data model are:
- Entity and user relationships
- Knowledge graph of security events
- Pre-built, high-level views of security for threat investigation

Semantic Data Model

To provide a framework for effective cyber threat analytics, it is necessary to collect and analyze both the standard security event logs and alerts and the relevant contextual data.

In addition to the most common entities such as network, user and endpoint, we include other data points such as file and certificate.

In the diagram below, the raw event tells us that user “lsmith” successfully logged into a server hosted on WebServices from the IP address 10.1.1.15. Based on the raw event alone, we don’t know whether this event is a possible threat. However, after injecting user and endpoint context, the enriched event indicates a potential threat with root access that requires further investigation, since this is new and unusual behavior for the user “lsmith.”

[Figure: Security events]
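The enrichment step described above can be sketched in a few lines. The field names, the context tables, and the flagging rule are hypothetical and chosen only to mirror the “lsmith” example; they are not the ODM schema.

```python
# Hypothetical context tables. In practice these would come from identity
# and asset inventories rather than in-memory dicts.
USER_CONTEXT = {
    "lsmith": {"granted_privilege": "root", "has_used_root_before": False},
}
ENDPOINT_CONTEXT = {
    "10.1.1.15": {"known_for_user": False},
}

raw_event = {
    "user": "lsmith",
    "action": "login",
    "outcome": "success",
    "src_ip": "10.1.1.15",
    "dest_service": "WebServices",
}


def enrich(event):
    """Return a copy of the raw event with user and endpoint context attached."""
    enriched = dict(event)
    enriched["user_context"] = USER_CONTEXT.get(event["user"], {})
    enriched["endpoint_context"] = ENDPOINT_CONTEXT.get(event["src_ip"], {})
    return enriched


def needs_investigation(enriched):
    """Flag a successful login with root access that is new for this user."""
    user = enriched["user_context"]
    return (
        enriched["outcome"] == "success"
        and user.get("granted_privilege") == "root"
        and not user.get("has_used_root_before", True)
    )


enriched_event = enrich(raw_event)
flagged = needs_investigation(enriched_event)
```

The raw event is copied rather than modified, so the original record stays available alongside the enriched one.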

Extensibility of Data Models

Our ODM can also be extended to accommodate custom attributes by embedding key-value pairs within the log, alert and context entries. Each log record has a separate enrichment JSON in which all enrichments of that record (e.g., geo enrichment, threat intelligence enrichment and asset enrichment) are stored. The model itself is extensible, so more enrichments can be added as needed.
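As a rough illustration of a per-record enrichment JSON built from key-value pairs, consider the sketch below. The key names (`enrichment`, `geo`, `threat_intel`, `asset`, `custom`) and the helper function are assumptions for this example, not the actual ODM layout.

```python
import json


def add_enrichment(record, kind, values):
    """Attach one enrichment (for example geo or threat intel) to a log record.

    The original fields are left untouched. Everything added lives under the
    record's "enrichment" key, so a new enrichment kind needs no schema change.
    """
    enriched = dict(record)
    enrichment = dict(record.get("enrichment", {}))
    enrichment[kind] = values
    enriched["enrichment"] = enrichment
    return enriched


log_record = {"event_id": "e-1", "src_ip": "203.0.113.7", "action": "login"}
log_record = add_enrichment(log_record, "geo", {"country": "NL"})
log_record = add_enrichment(log_record, "threat_intel", {"listed": True})
log_record = add_enrichment(log_record, "asset", {"criticality": "high"})
# A custom attribute added later follows the same key-value pattern.
log_record = add_enrichment(log_record, "custom", {"business_unit": "retail"})

# Serialized form of the enrichment document for this one record.
enrichment_json = json.dumps(log_record["enrichment"], sort_keys=True)
```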

Mapping to Third-Party Data Models

Models are extensible to map to third-party data models (e.g., ArcSight CEF or any other vendor’s schema). Once the mapping layer is implemented, all third-party vendor data is funneled through the Elysium ODM and applied to the downstream models and analytics without any modifications.
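A mapping layer of this kind might look like the following sketch for CEF. The CEF header layout and the extension keys `src`, `dst`, `suser` and `dpt` follow the CEF format, but the target field names, the sample line, and the simplified parser (which ignores escaped pipes and equals signs) are assumptions made for illustration.

```python
import re

# Hypothetical mapping from CEF extension keys to ODM field names.
CEF_TO_ODM = {
    "src": "source_ip",
    "dst": "destination_ip",
    "suser": "user_name",
    "dpt": "destination_port",
}

# A value runs until the next " key=" or the end of the line.
EXTENSION_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_cef(line):
    """Split a CEF line into header fields and extension key-value pairs."""
    parts = line.split("|", 7)
    if len(parts) < 8 or not parts[0].startswith("CEF:"):
        raise ValueError("not a CEF line")
    header = {
        "vendor": parts[1],
        "product": parts[2],
        "name": parts[5],
        "severity": parts[6],
    }
    extension = dict(EXTENSION_PAIR.findall(parts[7]))
    return header, extension


def to_odm(line):
    """Map one CEF line to a flat record; unmapped keys go under "custom"."""
    header, extension = parse_cef(line)
    record = {
        "event_name": header["name"],
        "vendor": header["vendor"],
        "product": header["product"],
    }
    custom = {}
    for key, value in extension.items():
        if key in CEF_TO_ODM:
            record[CEF_TO_ODM[key]] = value
        else:
            custom[key] = value
    record["custom"] = custom
    return record


sample = (
    "CEF:0|ExampleVendor|Firewall|1.0|100|SSH login|3|"
    "src=10.1.1.15 dst=10.2.0.8 suser=lsmith dpt=22 msg=Accepted password"
)
odm_record = to_odm(sample)
```

Because downstream analytics read only the mapped field names, supporting another vendor format would mean adding another mapping table and parser, not changing the analytics.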

Model Relationships

The relationships between the data model entities are illustrated below.

Relational:

[Figure: Open Data Model]

As the hierarchical layers above show, the fidelity of the original source data is preserved and all transformations happen at query time. The model gives full flexibility, extensibility and adaptability for working across disparate log sources. It was carefully designed by engineers who have worked with hundreds of security log sources. As more sources are added at the bottom layer, the downstream analytics need no changes.
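The layering described above, with raw data kept as received and transformations applied at query time, can be sketched with SQL views. The table, column and view names are hypothetical, and SQLite stands in for Snowflake only so that the example runs locally.

```python
import sqlite3

odm_db = sqlite3.connect(":memory:")
odm_db.executescript("""
    -- Raw layer: stored exactly as received, never rewritten.
    CREATE TABLE raw_firewall (ts TEXT, srcaddr TEXT, usr TEXT, act TEXT);
    CREATE TABLE raw_vpn (event_time TEXT, client_ip TEXT, account TEXT, result TEXT);

    -- Unified layer: a view, so normalization runs when it is queried.
    CREATE VIEW auth_events AS
        SELECT ts AS event_time, srcaddr AS source_ip, lower(usr) AS user_name,
               act AS outcome, 'firewall' AS source
        FROM raw_firewall
        UNION ALL
        SELECT event_time, client_ip, lower(account), result, 'vpn'
        FROM raw_vpn;
""")
odm_db.execute(
    "INSERT INTO raw_firewall VALUES ('2021-01-01T10:00:00', '10.1.1.15', 'LSmith', 'success')"
)
odm_db.execute(
    "INSERT INTO raw_vpn VALUES ('2021-01-01T10:05:00', '10.1.1.20', 'lsmith', 'success')"
)

# Downstream analytics query the view only and never name a raw table.
rows = odm_db.execute(
    "SELECT source, source_ip FROM auth_events "
    "WHERE user_name = 'lsmith' ORDER BY event_time"
).fetchall()

# The raw value keeps its original casing; only the view normalizes it.
raw_user = odm_db.execute("SELECT usr FROM raw_firewall").fetchone()[0]
```

Adding a third source would mean one more raw table and one more branch in the view, while the query against `auth_events` stays the same.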

The example below shows the enrichment of the security data through the ODM schema using threat intelligence from a shared table on Snowflake.

Contextual Relational Enrichment:
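A minimal sketch of this kind of contextual enrichment is a join between event data and a threat intelligence table. The schema and sample rows are hypothetical, and an in-memory SQLite database is used as a local stand-in for a shared table on Snowflake.

```python
import sqlite3

intel_db = sqlite3.connect(":memory:")
intel_db.executescript("""
    CREATE TABLE events (event_id TEXT, user_name TEXT, source_ip TEXT);
    -- Stand-in for a threat intelligence table shared by a provider.
    CREATE TABLE threat_intel (indicator TEXT, category TEXT, confidence INTEGER);
""")
intel_db.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("e-1", "lsmith", "10.1.1.15"),
        ("e-2", "john", "198.51.100.23"),
    ],
)
intel_db.executemany(
    "INSERT INTO threat_intel VALUES (?, ?, ?)",
    [("198.51.100.23", "botnet_c2", 90)],
)

# LEFT JOIN keeps every event; intel columns are NULL when nothing matches.
enriched_rows = intel_db.execute("""
    SELECT e.event_id, e.user_name, e.source_ip, t.category, t.confidence
    FROM events AS e
    LEFT JOIN threat_intel AS t ON t.indicator = e.source_ip
    ORDER BY e.event_id
""").fetchall()
```

Events without a matching indicator are kept with empty intelligence columns, so the enrichment adds context without filtering anything out.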

Elysium Knowledge Graphs

Let’s look at how Elysium Knowledge Graphs provide additional insights into user activity. Consider a scenario where an EC2 machine hosted on AWS by user “John” is accessed from a malicious IP. This is logged as a “finding.” With the graph view, an analyst can see the complete flow of all entities connected to this user and understand the severity of the “finding” as well as the entities impacted by it.

Suppose the same user “John” is also using a service through a role and is connected to this service from a private IP, which also generates a finding or an alert.

An analyst can see what service “John” was using, what “role” he was assigned and who assigned it, as well as the service configuration. Furthermore, the analyst can observe all connections to any endpoint accessed by “John” and any subsequent connections from these endpoints to a second layer of entities. Because these findings originate from the same user, both are shown together in the graph below, giving analysts a complete picture for root cause analysis and resolution in just a few minutes.

[Figure: Knowledge graph]
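The “John” scenario can be approximated with a small graph and a breadth-first traversal. The node names, relationship labels and hop limit below are hypothetical; the sketch only shows how two findings that share a user surface together, not how Elysium stores or queries its graph.

```python
from collections import deque

# Hypothetical edges of (source, relationship, target) for the scenario above.
EDGES = [
    ("user:John", "owns", "ec2:instance-1"),
    ("ip:malicious", "accessed", "ec2:instance-1"),
    ("finding:1", "about", "ec2:instance-1"),
    ("user:John", "assumes", "role:service-role"),
    ("user:admin", "assigned", "role:service-role"),
    ("role:service-role", "uses", "service:storage"),
    ("ip:private", "connects_to", "service:storage"),
    ("finding:2", "about", "service:storage"),
]


def build_adjacency(edges):
    """Index edges in both directions so traversal ignores edge direction."""
    adjacency = {}
    for source, _relationship, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def connected_entities(adjacency, start, max_hops):
    """Breadth-first search returning each reachable node with its hop count."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


graph = build_adjacency(EDGES)
reachable = connected_entities(graph, "user:John", max_hops=3)
findings = sorted(node for node in reachable if node.startswith("finding:"))
```

Starting from the user rather than from a single alert is what brings both findings, the role, the person who assigned it, and the affected service into one view.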

Conclusion

The Elysium ODM enables security analysts to uncover advanced threats and anomalies within enterprise networks. Through the semantic data model and data mapping, security analysts can reduce attacker dwell time by discovering and assessing adversarial behavior faster and with fewer resources. Furthermore, the semantic data model fuses security data sources with other contextual data to generate an enterprise behavior graph, a unique visual environment for analyzing advanced adversarial behaviors across petabytes of data. Elysium users also benefit from our kill chain-focused User and Entity Behavior Analytics (UEBA), which automatically uncovers adversarial tactics, techniques and procedures (TTPs). The model enables machine learning-powered analytics to provide greater context and focus to hunts, and combines with linked data techniques to present analysts with intuitive summaries and visualizations of threat actor behavior along the cyber kill chain.

The Elysium ODM enables organizations to:

- Store one copy of the security telemetry data and apply unlimited analytics
  - Full-text search across the data
  - Out-of-the-box analytics using the ODM
  - Custom analytics to your desired specification
  - Plug-in third-party vendor analytics
- Leverage all your security telemetry data to establish the context needed to better detect threats
  - Behavioral analytics for user, endpoint and network entity data
  - Enrichment with threat intelligence data
- Avoid “lock-in” to a specific technology and gain the analytic flexibility that an ODM provides
- Build a security log semantic data lake that enables faster and better contextual analytics to find unknowns