How to efficiently detect suspicious cyber activities? Encrypted Network Traffic Analysis and Transformation and Normalization techniques

The adoption of network encryption is rapidly growing. Global HTTPS page loads have increased to more than 80%. Even though network encryption is crucial for the protection of users and their privacy, network encryption introduces challenges for systems that perform deep packet inspection and rely heavily on the processing of packet payloads.

Deep packet inspection is a vital operation in intrusion detection and prevention systems, firewalls, L7 filtering and other software packet processing systems. Traditionally, intrusion detection systems inspect packet headers and payloads to report malicious or abnormal traffic behaviour. In encrypted network packets, though, only a small portion of the packet remains unencrypted, such as the TCP/IP packet headers and the TLS handshake. The data transmitted in packet payloads is encrypted. Consequently, even popular intrusion detection systems inspect encrypted connections inadequately and produce a high rate of false negatives.

To overcome the challenges that network encryption introduces in the domain of network monitoring and inspection, many works employ alternative techniques to identify the nature of the traffic. For instance, there are works that examine the feasibility of traffic characterization in encrypted networks using machine learning. These works focus on different use cases like network analytics, network security and others.

As the results of these works show, network traffic can still be classified, even in a fine-grained manner, despite the encrypted payloads in network packets. Packet headers contain information like IP addresses, port numbers and packet data sizes. Time-related features, such as flow duration and packet inter-arrival times, can be easily computed and are also relevant in encrypted traffic analysis. Thus, when properly combined, these packet metadata can offer valuable traffic insights.
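As a minimal sketch of how such metadata could be turned into per-flow features, the function below derives flow duration, mean inter-arrival time and payload volume from a list of packets. The input layout (timestamp in seconds, payload size in bytes) and the feature names are assumptions made for illustration only.

```python
from statistics import mean


def flow_features(packets):
    """Compute simple metadata features for one flow.

    `packets` is a list of (timestamp_seconds, payload_size_bytes) tuples
    in capture order. This layout is a hypothetical one chosen for the sketch.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Gaps between consecutive packets (inter-arrival times).
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "duration": times[-1] - times[0],
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
        "total_payload_bytes": sum(sizes),
        "packet_count": len(packets),
    }


features = flow_features([(0.0, 517), (0.5, 1400), (1.5, 120)])
```

None of these features requires access to the decrypted payload, which is why they remain available in encrypted traffic.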

In the context of CyberSANE, we propose an approach to generate intrusion detection signatures that can detect security incidents even in encrypted network traffic. The signatures match sequences of packet metadata. More specifically, after a thorough examination of the literature and during our own analysis, we observed that specific sequences of packet payload sizes over time can reveal discrete events that signify intrusion attempts in a system.
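To illustrate what matching a sequence of payload sizes might look like, here is a small sketch. It assumes a signature is a list of expected payload sizes that must appear as a contiguous run in a flow, with an optional byte tolerance. Both the signature format and the `tolerance` parameter are hypothetical and are not taken from the CyberSANE implementation.

```python
def matches_signature(payload_sizes, signature, tolerance=0):
    """Return True if `signature` occurs as a contiguous run in `payload_sizes`.

    Each observed size may differ from the expected one by at most
    `tolerance` bytes (a hypothetical knob for this sketch).
    """
    n = len(signature)
    if n == 0:
        return False
    for start in range(len(payload_sizes) - n + 1):
        window = payload_sizes[start:start + n]
        if all(abs(seen - expected) <= tolerance
               for seen, expected in zip(window, signature)):
            return True
    return False


flow = [517, 1400, 22, 48, 48, 64, 1400]
exact_hit = matches_signature(flow, [22, 48, 48])
near_miss = matches_signature(flow, [22, 50, 48])
near_hit = matches_signature(flow, [22, 50, 48], tolerance=2)
```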

Frequent pattern mining techniques can be used to automate the signature generation procedure. The generated signatures are then integrated into FORTH’s pattern matching engine, which reports the security incidents that the signatures express.
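The idea of mining candidate signatures can be sketched with a very simple stand-in for frequent pattern mining: counting contiguous payload-size sequences of a fixed length and keeping those that occur in enough flows. This is an illustrative simplification, not the mining algorithm used in the project, and `length` and `min_support` are hypothetical parameters.

```python
from collections import Counter


def frequent_sequences(flows, length, min_support):
    """Return contiguous size sequences of `length` seen in >= `min_support` flows."""
    support = Counter()
    for sizes in flows:
        # A set ensures each flow counts a given sequence at most once.
        seen = {tuple(sizes[i:i + length])
                for i in range(len(sizes) - length + 1)}
        support.update(seen)
    return {seq: count for seq, count in support.items() if count >= min_support}


# Payload sizes of three flows labelled as attack traffic (made-up numbers).
attack_flows = [
    [517, 22, 48, 48, 64],
    [22, 48, 48, 1400],
    [300, 22, 48, 48],
]
candidates = frequent_sequences(attack_flows, length=3, min_support=3)
```

A real pipeline would also check that a candidate is rare in benign traffic before promoting it to a signature.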

Moreover, cyber threats are constantly growing more sophisticated. At the same time, cyber-attackers can count on a vast amount of information and more resources than a few years ago. This highlights how important it is to gather and share information about security incidents: companies need structured data that can be shared quickly in order to remediate and mitigate the effects of incidents.

In this scenario, transformation and normalization techniques gain special importance, since they make data easy to share and to standardize, structure, and format, so that information is easier to recognize and extract. These techniques make it possible to respond to security incidents more quickly and efficiently.

Common Event Format (CEF)

ArcSight developed CEF as a logging and auditing file format. CEF aims to normalize the output produced by different tools, so that appliances and security and network applications can interoperate better and information can be collected and correlated with little effort.

CEF is a text-based format, and its main strengths lie in its ability to support a variety of device types and in how easily it can be extended. However, CEF neither regulates nor defines the IDs of the events produced by devices; this must be handled by the device or application itself.

The syntax of a log record consists of a standard prefix, containing both the date and the hostname:

Apr 19 13:17:53 host message

where the message carries the CEF header fields, separated by pipes (“|”), followed by a variable extension made up of key-value pairs:

CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension

The fields are defined as:

Version: an integer that identifies the version of the CEF format and therefore the structure of the record, i.e., the fields it contains.
Device Vendor, Device Product and Device Version: strings that identify the device sending the report.
Signature ID: a unique identifier for each type of event.
Name: a string that describes the content of the event.
Severity: an integer defining how critical an event is. The range goes from 0 (least important event) to 10 (most critical event).
Extension: a collection of key-value pairs.

CEF is encoded in UTF-8. Spaces are allowed inside field values, but a backslash (“\”) is required to escape pipes (|) in the header, equals signs (=) in the extension, and the backslash itself, while line breaks inside values are written as \n and \r. A further advantage is that the CEF format can be applied to various kinds of devices, both on-premise and in the cloud.
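The layout and escaping rules above can be sketched as a small formatter. This is a minimal illustration rather than a complete CEF implementation: it assumes format version 0 (“CEF:0”), omits the syslog prefix, and uses made-up vendor and product names. The extension keys `src` and `msg` are used only as examples.

```python
def escape_header(value):
    # Header fields: escape the backslash first, then the pipe.
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def escape_extension(value):
    # Extension values: escape backslash and equals, encode line breaks.
    return (str(value)
            .replace("\\", "\\\\")
            .replace("=", "\\=")
            .replace("\r", "\\r")
            .replace("\n", "\\n"))


def build_cef(vendor, product, version, signature_id, name, severity, extension):
    """Build one CEF message (without the syslog prefix)."""
    fields = [vendor, product, version, signature_id, name, severity]
    header = "|".join(["CEF:0"] + [escape_header(f) for f in fields])
    pairs = " ".join(f"{key}={escape_extension(val)}"
                     for key, val in extension.items())
    return f"{header}|{pairs}"


line = build_cef(
    "ExampleVendor", "ExampleIDS", "1.0", "100",
    "Port scan | sweep", 5,
    {"src": "10.0.0.1", "msg": "a=b"},
)
```

Note that the pipe inside the event name and the equals sign inside the `msg` value are the only characters that end up escaped in this example.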

Incident Object Description Exchange Format (IODEF)

IODEF is a format for sharing computer security incident information in which messages are presented in a human-readable way. It provides information about running services, networks and hosts. IODEF is expressed in XML, a language that defines a framework for data encoding. The aim of IODEF is to improve communication between entities (typically CSIRTs) by sharing incident data in a structured way. It provides a unified, common layout for information sharing. Normalizing security information makes incident data easier to handle and process while, at the same time, requiring fewer resources.

However, IODEF was designed as a transport model, so data storage is not its strength. It is flexible, but it does not impose a standard incident format. In addition, IODEF is compatible with IDMEF (Intrusion Detection Message Exchange Format), a standard that works well with Intrusion Detection Systems (IDS).

Here is a list of the fields included in IODEF:

IncidentID, the incident identification number. There is also an AlternativeID, which holds the identifiers used by other CSIRTs (not the one that defined the incident).
RelatedActivity, that is, the IDs of related incidents.
DetectTime, which defines when the incident was noticed, while StartTime and EndTime indicate when the incident started and ended.
ReportTime, the moment when the incident was reported.
Description and Assessment of the event.
Method, used to describe the techniques employed by the attacker.
Contact, the contact information of any group that may be involved in the incident.
EventData, which keeps a description of the events related to the incident.
History, a log of the actions: it includes not only actions but also events and notable things that happened during the course of the incident.
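As a rough sketch of how a few of these fields fit together, the snippet below builds a small IODEF-style XML fragment with Python's standard library. It assumes the IODEF version 1 XML namespace and attribute names from RFC 5070, uses made-up identifiers, and is not validated against the schema. A complete document needs further elements (for example Assessment and Contact).

```python
import xml.etree.ElementTree as ET

# Assumption: IODEF version 1 namespace as defined in RFC 5070.
IODEF_NS = "urn:ietf:params:xml:ns:iodef-1.0"


def build_incident(incident_id, csirt_name, report_time, description):
    """Return a simplified, non-validated IODEF-style document as a string."""
    ET.register_namespace("", IODEF_NS)
    doc = ET.Element(f"{{{IODEF_NS}}}IODEF-Document", {"version": "1.00"})
    incident = ET.SubElement(doc, f"{{{IODEF_NS}}}Incident",
                             {"purpose": "reporting"})
    ident = ET.SubElement(incident, f"{{{IODEF_NS}}}IncidentID",
                          {"name": csirt_name})
    ident.text = incident_id
    ET.SubElement(incident, f"{{{IODEF_NS}}}ReportTime").text = report_time
    ET.SubElement(incident, f"{{{IODEF_NS}}}Description").text = description
    return ET.tostring(doc, encoding="unicode")


xml_text = build_incident(
    "189493",                      # hypothetical incident number
    "csirt.example.com",           # hypothetical reporting CSIRT
    "2020-04-19T13:17:53+00:00",
    "Suspicious sequence of payload sizes on an encrypted flow",
)
```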

Structured Threat Information eXpression (STIX)

The objective of STIX is to define and share information of incidents and cyber threats. It can be described as a standard and structured language.

It is vital for companies to keep up-to-date cyber threat intelligence, including information on vulnerabilities, threats and risks. This data can assist in detecting attacks and proposing remediation strategies. STIX is commonly employed as a vehicle for sharing intelligence with third parties such as associates, partners or even providers. The main advantage of sharing information in this way is that it strengthens the information network and provides analysts with tools to fight incidents and recognize attack patterns.

The STIX architecture therefore covers various kinds of data, such as cyber threat actors and their initiatives, as well as tactics, techniques, and procedures (TTPs).

One of the biggest advantages of STIX is that it is transport-agnostic. Because its structure and serialization do not depend on a concrete transport mechanism, it is flexible and adaptable. In addition, STIX works well with the Trusted Automated Exchange of Intelligence Information (TAXII) protocol, which has been specifically designed to transport STIX objects.
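To give a feel for what a shared STIX object can look like, the sketch below builds an indicator and wraps it in a bundle. It assumes the STIX 2.1 JSON serialization, which this text does not name explicitly, and uses a documentation-range IP address as a made-up example. Because STIX is transport-agnostic, the resulting JSON string could be sent over TAXII or any other channel.

```python
import json
import uuid
from datetime import datetime, timezone


def make_indicator(name, pattern, now=None):
    """Build a dict shaped like a STIX 2.1 indicator (assumed version)."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": stamp,
        "modified": stamp,
        "name": name,
        "pattern": pattern,
        "pattern_type": "stix",
        "valid_from": stamp,
    }


indicator = make_indicator(
    "Suspicious source address",            # hypothetical example
    "[ipv4-addr:value = '198.51.100.7']",
)
bundle = {
    "type": "bundle",
    "id": f"bundle--{uuid.uuid4()}",
    "objects": [indicator],
}
serialized = json.dumps(bundle, indent=2)
```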

Other Options to normalize data

When standardising values that lie in a similar range, min-max normalization is a good choice. It is most helpful when the data values are roughly uniformly distributed, that is, when there are only a few outliers and the bounds are relatively close. Min-max normalization belongs to the “scaling to a range” family of normalization techniques (Google Developers, 2020), a group of approaches in which values are transformed into a more manageable range, usually between zero and one. This technique is used, for example, by Dark Web analysis software such as BiSAL (Al-Rowaily, et al., 2015).
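A minimal sketch of min-max scaling to the range between zero and one is shown below. Returning all zeros for a constant input is a choice made for this sketch.

```python
def min_max_scale(values):
    """Map values linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to scale (sketch-level choice).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10, 20, 30, 50])  # [0.0, 0.25, 0.5, 1.0]
```

A single extreme value would stretch the range and squeeze all other values towards zero, which is why the technique suits data with few outliers.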

A well-known variation of scaling techniques is the z-score transformation. This approach focuses on deviations, that is, on determining how far a specific value lies from the mean. The main problem with the z-score is that it is best suited to homogeneous datasets: only a few outliers can be tolerated if the normalization is to be considered reliable.
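The transformation can be sketched as follows, expressing each value as its distance from the mean in units of standard deviation. Using the population standard deviation is an assumption made for this example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (value - mean) / standard deviation for each value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed here)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Mean is 5 and standard deviation is 2 for this sample.
scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
```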

For their part, logarithmic transformations help compress datasets whose values vary greatly. They replace each value with its logarithm, which works best when only a handful of values account for most of the data points. They have been successfully tested for normalizing the alerts and events produced by probes in some critical infrastructures (Di Sarno, et al., 2016).
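A short sketch of log scaling follows. It uses log(1 + x) instead of a plain logarithm so that zero values stay defined. That offset is an assumption of this example, and the input numbers are made up.

```python
import math


def log_scale(values):
    """Compress a wide range of non-negative values with log(1 + x)."""
    return [math.log1p(v) for v in values]


# Hypothetical per-host alert counts spanning several orders of magnitude.
compressed = log_scale([0, 9, 99, 9999])
```

The inputs span four orders of magnitude, while the outputs stay below ten.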

Another technique, the square-root transformation, deals with datasets whose values are obtained from different sources, such as distinct tools. Although it has proven effective on Poisson distributions (Bartlett, 1936; Freeman & Tukey, 1950), values below zero must be handled explicitly, that is, a constant must be added to raise them artificially to zero or above.
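The handling of negative values can be sketched as below, where the data is shifted by the smallest constant that makes every value non-negative before the square root is taken. Choosing the shift this way is an assumption of the example; other constants are possible.

```python
import math


def sqrt_transform(values):
    """Square-root transform with a shift so no value is below zero."""
    smallest = min(values)
    shift = -smallest if smallest < 0 else 0
    return [math.sqrt(v + shift) for v in values]


transformed = sqrt_transform([-4, 0, 5, 12])  # shift of 4 -> [0.0, 2.0, 3.0, 4.0]
```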

Although all the techniques described have proven effective, extreme outliers can reduce their stability and performance. A good approach is to rely on feature clipping, a normalization technique that caps outliers at a predefined, fixed value. It can be combined freely with the other techniques, since feature clipping can be performed either at the beginning or at the end of the normalization procedure. This kind of normalization has been used to accurately predict incidents from network-level activity (Liu, et al., 2015).
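Feature clipping can be sketched in a couple of lines. The bounds used here are arbitrary example values.

```python
def clip(values, lower, upper):
    """Cap every value to the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


clipped = clip([3, 40, 55, 10_000, -7], lower=0, upper=100)  # [3, 40, 55, 100, 0]
```

Applied before min-max scaling, for example, this prevents the single value of 10,000 from dominating the range.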

Finally, Box-Cox normalization techniques (Box & Cox, 1964) have demonstrated their effectiveness in identifying the optimal transformation for a variable simply by raising its values to an exponent. They are an alternative worth considering for data cleansing (Osborne, 2010) and, at the same time, have delivered higher performance than the quantitative analysis employed by min-max or z-score techniques (Osborne, 2002).
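For a fixed exponent, the Box-Cox transform can be sketched as follows. The exponent `lam` is passed in by hand here, which is a simplification: in practice it is usually estimated from the data, for instance by maximum likelihood as `scipy.stats.boxcox` does when no exponent is supplied.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox transform for a given exponent `lam`.

    (x**lam - 1) / lam for lam != 0, and log(x) for lam == 0.
    Only defined for strictly positive inputs.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


root_like = box_cox([1, 4, 9], lam=0.5)  # close to [0.0, 2.0, 4.0]
log_like = box_cox([1, math.e], lam=0)   # close to [0.0, 1.0]
```

With an exponent of 0.5 the transform behaves like a rescaled square root, and with an exponent of 0 it reduces to the logarithm, so Box-Cox covers both of the earlier transformations as special cases.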