Our IT Security and Risk (ITSR) team is almost invisible to the outside world, yet is quite influential amongst our technical teams. The ITSR team is constantly searching for patterns to improve our security posture, and proposes—and sometimes implements—approaches to ensure that REA Group’s infrastructure stays as secure as possible.
In this article we will uncover some of the “behind-the-scenes” approaches implemented by the ITSR team, presented here as a proof-of-concept solution to showcase the technology.
In the modern world it is almost impossible to create an infrastructure based purely on “in-house” solutions and products where every single line of source code was audited. Therefore, companies rely on a myriad of third party components whose source code is either inaccessible (e.g. proprietary software) or was not audited.
To make matters worse, the majority of vendors do not follow IT security best practices and their products usually run with excessive sets of privileges. For example, instead of providing an SELinux policy module tailored for their application, they simply state that the first installation step is to “disable SELinux”. This approach is understandable from the business point of view (e.g. the support cost is lower), but is not acceptable from the security point of view.
So, what could a company do to mitigate the risk of compromise through third party components? Well, we can apply the principle of least privilege to software we source from outside the company and develop a procedure for how such software should be configured and/or deployed, so that if a component is compromised, the impact is as small as possible.
Since we have neither an unlimited budget nor unlimited time, we need to focus on software packages that are installed on the majority of the nodes in the company’s infrastructure. This way we get the biggest impact for a relatively small effort. For example, log aggregation software would be a good starting point to improve the security of the infrastructure: every single node is supposed to log its events and, usually, enterprises use third party aggregation/processing software to manage the log information in an efficient manner. To make this easier to illustrate, from now on we will focus on a single component of the log aggregation software which runs on all nodes in the network – the log collection agent.
Before any commitments are made we need to understand the inherent risk and the possible impact, determine whether we can accept the risk, and, if we cannot, work out how the risk can be avoided or mitigated. Therefore, we need to critically assess the chosen application and determine whether the additional effort of improving the application’s security is justified.
The first step would be to analyse the requirements for the product to function as expected, determine the possible attack vectors, and identify whether any excessive (unnecessary) privileges were provided to the software.
Below are the generic requirements for a log collection agent:
1. collects user-supplied, unstructured data;
2. needs access to the log files to collect data;
3. may perform some initial preprocessing of the collected data;
4. sends collected data to a centralised server for processing;
5. needs outbound network access to reach the log processing server.
Requirement 1 indicates that the application (the agent) can be attacked through maliciously crafted log messages, requirement 3 suggests that there is a possibility to take control of the application, and requirements 2, 4, and 5 provide enough access for the exfiltration of data.
Now, looking at the typical, vendor-defined installation of the chosen application we can see the following:
1. all components run as root;
2. the collection daemon has a management port, and that port listens on all interfaces available on the instance;
3. there is no indication that privilege separation is implemented in the agent, so the agent traverses untrusted directories and processes the collected data with the full set of privileges.
Items 1 and 3 reveal that an attacker exploiting a bug in the software would be able to gain full access to the system the software is running on, while item 2 suggests that there is an additional attack vector to exploit (through the management interface of the daemon process).
Looking at the information collected above we can state the following:
- There is a possibility to get full control over a given host in the network through the chosen software;
- There are a couple of attack vectors one could use to exploit the software, and while one is non-trivial (malicious log entries) the other is quite straightforward (an exposed management port) and could be used to propagate the attack further in the network;
- The defined requirements list does not indicate the need for root privileges, so the application does use excessive privileges;
- The network management interface is unused in our usage scenario, hence it exposes an attack surface unnecessarily.
Taking into account that the software is running on literally every single node in the network it becomes quite obvious that tightening up the security of this component would bring considerable benefits to the overall security of the infrastructure.
The second step in our quest to improve the security of a 3rd party application would be to come up with a strategy that ensures that whatever we implement will satisfy the following:
- apply the principle of least privilege to an acceptable depth (in this particular case, OS-level discretionary access controls (DAC) enhanced by a bit of mandatory access control (MAC) in the form of SELinux), providing the application’s components with only the privileges needed to satisfy the application requirements;
- treat the application as a black box and do not change its behaviour (we should be able to install the official package on top of ours and it should still work as intended, just without all the security-related enhancements);
- keep the maintenance cost as low as possible (we are not trying to fork and support that piece of software after all);
- try to contribute back to the vendor if the implemented feature is generic enough to be likely accepted upstream.
For the chosen application the following items were included into the strategy:
- re-package the vendor provided binary RPM package, keeping the package name and version in sync with the vendor, but add a local suffix to the release tag to track local, internal releases;
- preserve the directory structure where possible and introduce compatibility symbolic links where a relocation of a directory was required;
- ensure that the software starts and runs as a dedicated, non-privileged system account (to avoid a possible root compromise risk);
- reconfigure the package to bind the management port to the loopback interface only (to narrow down the attack surface);
- define a dedicated SELinux security domain and implement an application-specific SELinux policy module that will ensure the application is allowed access only to the resources it is supposed to access;
- use modern access controls: grant the daemon the capability to read any file on the filesystem, and then restrict it to the log files only with the SELinux policy.
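The SELinux part of such a strategy can be sketched as a small policy module. The sketch below is illustrative only: the domain name `logagent_t` and the specific reference-policy interface calls are assumptions for the sake of the example, not the actual policy that was deployed.

```
# logagent.te -- hypothetical policy module sketch (type enforcement file)
policy_module(logagent, 1.0.0)

# A dedicated domain for the agent and a type for its executable,
# so the daemon is confined the moment init starts it
type logagent_t;
type logagent_exec_t;
init_daemon_domain(logagent_t, logagent_exec_t)

# CAP_DAC_READ_SEARCH lets the daemon bypass DAC read checks;
# SELinux then narrows that broad capability down to log files only
allow logagent_t self:capability dac_read_search;
logging_read_all_logs(logagent_t)

# Outbound network access to reach the central log processing server
corenet_tcp_connect_syslogd_port(logagent_t)
```

This is the pattern described above: the capability grants broad read access at the DAC level, while the MAC policy ensures only log files are actually reachable.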
Finally, the third step is to implement the strategy technically. In our case the requirements are to ensure that the build process is incorporated into the continuous integration (CI) infrastructure, has a well-defined update procedure (to move to newly released upstream versions), and that the resulting artefacts (packages) are available for the range of Linux distributions used at REA: CentOS 6, CentOS 7, and Amazon Linux.
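To give an idea of what such a repackaging build looks like, here is a heavily abridged, hypothetical RPM spec skeleton. The package name, version, account name, and paths are all made up for illustration; the real spec file is not reproduced in this article.

```
# logagent.spec -- hypothetical, heavily abridged repackaging skeleton
Name:           logagent
Version:        2.4.1                 # kept in sync with the vendor release
Release:        1.rea%{?dist}         # local suffix tracks internal rebuilds
Summary:        Repackaged vendor log collection agent
License:        Proprietary
Source0:        logagent-%{version}.x86_64.rpm    # the vendor binary RPM

%description
Vendor log collection agent, repackaged to run as a dedicated
non-privileged account with a tailored SELinux policy.

%prep
# Unpack the vendor binary RPM instead of building from source
rpm2cpio %{SOURCE0} | cpio -idm

%install
# ... copy files into the build root, preserving the directory layout ...
# Fix non-standard shared library names recorded in the ELF headers
patchelf --replace-needed libssl.so.1.0.0 libssl.so.10 \
    %{buildroot}/opt/logagent/bin/logagentd

%pre
# Create the dedicated, non-privileged system account
getent passwd logagent >/dev/null || \
    useradd -r -s /sbin/nologin -d /opt/logagent logagent
```

Because the spec only unpacks and adjusts the vendor package, bumping to a new upstream release usually amounts to changing the `Version` tag and rebuilding.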
There were some technical challenges along the implementation road related to the nature of the application and the way it was originally built by the vendor:
- The application was linked against a number of shared libraries which were built ad hoc and included in the package. We replaced these libraries with dependencies on the distribution-provided libraries, since that ensures security updates are applied in a timely manner;
- Some libraries the application was linked against had non-standard names (e.g. the OpenSSL libraries were built with the default library names [libssl.so.1.0.0], while all major distributions have agreed on a different naming convention for these libraries [libssl.so.10]). To address this we updated the ELF headers of the application binaries with PatchELF;
- Some parts of the application were incorrectly linked against the development library names, e.g. the application was calling dlopen() for libssl.so. Since patching this in the binaries would be very intrusive, the solution was to set the RPATH field in the binaries to point to a private directory where all the incorrect library names are symbolically linked to the correct libraries;
- Since we dropped privileges for the application and used Linux capabilities for a selected set of binaries (namely, we only needed the CAP_DAC_READ_SEARCH capability to allow the application to read any file on the filesystem, and that was further restricted by the SELinux policy to ensure that only log files were exposed to the application), we discovered that the application was calling access() before opening any file. Such a call fails when the application runs as a non-privileged account whose extra rights come only from a capability. Therefore, we created a small shared library with a wrapper around the access() routine that calls the stat() function instead, and added a dependency on this library to the binaries with capabilities.
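The access() wrapper can be sketched as a tiny interposition library. This is a minimal illustration of the substitution described above, assuming the wrapper simply replaces the permission check with a stat() call; the real library’s name and build details are not given in the article.

```c
/* access_shim.c -- hypothetical sketch of the access() wrapper.
 * Built as a shared object and added as a dependency of the binaries
 * carrying CAP_DAC_READ_SEARCH (e.g. with patchelf --add-needed), so
 * that this definition is resolved in preference to the libc one. */
#include <sys/stat.h>
#include <unistd.h>

int access(const char *pathname, int mode)
{
    struct stat sb;

    (void)mode; /* the permission bits are deliberately ignored: the
                   capability lets the daemon read the file anyway,
                   and SELinux confines it to log files */

    /* stat() returns 0 when the file is reachable and -1 with errno
       set on failure, which matches access()'s return convention */
    return stat(pathname, &sb);
}
```

With this shim in place, a capability-bearing binary no longer receives a spurious “permission denied” from its pre-open access() check, while the actual open() remains subject to the SELinux policy.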
As a result of the repackaging work we got a package for a system daemon that is inherently much more secure: even if a serious vulnerability is discovered in the application, the impact would be greatly reduced. The repackaging process is fully described by the corresponding RPM spec file, and updating to a new upstream version usually requires just changing the version tag in the spec file and rebuilding the package. The rebuild itself is performed by the CI server and the artefacts are published in a repository hosted in an AWS S3 bucket. That repository can be used by other teams in the company to leverage the enhanced security provided by the repackaged version of the log collection agent. Moreover, if for any reason there are issues with repackaging a newer version of the application, we still have the option of falling back to the official upstream package: we would lose all the security enhancements, obviously, but it also means that teams are not locked into the customised version of the package.
The approach we implemented for the log collection agent is quite generic and can be applied to virtually any third party software distributed in binary form; the only part that would differ significantly for each application going through the process is the technical implementation of the selected strategy.