Apache ZooKeeper is an open-source server that enables highly reliable distributed coordination. Distributed applications can use it to maintain configuration information, implement naming, and provide distributed synchronization and group services. Numerous applications use ZooKeeper, such as HBase, Kafka, YARN, HDFS and Spark.
Metadata is stored in data objects called znodes. Services accessing ZooKeeper create znode trees in which they store metadata and coordination information.
Controlling access to ZooKeeper information
ZooKeeper uses access control lists (ACLs) to control access to znodes. The ACL implementation is quite similar to UNIX file access permissions: it uses permission bits to allow or disallow operations on a znode, together with the scope (the clients) to which those bits apply. An ACL applies only to the specific znode it is set on, not to its children.
ACLs are composed of entries of the format "scheme:id:permissions", where scheme is one of the supported pluggable authentication schemes (for example world, digest, ip or sasl), id identifies a client within that scheme (the world scheme has the single id anyone), and permissions is a combination of the permission bits for the supported operations: CREATE (c), READ (r), WRITE (w), DELETE (d) and ADMIN (a).
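As an illustration of this format, the following minimal Java sketch parses an ACL entry into a permission bitmask. It is not code from ZooKeeper or zkpolicy: the class name AclEntry is hypothetical, and only the bit values are taken from ZooKeeper's ZooDefs.Perms constants (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16).

```java
/**
 * Hypothetical helper: parses an ACL entry written as "scheme:id:permissions".
 * The bit values are the same as ZooKeeper's ZooDefs.Perms constants.
 */
final class AclEntry {
    static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;
    static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN; // 31

    final String scheme;
    final String id;
    final int perms;

    AclEntry(String scheme, String id, int perms) {
        this.scheme = scheme;
        this.id = id;
        this.perms = perms;
    }

    /** Parses e.g. "world:anyone:cdrwa" or "sasl:user1:r". */
    static AclEntry parse(String text) {
        // Split on the first and the last colon only, because some ids
        // (e.g. digest ids of the form user:hash) contain colons themselves.
        int first = text.indexOf(':');
        int last = text.lastIndexOf(':');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("expected scheme:id:permissions, got " + text);
        }
        return new AclEntry(text.substring(0, first),
                text.substring(first + 1, last),
                permsFromString(text.substring(last + 1)));
    }

    /** Converts a permission string such as "cdrwa" into a bitmask. */
    static int permsFromString(String s) {
        int bits = 0;
        for (char c : s.toCharArray()) {
            switch (c) {
                case 'r': bits |= READ; break;
                case 'w': bits |= WRITE; break;
                case 'c': bits |= CREATE; break;
                case 'd': bits |= DELETE; break;
                case 'a': bits |= ADMIN; break;
                default: throw new IllegalArgumentException("unknown permission: " + c);
            }
        }
        return bits;
    }

    /** True if this entry grants every bit in the requested mask. */
    boolean allows(int requested) {
        return (perms & requested) == requested;
    }
}
```

Splitting only on the first and last colon keeps ids that contain colons intact, which matters for digest ids.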
ACLs are set when each znode is created and can be altered later. For maximum security, it is not safe to rely on services to set correct ACLs themselves. Moreover, the fact that multiple services store their information in the same ZooKeeper znode tree raises further security concerns.
ZooKeeper security concerns
Anonymous client connections are in many cases required, for example when ZooKeeper is used as a discovery mechanism for services. Znodes without proper access permissions can be altered by anonymous users.
Even when anonymous connections are not required, rejecting them is not straightforward. Before ZooKeeper 3.6.0 (released on 4 March 2020), security guides and other sources proposed the requireClientAuthScheme property as a way to reject anonymous clients. However, this property was only available as a patch and never made it into the upstream codebase. Since 3.6.0, the sessionRequireClientSASLAuth property can be used to accept connections and requests only from clients that have authenticated with the server via SASL.
By default, services such as HBase and YARN do not set restrictive ACLs on their znodes, which are therefore left with the world:anyone:cdrwa ACL. Anonymous users can then change data, delete znodes or take ownership of them, gaining access to internal service information or even blocking services from accessing their own znodes.
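The check below is a hypothetical sketch, assuming ACLs are available as scheme:id:permissions strings, that flags a znode whose ACL gives world:anyone every permission. In the ZooKeeper Java API, this is what the ZooDefs.Ids.OPEN_ACL_UNSAFE constant expands to.

```java
import java.util.List;

/** Hypothetical helper that flags ACLs granting every permission to everyone. */
final class OpenAclCheck {
    /** Returns true if some entry gives world:anyone all of c, d, r, w and a. */
    static boolean isFullyOpen(List<String> aclEntries) {
        for (String entry : aclEntries) {
            int first = entry.indexOf(':');
            int last = entry.lastIndexOf(':');
            if (first < 0 || first == last) {
                continue; // skip malformed entries
            }
            String scheme = entry.substring(0, first);
            String id = entry.substring(first + 1, last);
            String perms = entry.substring(last + 1);
            if (scheme.equals("world") && id.equals("anyone") && grantsAll(perms)) {
                return true;
            }
        }
        return false;
    }

    /** True if the permission string contains every one of c, d, r, w, a (in any order). */
    private static boolean grantsAll(String perms) {
        for (char c : "cdrwa".toCharArray()) {
            if (perms.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }
}
```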
Problematic scenario example: an anonymous user connects to the ZooKeeper ensemble and uses the setAcl command to change the permissions of the YARN leader election znode (/yarn-leader-election) so that it is accessible only to her. YARN will then be unable to elect a leader or set up a new node, because the znode path is no longer accessible for READ and CREATE.
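To see why the service is locked out, the following simplified model (an assumption, not ZooKeeper's actual server code) grants an operation when some ACL entry contains the requested permission bits and either is world:anyone or names one of the client's authenticated identities. Superusers and scheme-specific matching (such as IP ranges) are left out, and the digest id mallory:HASH used in the usage example is a placeholder.

```java
import java.util.List;

/**
 * Simplified, hypothetical model of a ZooKeeper ACL check. Superusers and
 * scheme-specific matching (e.g. IP ranges) are deliberately left out.
 */
final class AclCheck {
    static final int READ = 1, CREATE = 4, ALL = 31; // same values as ZooDefs.Perms

    /** One ACL entry: scheme, id and permission bits. */
    static final class Entry {
        final String scheme;
        final String id;
        final int perms;

        Entry(String scheme, String id, int perms) {
            this.scheme = scheme;
            this.id = id;
            this.perms = perms;
        }
    }

    /**
     * @param clientIds the client's authenticated identities as "scheme:id" strings
     * @return true if some entry grants every requested bit and matches the client
     */
    static boolean allowed(List<Entry> acl, List<String> clientIds, int requested) {
        for (Entry e : acl) {
            if ((e.perms & requested) != requested) {
                continue; // this entry does not grant the operation
            }
            if (e.scheme.equals("world") && e.id.equals("anyone")) {
                return true; // open to every client, authenticated or not
            }
            if (clientIds.contains(e.scheme + ":" + e.id)) {
                return true;
            }
        }
        return false;
    }
}
```

With the original world:anyone ACL, a client authenticated as sasl:yarn passes the READ check; once the ACL is replaced by a single entry naming only the attacker, the same client fails both the READ and the CREATE checks.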
Service access to ZooKeeper ensemble
Services operate on ZooKeeper through its exported API. In fact, they connect and operate exactly as clients do when using the command line interface, so they are exposed to the issue described above. Moreover, it cannot be guaranteed that a service only exposes operations on its own znodes. A service can therefore act as an intermediary through which an end user changes other services' znodes, providing another way to disrupt them or to extract znode data.
Need for ACL policy auditing and enforcing
These issues show the need to monitor the ACL policies that each service defines, and to have a way to intervene and enforce secure ACL policies for each service. This is the motivation behind zkpolicy, a tool to audit and enforce ACL policies on ZooKeeper.
ZooKeeper Policy auditing tool - zkpolicy
zkpolicy is a tool that can be added to the arsenal of security and monitoring teams. Among other things, it provides the following features:
- Querying the znode tree for nodes with specific characteristics (e.g. znodes accessible by a specific client, or completely open znodes).
- Definition of policies for widely used services as well as custom definitions using YAML files.
- Test execution for ACL policy compliance.
- Generation of audit reports with results from multiple tests and queries, as well as general information about the ensemble (e.g. the complete list of ACLs for the cluster znodes and the enabled Four Letter Word commands).
- Built-in policies for various services, as defined by the Cloudera/Hortonworks Best Practices.
- Enforcing ACL policies on znode subtrees.
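One way to picture a compliance test is the hypothetical sketch below. It models the znode tree as an in-memory map from path to ACL entries and reports every znode in a policy's subtree whose ACL differs from the expected set. zkpolicy's actual policy semantics and YAML format are described in its documentation; the names here are illustrative only, and entries are compared as literal strings, so permission strings are assumed to be written in a consistent order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical ACL compliance test over an in-memory znode tree (path -> ACL entries). */
final class PolicyTest {
    /** True if path is policyRoot itself or one of its descendants. */
    static boolean inSubtree(String path, String policyRoot) {
        String prefix = policyRoot.endsWith("/") ? policyRoot : policyRoot + "/";
        return path.equals(policyRoot) || path.startsWith(prefix);
    }

    /** Returns, in sorted order, the znodes under policyRoot whose ACL is not exactly the expected set. */
    static List<String> violations(Map<String, List<String>> tree, String policyRoot,
                                   Set<String> expected) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, List<String>> node : new TreeMap<>(tree).entrySet()) {
            if (inSubtree(node.getKey(), policyRoot)
                    && !new HashSet<>(node.getValue()).equals(expected)) {
                bad.add(node.getKey());
            }
        }
        return bad;
    }
}
```

The subtree check uses the root path plus a trailing slash as prefix, so a policy on /hbase does not accidentally cover a sibling such as /hbase2.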
The tool is implemented in Java, and its features are available either through a command line interface or as a Maven dependency from the Maven Central repository. Authentication with ZooKeeper is done using SASL, leveraging the JAAS Krb5LoginModule.
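For readers unfamiliar with JAAS, the sketch below shows one way to provide a Krb5LoginModule login section to a Java ZooKeeper client programmatically, using only the JDK's javax.security.auth.login API. It assumes the client looks up the default "Client" login context; the class name, keytab path and principal are placeholders, and this is not necessarily how zkpolicy configures itself (a JAAS file passed through the java.security.auth.login.config system property is the more common approach).

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/**
 * Hypothetical in-code JAAS configuration: defines only the "Client" login
 * section, backed by the JDK's Kerberos Krb5LoginModule and a keytab.
 */
final class ZkKerberosJaas extends Configuration {
    private final String keytab;
    private final String principal;

    ZkKerberosJaas(String keytab, String principal) {
        this.keytab = keytab;
        this.principal = principal;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        if (!"Client".equals(name)) {
            return null; // no other login contexts are defined
        }
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");      // log in from the keytab, not a password prompt
        options.put("keyTab", keytab);
        options.put("principal", principal);
        options.put("storeKey", "true");
        options.put("useTicketCache", "false");
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry("com.sun.security.auth.module.Krb5LoginModule",
                    LoginModuleControlFlag.REQUIRED, options)
        };
    }
}
```

Installing it with Configuration.setConfiguration(new ZkKerberosJaas("/path/to/user.keytab", "user@EXAMPLE.COM")) before the ZooKeeper client is created makes the section visible to any code that calls Configuration.getConfiguration().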
The tool is open source and available at https://github.com/cerndb/zkpolicy.
Usage scenarios - examples
Get the list of znodes with no ACL restrictions
zkpolicy query noACL --root-path / --list
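Conceptually, such a query is a tree walk. The hypothetical sketch below models the tree with two in-memory maps (children and ACL entries per path) and, as an assumption about what "no ACL restrictions" means, lists the znodes whose ACL contains world:anyone:cdrwa written in exactly that form. Against a live ensemble, the maps would be replaced by calls such as ZooKeeper#getChildren and ZooKeeper#getACL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of a "noACL"-style query (depth-first, parents before children). */
final class NoAclQuery {
    static List<String> find(String root, Map<String, List<String>> children,
                             Map<String, List<String>> acls) {
        List<String> result = new ArrayList<>();
        walk(root, children, acls, result);
        return result;
    }

    private static void walk(String path, Map<String, List<String>> children,
                             Map<String, List<String>> acls, List<String> out) {
        if (acls.getOrDefault(path, List.of()).contains("world:anyone:cdrwa")) {
            out.add(path);
        }
        for (String child : children.getOrDefault(path, List.of())) {
            // child names are relative; the root "/" needs no extra separator
            String childPath = path.equals("/") ? "/" + child : path + "/" + child;
            walk(childPath, children, acls, out);
        }
    }
}
```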
Get the list of znodes that are accessible by a certain SASL authenticated client
zkpolicy query regexMatchACL --root-path / --args sasl:user1:.* --list
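A regex-based query can be sketched in the same spirit. The example below is hypothetical and assumes that the pattern is matched against each whole ACL entry of the form scheme:id:permissions; with a full match, sasl:user1:.* does not select an entry for a different principal such as sasl:user10.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical model of a regexMatchACL-style query over path -> ACL entries. */
final class RegexAclQuery {
    /** Returns, sorted by path, the znodes with at least one ACL entry fully matching regex. */
    static List<String> find(Map<String, List<String>> acls, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> node : new TreeMap<>(acls).entrySet()) {
            for (String entry : node.getValue()) {
                if (pattern.matcher(entry).matches()) { // whole-entry match
                    result.add(node.getKey());
                    break; // one matching entry is enough
                }
            }
        }
        return result;
    }
}
```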
Generate an audit report
Generating an audit report requires an audit configuration file (the zkpolicy configuration documentation describes how to structure it), which is passed to the following command:
zkpolicy audit --input <audit_config> --output report.out
An audit report has the following format:
This report provides valuable information about the security state of ZooKeeper. Since it includes the complete list of ACL definitions for the ensemble, it can even be handed to security experts without giving them direct access to ZooKeeper.
Enforce a service policy
The audit report may point out a service that is not following secure policies. In that case, we can patch this vulnerability by enforcing the correct policy, using the following command:
zkpolicy enforce --service-policy <service_name>
Enforce a custom, user defined policy
Policies are not always tied to a service; they may instead concern general znode paths, such as the root or the quota node (/zookeeper/quota). Custom, user-defined policies are enforced by passing a policy definition file:
zkpolicy enforce --input <policy_definition_file>
It is advised to first execute policy enforcements in dry run mode, so as to get the list of the znodes to be affected without actually altering their ACLs. This can be done by adding the `--dry-run` option to the previous command.
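Because an ACL applies only to the znode it is set on, enforcing a policy on a subtree means updating every znode in it. The following hypothetical sketch, working on an in-memory map from path to ACL entries, returns the znodes that would change and only modifies them when dryRun is false; against a real ensemble, the update would be a ZooKeeper#setACL call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of enforcing one ACL on every znode of a subtree, with a dry-run mode. */
final class Enforcer {
    static List<String> enforce(Map<String, List<String>> acls, String root,
                                List<String> policyAcl, boolean dryRun) {
        String prefix = root.endsWith("/") ? root : root + "/";
        List<String> affected = new ArrayList<>();
        // iterate over a sorted copy, so updating the original map is safe
        for (String path : new TreeMap<>(acls).keySet()) {
            boolean inSubtree = path.equals(root) || path.startsWith(prefix);
            if (inSubtree && !acls.get(path).equals(policyAcl)) {
                affected.add(path);
                if (!dryRun) {
                    acls.put(path, new ArrayList<>(policyAcl)); // real tool: ZooKeeper#setACL
                }
            }
        }
        return affected;
    }
}
```

Usage: run it once with dryRun set to true to review the affected paths, then again with false to apply the change.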
Rolling back unwanted enforce
An incorrectly defined policy may be enforced by mistake, so the tool provides rollback functionality. Before every enforcement operation, a snapshot of the ACL state of the znodes to be affected is taken. Snapshots are saved in the /opt/zkpolicy/rollback/ directory and can later be used to roll back by issuing the following command:
zkpolicy rollback --input <rollback_file>
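The snapshot idea can be sketched as follows. This is a hypothetical in-memory version: zkpolicy stores its snapshots as files under /opt/zkpolicy/rollback/, and their format is not reproduced here. The snapshot records the ACLs of the znodes about to be changed, and rolling back re-applies them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical snapshot of the ACLs of the znodes an enforcement is about to change. */
final class AclSnapshot {
    private final Map<String, List<String>> saved = new HashMap<>();

    /** Copies the current ACL of each path that will be affected. */
    static AclSnapshot take(Map<String, List<String>> acls, List<String> toBeAffected) {
        AclSnapshot snapshot = new AclSnapshot();
        for (String path : toBeAffected) {
            snapshot.saved.put(path, new ArrayList<>(acls.get(path)));
        }
        return snapshot;
    }

    /** Restores the recorded ACLs; a real tool would call ZooKeeper#setACL per path. */
    void rollback(Map<String, List<String>> acls) {
        for (Map.Entry<String, List<String>> e : saved.entrySet()) {
            acls.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
    }
}
```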
Using zkpolicy, we audited our development cluster and spotted parts of the configuration that should be altered to harden ZooKeeper security. We have thoroughly tested the policies on our Hadoop/YARN/HBase clusters and are continuing tests on the ZooKeeper ensembles used by Kafka. Thanks to the tool and these tests, we took the following actions to strengthen the security of the services:
- We migrated ZooKeeper from the Cloudera distribution of version 3.4.5 to Apache ZooKeeper 3.6.0 in order to support its new security features.
- We hardened ACLs on HDFS and YARN and enabled SASL authentication between these services and ZooKeeper internally on our clusters.
- We disabled most of the Four Letter Word commands that were enabled by default.
- We decided to enable auth_to_local.
- We allowed only the superuser to access and alter all the znodes, and by default all users use SASL for authentication.
The tool was initially developed internally for IT-DB in a GitLab repository (not publicly accessible) and then open sourced by mirroring it to a public GitHub repository.
More details about the tool can be found in the documentation and in the slides of the presentation given at the IT-DB technical meeting on 26-06-2020.
We hope that you will find this tool useful for your Big Data services and that you will help us with its future development. Please share your experience so far and any ideas for the future of the tool in the comments.