Allowlist libraries and init scripts on compute with standard access mode (formerly shared access mode)

In Databricks Runtime 13.3 LTS and above, the allowlist in Unity Catalog controls which libraries and init scripts can run on standard access mode compute. Only allowlisted artifacts can be used on compute configured with standard access mode.

By default, the allowlist is empty. You cannot disable this feature. To modify the allowlist, you must be a metastore admin or have the MANAGE ALLOWLIST privilege. See MANAGE ALLOWLIST.

You can add a directory or file to the allowlist even if it hasn't been created yet. See Upload files to a Unity Catalog volume.

Important

Libraries used as JDBC drivers or custom Spark data sources on Unity Catalog-enabled standard compute require ANY FILE permissions.

Some installed libraries store all users' data in a single shared temp directory. These libraries might compromise user isolation.

Security and operational risks

Understanding the security implications of allowlists is critical for maintaining cluster isolation and protecting your data on standard access mode compute. Proper allowlist usage prevents users from adding arbitrary libraries and init scripts. This reduces the likelihood of security issues, cluster instability, and other unpredictable behavior.

Be deliberate about who receives MANAGE ALLOWLIST privileges. Users with MANAGE ALLOWLIST privileges can allowlist any path or Maven coordinate, effectively controlling what code can run on standard access mode compute.

As the metastore admin, periodically review items on the allowlist and verify that they come from trusted sources. Allowlisted artifacts can access cluster resources and user data, so they should be subject to the same security and governance controls as other sensitive components.

Databricks recommends these best practices for managing the allowlist:

  • Grant the MANAGE ALLOWLIST privilege only to metastore admins and trusted platform administrators. For other users, grant MANAGE ALLOWLIST only on a temporary, as-needed basis.
  • Review and audit allowlist additions regularly.
  • Use specific paths and Maven coordinates rather than broad patterns.
  • Configure storage locations for allowlisted artifacts with read-only permissions.
  • Implement a formal approval process for allowlist additions in production environments.
  • Test allowlisted libraries and init scripts in non-production environments before adding them to production allowlists.

How to add items to the allowlist

You can add items to the allowlist with Catalog Explorer or the REST API.
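
For example, the following Python sketch reads the current allowlist entries through the artifact allowlists REST API. The workspace URL and token are placeholders, and the endpoint and payload shape should be verified against the REST API reference for your workspace.

    import requests

    # Placeholders: substitute your workspace URL and a token for a
    # metastore admin or a user with the MANAGE ALLOWLIST privilege.
    WORKSPACE_URL = "https://<workspace-instance>.azuredatabricks.net"
    TOKEN = "<personal-access-token>"

    def get_allowlist(artifact_type: str) -> dict:
        """Fetch the allowlist for one artifact type:
        INIT_SCRIPT, LIBRARY_JAR, or LIBRARY_MAVEN."""
        resp = requests.get(
            f"{WORKSPACE_URL}/api/2.1/unity-catalog/artifact-allowlists/{artifact_type}",
            headers={"Authorization": f"Bearer {TOKEN}"},
        )
        resp.raise_for_status()
        return resp.json()

    for artifact_type in ("INIT_SCRIPT", "LIBRARY_JAR", "LIBRARY_MAVEN"):
        print(artifact_type, get_allowlist(artifact_type).get("artifact_matchers", []))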

To open the dialog for adding items to the allowlist in Catalog Explorer, do the following:

  1. In your Azure Databricks workspace, click Catalog.
  2. Click the gear icon.
  3. Click the metastore name to open the metastore details and permissions UI.
  4. Select Allowed JARs/Init Scripts.
  5. Click Add.

Important

This option is displayed only to sufficiently privileged users. If you cannot access the allowlist UI, contact your metastore admin for assistance with allowlisting libraries and init scripts.

Add an init script to the allowlist

Complete the following steps in the allowlist dialog to add an init script to the allowlist:

  1. For Type, select Init Script.
  2. For Source Type, select Volume or the object storage protocol.
  3. Specify the source path to add to the allowlist. See How are permissions on paths enforced in the allowlist?.
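
The same addition can be scripted. The following is a minimal sketch using the artifact allowlists REST API; because the PUT endpoint sets the allowlist for an artifact type (replacing previous entries), include every entry you want to keep. The workspace URL, token, and volume path are placeholders; verify the payload shape against the REST API reference.

    import requests

    WORKSPACE_URL = "https://<workspace-instance>.azuredatabricks.net"  # placeholder
    TOKEN = "<personal-access-token>"  # placeholder

    # PUT replaces the full INIT_SCRIPT allowlist, so list every entry
    # you want to keep, not just the new one.
    payload = {
        "artifact_matchers": [
            {"artifact": "/Volumes/prod-libraries/", "match_type": "PREFIX_MATCH"},
        ]
    }
    resp = requests.put(
        f"{WORKSPACE_URL}/api/2.1/unity-catalog/artifact-allowlists/INIT_SCRIPT",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
    )
    resp.raise_for_status()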

Add a JAR to the allowlist

Complete the following steps in the allowlist dialog to add a JAR to the allowlist:

  1. For Type, select JAR.
  2. For Source Type, select Volume or the object storage protocol.
  3. Specify the source path to add to the allowlist. See How are permissions on paths enforced in the allowlist?.
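
To script this change while preserving existing entries, a read-modify-write pattern avoids dropping entries when the PUT replaces the list. The workspace URL, token, and JAR path below are placeholders.

    import requests

    WORKSPACE_URL = "https://<workspace-instance>.azuredatabricks.net"  # placeholder
    TOKEN = "<personal-access-token>"  # placeholder
    HEADERS = {"Authorization": f"Bearer {TOKEN}"}
    URL = f"{WORKSPACE_URL}/api/2.1/unity-catalog/artifact-allowlists/LIBRARY_JAR"

    # Read the current JAR allowlist so the PUT below keeps existing entries.
    current = requests.get(URL, headers=HEADERS)
    current.raise_for_status()
    matchers = current.json().get("artifact_matchers", [])

    # Append the new JAR path (placeholder volume path).
    matchers.append({
        "artifact": "/Volumes/prod-libraries/jars/my-library.jar",
        "match_type": "PREFIX_MATCH",
    })
    resp = requests.put(URL, headers=HEADERS, json={"artifact_matchers": matchers})
    resp.raise_for_status()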

Add Maven coordinates to the allowlist

Important

Before adding Maven coordinates to the allowlist, you must have CAN ATTACH TO and CAN MANAGE permissions set on the compute where you want to install the library. See Compute permissions.

Complete the following steps in the allowlist dialog to add Maven coordinates to the allowlist:

  1. For Type, select Maven.
  2. For Source Type, select Coordinates.
  3. Enter coordinates in the following format: groupId:artifactId:version.
    • You can include all versions of a library by allowlisting the following format: groupId:artifactId.
    • You can include all artifacts in a group by allowlisting the following format: groupId.
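
As a sketch, the three granularities map to the artifact allowlists REST API as follows. The coordinates shown are examples, and the match_type value mirrors the path examples in this article; verify both against the REST API reference.

    import requests

    WORKSPACE_URL = "https://<workspace-instance>.azuredatabricks.net"  # placeholder
    TOKEN = "<personal-access-token>"  # placeholder

    payload = {
        "artifact_matchers": [
            # One exact version of one artifact (example coordinate).
            {"artifact": "org.postgresql:postgresql:42.7.3", "match_type": "PREFIX_MATCH"},
            # All versions of one artifact.
            {"artifact": "org.postgresql:postgresql", "match_type": "PREFIX_MATCH"},
            # Every artifact in a group.
            {"artifact": "org.postgresql", "match_type": "PREFIX_MATCH"},
        ]
    }
    resp = requests.put(
        f"{WORKSPACE_URL}/api/2.1/unity-catalog/artifact-allowlists/LIBRARY_MAVEN",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
    )
    resp.raise_for_status()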

How are permissions on paths enforced in the allowlist?

You can use the allowlist to grant access to JARs or init scripts stored in Unity Catalog volumes and object storage. If you add a path for a directory rather than a file, allowlist permissions propagate to contained files and directories.

Prefix matching is used for all artifacts stored in Unity Catalog volumes or object storage. To prevent prefix matching at a given directory level, include a trailing slash (/). For example, /Volumes/prod-libraries/ matches only the files and directories within /Volumes/prod-libraries/, not sibling paths that merely share the prefix (such as /Volumes/prod-libraries-dev).
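
As an illustration only (not the platform's actual implementation), the trailing-slash rule can be modeled as plain string prefix matching:

    def is_allowlisted(path: str, entry: str) -> bool:
        # Plain prefix matching: an entry without a trailing slash also
        # matches sibling paths that merely share the prefix.
        return path.startswith(entry)

    # Without a trailing slash, a sibling directory also matches:
    assert is_allowlisted("/Volumes/prod-libraries-dev/a.jar", "/Volumes/prod-libraries")
    # With a trailing slash, only contents of that directory match:
    assert is_allowlisted("/Volumes/prod-libraries/a.jar", "/Volumes/prod-libraries/")
    assert not is_allowlisted("/Volumes/prod-libraries-dev/a.jar", "/Volumes/prod-libraries/")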

You can define permissions at the following levels:

  1. The base path for the volume or storage container.
  2. A directory nested at any depth from the base path.
  3. A single file.

Adding a path to the allowlist only means that the path can be used for either init scripts or JAR installation. Azure Databricks still checks for permissions to access data in the specified location.

The principal used must have READ VOLUME permissions on the specified volume. See READ VOLUME.
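
For example, an owner of the volume could grant that permission from a notebook. The catalog, schema, volume, and principal names below are placeholders.

    # Grant read access on the volume that stores allowlisted artifacts.
    spark.sql(
        "GRANT READ VOLUME ON VOLUME main.prod.libraries "
        "TO `library-installers@example.com`"
    )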

In dedicated access mode (formerly single user access mode), the identity of the assigned principal (a user or group) is used.

In standard access mode:

  • Libraries use the identity of the library installer.
  • Init scripts use the identity of the cluster owner.

Note

No-isolation shared access mode does not support volumes, but uses the same identity assignment as standard access mode.

Databricks recommends configuring all object storage privileges related to init scripts and libraries with read-only permissions. Users with write permissions on these locations can potentially modify code in library files or init scripts.

Databricks recommends using Microsoft Entra ID service principals to manage access to JARs or init scripts stored in Azure Data Lake Storage. Use the following linked documentation to complete this setup:

  1. Create a service principal with read and list permissions on your desired blobs. See Access storage using a service principal & Microsoft Entra ID (Azure Active Directory).

  2. Save your credentials using secrets. See Manage secrets.

  3. Set the properties in the Spark configuration and environment variables while creating a cluster, as in the following example:

    Spark config:

    spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
    spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
    spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
    spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<secret-scope>/<service-credential-key>}}
    spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
    

    Environment variables:

    SERVICE_CREDENTIAL={{secrets/<secret-scope>/<service-credential-key>}}
    
  4. (Optional) Refactor init scripts using azcopy or the Azure CLI.

    Within your init scripts, you can reference environment variables set during cluster configuration to pass credentials stored as secrets for validation.
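
Putting steps 3 and 4 together, the following is a minimal sketch of creating such a cluster through the Clusters API. The runtime version, node type, and secret names are placeholders, and field names should be verified against the Clusters API reference.

    import requests

    WORKSPACE_URL = "https://<workspace-instance>.azuredatabricks.net"  # placeholder
    TOKEN = "<personal-access-token>"  # placeholder

    cluster_spec = {
        "cluster_name": "standard-access-with-service-principal",  # placeholder
        "spark_version": "13.3.x-scala2.12",   # placeholder runtime
        "node_type_id": "Standard_DS3_v2",     # placeholder node type
        "num_workers": 2,
        "data_security_mode": "USER_ISOLATION",  # standard access mode
        "spark_conf": {
            "spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net": "OAuth",
            "spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net":
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
            "spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net":
                "<application-id>",
            "spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net":
                "{{secrets/<secret-scope>/<service-credential-key>}}",
            "spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net":
                "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
        },
        "spark_env_vars": {
            "SERVICE_CREDENTIAL": "{{secrets/<secret-scope>/<service-credential-key>}}",
        },
    }
    resp = requests.post(
        f"{WORKSPACE_URL}/api/2.1/clusters/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=cluster_spec,
    )
    resp.raise_for_status()
    print(resp.json()["cluster_id"])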

Note

Allowlist permissions for JARs and init scripts are managed separately. If you use the same location to store both types of objects, you must add the location to the allowlist for each.