Overview of SharePoint ingestion setup

Learn about the supported authentication methods for SharePoint ingestion into Azure Databricks.

Important

The managed SharePoint connector is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Azure Databricks previews.

Tip

This page covers the managed SharePoint connector for ingesting unstructured files (PDFs, DOCX, and more) for use in applications such as RAG.

To build custom pipelines with the standard SharePoint connector, which gives you full control over parsing, transformations, and ingestion of both structured files (for example, CSV and Excel) and unstructured files into Delta tables, see Ingest files from SharePoint.

Choose your SharePoint connector

Lakeflow Connect offers two complementary SharePoint connectors. They both access data in SharePoint, but they support distinct goals.

Consideration	Managed SharePoint connector	Standard SharePoint connector
Management and customization	A fully managed connector. Managed connectors are simple, low-maintenance connectors for enterprise applications that ingest data into Delta tables and keep them in sync with the source. See Managed connectors in Lakeflow Connect.	Build custom ingestion pipelines with SQL, PySpark, or Lakeflow Spark Declarative Pipelines using batch and streaming APIs such as `read_files`, `spark.read`, `COPY INTO`, and Auto Loader. Offers the flexibility to perform complex transformations during ingestion, while giving you greater responsibility for managing and maintaining your pipelines.
Output format	Uniform binary content table. Ingests each file in binary format (one file per row), along with file metadata in additional columns.	Structured Delta tables. Ingests structured files (like CSV and Excel) as Delta tables. Can also be used to ingest unstructured files in binary format.
Granularity, filtering, and selection	No subfolder-level or file-level selection and no pattern-based filtering at this time. Ingests all files in the specified SharePoint document library.	Granular and custom. URL-based selection to ingest from document libraries, subfolders, or individual files. Also supports pattern-based filtering using the `pathGlobFilter` option.
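
To illustrate what pattern-based filtering means in practice, the following minimal sketch applies a glob pattern to a list of file names using Python's standard `fnmatch` module. It only models the general idea of glob matching: it does not call either SharePoint connector, the file names and the `filter_by_glob` helper are hypothetical, and the exact glob semantics of `pathGlobFilter` may differ from those of `fnmatch`.

```python
from fnmatch import fnmatchcase


def filter_by_glob(file_names, pattern):
    """Return the names that match the glob pattern, preserving input order.

    Hypothetical helper for illustration only; not part of any connector API.
    """
    return [name for name in file_names if fnmatchcase(name, pattern)]


# Hypothetical file names, as they might appear in a document library.
library_files = ["q1-report.pdf", "budget.xlsx", "notes.docx", "q2-report.pdf"]

# Keep only the PDF files, similar in spirit to a "*.pdf" glob filter.
pdf_only = filter_by_glob(library_files, "*.pdf")
print(pdf_only)
```

The managed connector, by contrast, behaves as if no filter were applied: every file in the specified document library is ingested.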

Which authentication methods are supported?

The SharePoint connector supports the following authentication methods: OAuth machine-to-machine (M2M), OAuth user-to-machine (U2M), and manual token refresh (legacy).

Which authentication method should I choose?

In most scenarios, Databricks recommends machine-to-machine (M2M) OAuth. M2M scopes the connector's permissions to a specific site. However, if you want to limit the connector to what the authenticating user can access, choose user-to-machine (U2M) OAuth instead. Both methods offer automated token refresh and heightened security.

Manual token refresh authentication is considered a legacy method and is not recommended.
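
The recommendation above can be summarized as a small decision helper. This is only a sketch of the guidance on this page, not connector code: the `AuthMethod` enum, the `recommend_auth_method` function, and its parameter name are hypothetical and do not correspond to any Azure Databricks API.

```python
from enum import Enum


class AuthMethod(Enum):
    """Recommended authentication methods (names are illustrative only)."""

    OAUTH_M2M = "OAuth M2M"
    OAUTH_U2M = "OAuth U2M"


def recommend_auth_method(scope_to_signed_in_user: bool) -> AuthMethod:
    """Return the method this page recommends for the given requirement.

    M2M is the default recommendation. U2M is suggested only when access
    should be limited to what the authenticating user can access. Manual
    token refresh is a legacy method and is never recommended here.
    """
    if scope_to_signed_in_user:
        return AuthMethod.OAUTH_U2M
    return AuthMethod.OAUTH_M2M


# An automated pipeline with no user-specific scoping requirement.
print(recommend_auth_method(scope_to_signed_in_user=False).value)
```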

U2M compared to M2M

The following table compares U2M and M2M for authentication to SharePoint:

Feature	OAuth U2M	OAuth M2M
Authentication type	Delegated access (user-based)	App-only access (service principal)
User interaction required	Yes - User must sign in	No - Fully automated
Best for	User-specific access scenarios	Automated production pipelines
Token refresh	Handled automatically by Azure Databricks	Handled automatically by Azure Databricks
SharePoint permissions	Delegated permissions	Application permissions
Access scope	Limited to user's permissions	Defined by app registration

Last updated on 2025-12-11