Security News
Research
Supply Chain Attack on Rspack npm Packages Injects Cryptojacking Malware
A supply chain attack on Rspack's npm packages injected cryptomining malware, potentially impacting thousands of developers.
@azure/storage-file-datalake
Advanced tools
Microsoft Azure Storage SDK for JavaScript - DataLake
Azure Data Lake Storage (ADLS) includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics.
This project provides a client library in JavaScript that makes it easy to consume Microsoft Azure Storage Data Lake service.
Use the client libraries in this package to:
Source code | Package (npm) | API Reference Documentation | Product documentation | Samples | Azure Storage Data Lake REST APIs
Prerequisites: You must have an Azure subscription and a Storage Account to use this package. If you are using this package in a Node.js application, then Node.js version 8.0.0 or higher is required.
The preferred way to install the Azure Storage Data Lake client library for JavaScript is to use the npm package manager. Type the following into a terminal window:
npm install @azure/storage-file-datalake
Azure Storage supports several ways to authenticate. In order to interact with the Azure Data Lake Storage service you'll need to create an instance of a Storage client - DataLakeServiceClient
, DataLakeFileSystemClient
, or DataLakePathClient
for example. See samples for creating the DataLakeServiceClient
to learn more about authentication.
The Azure Data Lake Storage service supports the use of Azure Active Directory to authenticate requests to its APIs. The @azure/identity
package provides a variety of credential types that your application can use to do this. Please see the README for @azure/identity
for more details and samples to get you started.
This library is compatible with Node.js and browsers, and validated against LTS Node.js versions (>=8.16.0) and latest versions of Chrome, Firefox and Edge.
You need polyfills to make this library work with IE11. The easiest way is to use @babel/polyfill, or polyfill service.
You can also load separate polyfills for missed ES feature(s). This library depends on following ES features which need external polyfills loaded.
Promise
String.prototype.startsWith
String.prototype.endsWith
String.prototype.repeat
String.prototype.includes
Array.prototype.includes
Object.assign
Object.keys
(Overrides the IE11's Object.keys
with a polyfill to enable the ES6 behavior)Symbol
Symbol.iterator
There are differences between Node.js and browsers runtime. When getting started with this library, pay attention to APIs or classes marked with "ONLY AVAILABLE IN NODE.JS RUNTIME" or "ONLY AVAILABLE IN BROWSERS".
gzip
or deflate
format and its content encoding is set accordingly, downloading behavior is different between Node.js and browsers. In Node.js storage clients will download the file in its compressed format, while in browsers the data will be downloaded in de-compressed format.StorageSharedKeyCredential
generateAccountSASQueryParameters()
generateDataLakeSASQueryParameters()
DataLakeFileClient.upload()
is available in both Node.js and browsers.
DataLakeFileClient.uploadFile()
DataLakeFileClient.uploadStream()
DataLakeFileClient.readToBuffer()
DataLakeFileClient.readToFile()
To use this client library in the browser, first you need to use a bundler. For details on how to do this, please refer to our bundling documentation.
You need to set up Cross-Origin Resource Sharing (CORS) rules for your storage account if you need to develop for browsers. Go to Azure portal and Azure Storage Explorer, find your storage account, create new CORS rules for blob/queue/file/table service(s).
For example, you can create following CORS settings for debugging. But please customize the settings carefully according to your requirements in production environment.
Notice: Data Lake currently shares CORS settings for blob service.
Azure Data Lake Storage Gen2 was designed to:
Key Features of DataLake Storage Gen2 include:
A fundamental part of Data Lake Storage Gen2 is the addition of a hierarchical namespace to Blob storage. The hierarchical namespace organizes objects/files into a hierarchy of directories for efficient data access.
In the past, cloud-based analytics had to compromise in areas of performance, management, and security. Data Lake Storage Gen2 addresses each of these aspects in the following ways:
Data Lake storage offers three types of resources:
DataLakeServiceClient
DataLakeFileSystemClient
DataLakeDirectoryClient
or DataLakeFileClient
Azure DataLake Gen2 | Blob |
---|---|
Filesystem | Container |
Path (File or Directory) | Blob |
Note: This client library only supports storage accounts with hierarchical namespace (HNS) enabled.
To use the clients, import the package into your file:
const AzureStorageDataLake = require("@azure/storage-file-datalake");
Alternatively, selectively import only the types you need:
const {
DataLakeServiceClient,
StorageSharedKeyCredential
} = require("@azure/storage-file-datalake");
The DataLakeServiceClient
requires an URL to the data lake service and an access credential. It also optionally accepts some settings in the options
parameter.
DefaultAzureCredential
from @azure/identity
packageRecommended way to instantiate a DataLakeServiceClient
Notice. Azure Data Lake currently reuses blob related roles like "Storage Blob Data Owner" during following AAD OAuth authentication.
Setup : Reference - Authorize access to blobs (data lake) and queues with Azure Active Directory from a client application - https://docs.microsoft.com/azure/storage/common/storage-auth-aad-app
Register a new AAD application and give permissions to access Azure Storage on behalf of the signed-in user.
API permissions
section, select Add a permission
and choose Microsoft APIs
.Azure Storage
and select the checkbox next to user_impersonation
and then click Add permissions
. This would allow the application to access Azure Storage on behalf of the signed-in user.Grant access to Azure Data Lake data with RBAC in the Azure Portal
Access control (IAM)
tab (in the left-side-navbar of your storage account in the azure-portal).Environment setup for the sample
CLIENT ID
and TENANT ID
. In the "Certificates & Secrets" tab, create a secret and note that down.const { DefaultAzureCredential } = require("@azure/identity");
const { DataLakeServiceClient } = require("@azure/storage-file-datalake");
// Enter your storage account name
const account = "<account>";
const defaultAzureCredential = new DefaultAzureCredential();
const datalakeServiceClient = new DataLakeServiceClient(
`https://${account}.dfs.core.windows.net`,
defaultAzureCredential
);
See the Azure AD Auth sample for a complete example using this method.
[Note - Above steps are only for Node.js]
Alternatively, you can instantiate a DataLakeServiceClient
using the fromConnectionString()
static method with the full connection string as the argument. (The connection string can be obtained from the azure portal.)
[ONLY AVAILABLE IN NODE.JS RUNTIME]
const { DataLakeServiceClient } = require("@azure/storage-file-datalake");
const connStr = "<connection string>";
const DataLakeServiceClient = DataLakeServiceClient.fromConnectionString(connStr);
StorageSharedKeyCredential
Alternatively, you instantiate a DataLakeServiceClient
with a StorageSharedKeyCredential
by passing account-name and account-key as arguments. (The account-name and account-key can be obtained from the azure portal.)
[ONLY AVAILABLE IN NODE.JS RUNTIME]
const {
DataLakeServiceClient,
StorageSharedKeyCredential
} = require("@azure/storage-file-datalake");
// Enter your storage account name and shared key
const account = "<account>";
const accountKey = "<accountkey>";
// Use StorageSharedKeyCredential with storage account and account key
// StorageSharedKeyCredential is only available in Node.js runtime, not in browsers
const sharedKeyCredential = new StorageSharedKeyCredential(account, accountKey);
const datalakeServiceClient = new DataLakeServiceClient(
`https://${account}.dfs.core.windows.net`,
sharedKeyCredential
);
Also, You can instantiate a DataLakeServiceClient
with a shared access signatures (SAS). You can get the SAS token from the Azure Portal or generate one using generateAccountSASQueryParameters()
.
const { DataLakeServiceClient } = require("@azure/storage-file-datalake");
const account = "<account name>";
const sas = "<service Shared Access Signature Token>";
const serviceClientWithSAS = new DataLakeServiceClient(
`https://${account}.dfs.core.windows.net${sas}`
);
Use DataLakeServiceClient.getFileSystemClient()
to get a file system client instance then create a new file system resource.
const { DefaultAzureCredential } = require("@azure/identity");
const { DataLakeServiceClient } = require("@azure/storage-file-datalake");
const account = "<account>";
const defaultAzureCredential = new DefaultAzureCredential();
const datalakeServiceClient = new DataLakeServiceClient(
`https://${account}.dfs.core.windows.net`,
defaultAzureCredential
);
async function main() {
// Create a file system
const fileSystemName = `newfilesystem${new Date().getTime()}`;
const fileSystemClient = datalakeServiceClient.getFileSystemClient(fileSystemName);
const createResponse = await fileSystemClient.create();
console.log(`Create file system ${fileSystemName} successfully`, createResponse.requestId);
}
main();
Use DataLakeServiceClient.listFileSystems()
function to iterate the file systems,
with the new for-await-of
syntax:
const { DefaultAzureCredential } = require("@azure/identity");
const { DataLakeServiceClient } = require("@azure/storage-file-datalake");
const account = "<account>";
const defaultAzureCredential = new DefaultAzureCredential();
const datalakeServiceClient = new DataLakeServiceClient(
`https://${account}.dfs.core.windows.net`,
defaultAzureCredential
);
async function main() {
let i = 1;
let fileSystems = datalakeServiceClient.listFileSystems();
for await (const fileSystem of fileSystems) {
console.log(`File system ${i++}: ${fileSystem.name}`);
}
}
main();
Alternatively without using for-await-of
:
const { DefaultAzureCredential } = require("@azure/identity");
const { DataLakeServiceClient } = require("@azure/storage-file-datalake");
const account = "<account>";
const defaultAzureCredential = new DefaultAzureCredential();
const datalakeServiceClient = new DataLakeServiceClient(
`https://${account}.dfs.core.windows.net`,
defaultAzureCredential
);
async function main() {
let i = 1;
let iter = datalakeServiceClient.listFileSystems();
let fileSystemItem = await iter.next();
while (!fileSystemItem.done) {
console.log(`File System ${i++}: ${fileSystemItem.value.name}`);
fileSystemItem = await iter.next();
}
}
main();
In addition, pagination is supported for listing too via byPage()
:
const { DefaultAzureCredential } = require("@azure/identity");
const { DataLakeServiceClient } = require("@azure/storage-file-datalake");
const account = "<account>";
const defaultAzureCredential = new DefaultAzureCredential();
const datalakeServiceClient = new DataLakeServiceClient(
`https://${account}.dfs.core.windows.net`,
defaultAzureCredential
);
async function main() {
let i = 1;
for await (const response of datalakeServiceClient
.listFileSystems()
.byPage({ maxPageSize: 20 })) {
if (response.fileSystemItems) {
for (const fileSystem of response.fileSystemItems) {
console.log(`File System ${i++}: ${fileSystem.name}`);
}
}
}
}
main();
const { DefaultAzureCredential } = require("@azure/identity");
const { DataLakeServiceClient } = require("@azure/storage-file-datalake");
const account = "<account>";
const defaultAzureCredential = new DefaultAzureCredential();
const datalakeServiceClient = new DataLakeServiceClient(
`https://${account}.dfs.core.windows.net`,
defaultAzureCredential
);
const fileSystemName = "<file system name>";
async function main() {
const fileSystemClient = datalakeServiceClient.getFileSystemClient(fileSystemName);
const directoryClient = fileSystemClient.getDirectoryClient("directory");
await directoryClient.create();
await directoryClient.delete();
}
main();
const { DefaultAzureCredential } = require("@azure/identity");
const { DataLakeServiceClient } = require("@azure/storage-file-datalake");
const account = "<account>";
const defaultAzureCredential = new DefaultAzureCredential();
const datalakeServiceClient = new DataLakeServiceClient(
`https://${account}.dfs.core.windows.net`,
defaultAzureCredential
);
const fileSystemName = "<file system name>";
async function main() {
const fileSystemClient = datalakeServiceClient.getFileSystemClient(fileSystemName);
const content = "Hello world!";
const fileName = "newfile" + new Date().getTime();
const fileClient = fileSystemClient.getFileClient(fileName);
await fileClient.create();
await fileClient.append(content, 0, content.length);
await fileClient.flush(content.length);
console.log(`Create and upload file ${fileName} successfully`);
}
main();
Similar to listing file systems.
const { DefaultAzureCredential } = require("@azure/identity");
const { DataLakeServiceClient } = require("@azure/storage-file-datalake");
const account = "<account>";
const defaultAzureCredential = new DefaultAzureCredential();
const datalakeServiceClient = new DataLakeServiceClient(
`https://${account}.dfs.core.windows.net`,
defaultAzureCredential
);
const fileSystemName = "<file system name>";
async function main() {
const fileSystemClient = datalakeServiceClient.getFileSystemClient(fileSystemName);
let i = 1;
let paths = fileSystemClient.listPaths();
for await (const path of paths) {
console.log(`Path ${i++}: ${path.name}, is directory: ${path.isDirectory}`);
}
}
main();
const { DefaultAzureCredential } = require("@azure/identity");
const { DataLakeServiceClient } = require("@azure/storage-file-datalake");
const account = "<account>";
const defaultAzureCredential = new DefaultAzureCredential();
const datalakeServiceClient = new DataLakeServiceClient(
`https://${account}.dfs.core.windows.net`,
defaultAzureCredential
);
const fileSystemName = "<file system name>";
const fileName = "<file name>";
async function main() {
const fileSystemClient = datalakeServiceClient.getFileSystemClient(fileSystemName);
const fileClient = fileSystemClient.getFileClient(fileName);
// Get file content from position 0 to the end
// In Node.js, get downloaded data by accessing downloadResponse.readableStreamBody
const downloadResponse = await fileClient.read();
const downloaded = await streamToBuffer(downloadResponse.readableStreamBody);
console.log("Downloaded file content:", downloaded.toString());
// [Node.js only] A helper method used to read a Node.js readable stream into a Buffer.
async function streamToBuffer(readableStream) {
return new Promise((resolve, reject) => {
const chunks = [];
readableStream.on("data", (data) => {
chunks.push(data instanceof Buffer ? data : Buffer.from(data));
});
readableStream.on("end", () => {
resolve(Buffer.concat(chunks));
});
readableStream.on("error", reject);
});
}
}
main();
const { DefaultAzureCredential } = require("@azure/identity");
const { DataLakeServiceClient } = require("@azure/storage-file-datalake");
const account = "<account>";
const defaultAzureCredential = new DefaultAzureCredential();
const datalakeServiceClient = new DataLakeServiceClient(
`https://${account}.dfs.core.windows.net`,
defaultAzureCredential
);
const fileSystemName = "<file system name>";
const fileName = "<file name>"
async function main() {
const fileSystemClient = datalakeServiceClient.getFileSystemClient(fileSystemName);
const fileClient = fileSystemClient.getFileClient(fileName);
// Get file content from position 0 to the end
// In browsers, get downloaded data by accessing downloadResponse.contentAsBlob
const downloadResponse = await fileClient.read();
const downloaded = await blobToString(await downloadResponse.contentAsBlob);
console.log(
"Downloaded file content",
downloaded
);
// [Browsers only] A helper method used to convert a browser Blob into string.
async function blobToString(blob: Blob): Promise<string> {
const fileReader = new FileReader();
return new Promise<string>((resolve, reject) => {
fileReader.onloadend = (ev: any) => {
resolve(ev.target!.result);
};
fileReader.onerror = reject;
fileReader.readAsText(blob);
});
}
}
main();
Enabling logging may help uncover useful information about failures. In order to see a log of HTTP requests and responses, set the AZURE_LOG_LEVEL
environment variable to info
. Alternatively, logging can be enabled at runtime by calling setLogLevel
in the @azure/logger
:
import { setLogLevel } from "@azure/logger";
setLogLevel("info");
More code samples:
If you'd like to contribute to this library, please read the contributing guide to learn more about how to build and test the code.
FAQs
Microsoft Azure Storage SDK for JavaScript - DataLake
The npm package @azure/storage-file-datalake receives a total of 24,953 weekly downloads. As such, @azure/storage-file-datalake popularity was classified as popular.
We found that @azure/storage-file-datalake demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 0 open source maintainers collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Research
A supply chain attack on Rspack's npm packages injected cryptomining malware, potentially impacting thousands of developers.
Research
Security News
Socket researchers discovered a malware campaign on npm delivering the Skuld infostealer via typosquatted packages, exposing sensitive data.
Security News
Sonar’s acquisition of Tidelift highlights a growing industry shift toward sustainable open source funding, addressing maintainer burnout and critical software dependencies.