The Apify-Haystack integration allows easy interaction between the Apify platform and Haystack.
Apify is a platform for web scraping, data extraction, and web automation tasks. It provides serverless applications called Actors for tasks such as crawling websites and scraping Facebook, Instagram, and Google results.
Haystack offers an ecosystem of tools for building, managing, and deploying search engines and LLM applications.
Apify-haystack is available as the apify-haystack PyPI package.
pip install apify-haystack
You need to have an Apify account and API token to run this example. You can start with a free account at Apify and get your API token.
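Before calling an Actor, it can help to confirm that the token is actually available. The following is a minimal sketch, not part of apify-haystack: the helper name `require_apify_token` is hypothetical, and it assumes the token is exposed through the `APIFY_API_TOKEN` environment variable mentioned in the example.

```python
import os
from typing import Mapping, Optional


def require_apify_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Apify API token from the environment or raise a clear error.

    Hypothetical helper; `env` defaults to os.environ and exists mainly
    so the function can be exercised with a plain dict.
    """
    env = os.environ if env is None else env
    token = env.get("APIFY_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "APIFY_API_TOKEN is not set; export it or add it to your .env file."
        )
    return token
```

Calling `require_apify_token()` after `load_dotenv()` fails early with a readable message instead of partway through an Actor run.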
In the example below, specify apify_api_token and run the script:
from dotenv import load_dotenv
from haystack import Document

from apify_haystack import ApifyDatasetFromActorCall

# Set APIFY_API_TOKEN here or load it from .env file
apify_api_token = "" or load_dotenv()

actor_id = "apify/website-content-crawler"
run_input = {
    "maxCrawlPages": 3,  # limit the number of pages to crawl
    "startUrls": [{"url": "https://haystack.deepset.ai/"}],
}


def dataset_mapping_function(dataset_item: dict) -> Document:
    # Convert one Apify dataset item into a Haystack Document
    return Document(content=dataset_item.get("text"), meta={"url": dataset_item.get("url")})


actor = ApifyDatasetFromActorCall(
    actor_id=actor_id, run_input=run_input, dataset_mapping_function=dataset_mapping_function
)
print(f"Calling the Apify actor {actor_id} ... crawling will take some time ...")
print("You can monitor the progress at: https://console.apify.com/actors/runs")

dataset = actor.run().get("documents")

print(f"Loaded {len(dataset)} documents from the Apify Actor {actor_id}:")
for d in dataset:
    print(d)
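The mapping function in the script passes `text` through unchanged, so an item without text would yield a Document with empty content. A more defensive variant might look like the sketch below. It is an illustration only: `to_document_fields` is a hypothetical helper, the sample items (including the `/empty` URL) are made up, and it uses plain dicts rather than Haystack types so that it runs without any dependencies.

```python
from typing import Any, Dict, Optional


def to_document_fields(dataset_item: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Map one dataset item to Document keyword arguments, or None to skip it."""
    text = (dataset_item.get("text") or "").strip()
    if not text:
        return None  # no usable text in this item
    return {"content": text, "meta": {"url": dataset_item.get("url")}}


# Hypothetical sample items shaped like the crawler output used in the script
items = [
    {"url": "https://haystack.deepset.ai/", "text": "  Haystack docs  "},
    {"url": "https://haystack.deepset.ai/empty", "text": None},
]

# Keep only the items that produced usable fields
fields = [f for f in map(to_document_fields, items) if f is not None]
print(fields)
```

Each surviving entry can then be turned into a Document with `Document(**entry)`, since `content` and `meta` are the same keyword arguments used in the script.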
See the examples directory for more examples. Here are a few of them:
InMemoryDocumentStore
If you find any bug or issue, please submit an issue on GitHub. For questions, you can ask on Stack Overflow, in GitHub Discussions or you can join our Discord server.
Your code contributions are welcome. If you have any ideas for improvements, either submit an issue or create a pull request. For contribution guidelines and the code of conduct, see CONTRIBUTING.md.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.