Edge Python API
API to connect to dRISK Edge.
Useful Edge Links
Some useful links for new Edge users:
Installation
pip install drisk_api
Basic Usage
The API supports the basic building blocks for Create/Read/Update/Delete operations on the graph. For example:
from drisk_api import GraphClient

token = "<edge_auth_token>"

# create a new graph, or connect to an existing one
new_graph = GraphClient.create_graph("a graph", token)
graph = GraphClient("graph_id", token)

# create, read, and update a node
node_id = graph.create_node(label="a node")
node = graph.get_node(node_id)
successors = graph.get_successors(node_id)
graph.update_node(node_id, label="new label", size=3)

# create another node and link the two inside a batch
other_id = graph.create_node(label="another node")
with graph.batch():
    graph.create_edge(node_id, other_id, weight=5.)
More Examples
We can use these building blocks to create whatever graphs we are most interested in. Below are some examples:
Wikipedia Crawler
In this example we will scrape the main URL links from a given Wikipedia page and build a graph out of them.
Most of the code leverages the wikipedia API and is not particularly important.
What is more interesting is how we can use the drisk_api client
to convert that information into a graph and then explore it in Edge.
First, load the relevant modules:
import wikipedia
from wikipedia import PageError, DisambiguationError, search, WikipediaPage
from tqdm import tqdm
from drisk_api import GraphClient
Next, let's define some helper functions that build a graph of Wikipedia links for a given page.
The main function to pay attention to is wiki_scraper,
which finds the 'most important' links on a
given page and adds them to the graph, linking each back to the page it came from.
It does this recursively for each new page until a terminal condition is reached (e.g. the maximum recursion depth).
def find_page(title):
    """Find the wikipedia page."""
    results, suggestion = search(title, results=1, suggestion=True)
    try:
        title = results[0] or suggestion
        page = WikipediaPage(title, redirect=True, preload=False)
    except IndexError:
        raise PageError(title)
    return page
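find_page returns a wikipedia page object; below we rely on its url, links, and content attributes. A quick illustrative call (the title here is just an example):

page = find_page("Napoleon")
print(page.url)         # the page url, e.g. https://en.wikipedia.org/wiki/Napoleon
print(len(page.links))  # how many outgoing links the page has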
def top_links(links, text, top_n):
    """Find most important links in a wikipedia page."""
    link_occurrences = {}
    for link in links:
        link_occurrences[link] = text.lower().count(link.lower())
    sorted_links = sorted(link_occurrences.items(), key=lambda x: x[1], reverse=True)
    top_n_relevant_links = [link for link, count in sorted_links[:top_n]]
    return top_n_relevant_links
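As a rough illustration of the ranking (made-up inputs, purely to show the behaviour):

links = ["France", "Corsica", "Waterloo"]
text = "Napoleon was born in Corsica and later ruled France. France crowned him emperor."
top_links(links, text, 2)  # -> ["France", "Corsica"]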
def wiki_scraper(
    graph,
    page_node,
    page_name,
    string_cache,
    visited_pages,
    max_depth=3,
    current_depth=0,
    max_links=10,
    first_depth_max_links=100,
):
    # resolve the page; skip it if it is ambiguous or missing
    try:
        page = find_page(title=page_name)
    except (DisambiguationError, PageError):
        return

    # label the node with the page title and attach its url
    graph.update_node(page_node, label=page_name, url=page.url)

    # stop at pages we have already visited or once the maximum depth is reached
    if page_name in visited_pages or current_depth >= max_depth:
        return

    # allow more links at the root page than at deeper levels
    links = top_links(page.links, page.content, first_depth_max_links if current_depth == 0 else max_links)

    if current_depth == 0:
        tqdm_bar = tqdm(total=len(links), desc="wiki scraping")

    for link in links:
        if current_depth == 0:
            tqdm_bar.update(1)

        # reuse the node for a link we have seen before, otherwise create it
        if link in string_cache:
            new_page_node = string_cache[link]
        else:
            new_page_node = graph.create_node(label=link)
            string_cache[link] = new_page_node

        # link the current page to the new page and recurse into it
        graph.create_edge(page_node, new_page_node, 1.)
        wiki_scraper(
            graph,
            new_page_node,
            link,
            string_cache,
            visited_pages,
            max_depth=max_depth,
            current_depth=current_depth + 1,
            max_links=max_links,
            first_depth_max_links=first_depth_max_links,
        )

    visited_pages.add(page_name)
Then we can connect to our graph (or make one):
TOKEN = "<edge_auth_token>"
graph_id = "graph_id"    # id of the graph to connect to
home_view = "view_id"    # id of the view where the root node will be placed
g = GraphClient(graph_id, TOKEN)
and run the scraper:
page_name = "Napoleon"
string_cache = {}
visited_pages = set()

# create the root node and place it in the home view
page_node = g.create_node(label=page_name)
g.add_nodes_to_view(home_view, [page_node], [(0., 0.)])

# scrape recursively, batching the graph writes
with g.batch():
    wiki_scraper(
        g,
        page_node,
        page_name,
        string_cache,
        visited_pages,
        max_depth=3,
        current_depth=0,
        max_links=3,
        first_depth_max_links=2,
    )
We can then head to Edge to interact with the graph.
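Alternatively, as a quick sanity check before opening the UI, we can read part of the graph back with the same client, using the get_successors and get_node calls from the Basic Usage section:

# list the pages linked directly from the root node
for successor_id in g.get_successors(page_node):
    print(g.get_node(successor_id))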