Research
Recent Trends in Malicious Packages Targeting Discord
The Socket research team breaks down a sampling of malicious packages that download and execute files, among other suspicious behaviors, targeting the popular Discord platform.
Readme
Swipe through data once, with a comb that will pick up the points you're looking for.
Think single pass search.
Think k nearest neighbors, but brute force, but with multiple targets.
Extremely light weight: Pure python -- no other dependencies.
The main function of swipe
is highest_score_swipe
.
The other objects are there to support you in making your own kind of
swiping function.
What highest_score_swipe
does is
get the k items from the iterable it
who score the highest with score_of
.
(think of score_of
an inverse of key
argument of the sorted
python
function).
As far as the output is concerned, you can acheive something about the same with
highest_score_swipe(it, score_of, k, output)
as with
output(sorted(it, key=score_of, reverse=True)[:k])
(With slightly different output functions)
The difference is that in the last one,
But to get the top k
elements you don't have to. You just have to scan though
the data once while maintaining a list of the top items.
So when there's a lot of data, highest_score_swipe
will save you both memory and
computation.
>>> from swipe import highest_score_swipe
>>>
>>> data = [('Christian', 12), ('Seb', 88), ('Thor', 27), ('Sylvain', 42)]
Let's see what you get out of the box (i.e. only specifying what's required,
using defaults for all the rest).
We'll iter(data)
just to this once to show that data only has to be iterable.
>>> highest_score_swipe(iter(data))
[(('Thor', 27), ('Thor', 27))]
Now, out of the box, you don't get much, and looks a bit strange.
Reason is if you don't specify k
you just get the
top item, and if you don't specify what score should be used to measure the "top",
it'll just use python's default comparison operator which here brings
('Thor', 27)
on the top because it's lexicographically the last.
And why is ('Thor', 27)
repeated twice?
Because it acts both as a score (the first) and a data item (the second).
Where it becomes interesting (and useful) is when you specify what score function it should use. So let's.
>>> length_of_name = lambda x: len(x[0])
>>> by_age = lambda x: x[1]
>>> highest_score_swipe(data, by_age)
[(88, ('Seb', 88))]
>>> highest_score_swipe(data, length_of_name)
[(9, ('Christian', 12))]
>>> highest_score_swipe(data, length_of_name, k=2)
[(7, ('Sylvain', 42)), (9, ('Christian', 12))]
Now let's see about that output
argument.
It's used to specify how you want the result to be processed before returning.
>>> highest_score_swipe(data, length_of_name, k=2, output='top_tuples')
[(9, ('Christian', 12)), (7, ('Sylvain', 42))]
>>> highest_score_swipe(data, length_of_name, k=2, output='items')
[('Sylvain', 42), ('Christian', 12)]
>>> highest_score_swipe(data, length_of_name, k=2, output='scores')
[7, 9]
>>> highest_score_swipe(data, length_of_name, k=2, output='top_score_items')
[('Christian', 12), ('Sylvain', 42)]
You can also specify a custom function:
>>> highest_score_swipe(
... data, length_of_name, k=2,
... output=lambda km: [f"{name} (whose name has {score} letters), is {age}" for score, (name, age) in km]
... )
['Sylvain (whose name has 7 letters), is 42', 'Christian (whose name has 9 letters), is 12']
What if you wanted the indices (that is, the integer indexing the data) of the top 2 as your output? Here's a recipe for that:
>>> highest_score_swipe(
... enumerate(data), # enumerate the data to get a (i, item) iterator
... lambda x: length_of_name(x[1]), # apply your scoring function to the item
... k=2,
... output=lambda km: [x[1][0] for x in km] # extract the indices
... )
FAQs
Swipe through data once with a comb that will pick up the points you're looking for.
We found that swipe demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
The Socket research team breaks down a sampling of malicious packages that download and execute files, among other suspicious behaviors, targeting the popular Discord platform.
Security News
Socket CEO Feross Aboukhadijeh joins a16z partners to discuss how modern, sophisticated supply chain attacks require AI-driven defenses and explore the challenges and solutions in leveraging AI for threat detection early in the development life cycle.
Security News
NIST's new AI Risk Management Framework aims to enhance the security and reliability of generative AI systems and address the unique challenges of malicious AI exploits.