message-accumulator
A package to accumulate localizable snippets of HTML or JSX text and to compose and decompose them to/from localizable strings
A package to help transform localizable messages in a variety of syntaxes into a form that translators can easily translate without knowing anything about that syntax, and after translation, back again into a form that programs can easily use to recompose localized messages in the original syntax.
In HTML or JSX, for example, whole translatable messages are hard to identify. In HTML, some tags are commonly found inside of whole sentences, and some are not. What forms a whole translatable message?
Consider this snippet:
<div>
<span class="body">
There are <a href="http://url" title="localizable title">50 files</a> in the <span class="copyright">Simple Markdown</span> system.
</span>
</div>
In this case, the outer "div" and "span" tags are not part of any localizable snippet of text. The entire string "There are <a href="http://url" title="localizable title">50 files</a> in the <span class="copyright">Simple Markdown</span> system." should be localized as a single sentence because it would not make any sense to localize the parts "There are ", "50 files", " in the ", "Simple Markdown", and " system." separately. They are not simply phrases that you can translate out-of-context, and then re-concatenate and have any hope that it will make logical sense in many other languages. Human language is more complicated than that! In order for translators to do a good job, they need the entire sentence.
The only problem is that translators are not so good with programming language syntax and tend to do things like occasionally translating HTML tag names or attribute values. For example, they may see a series of CSS class names that forms a short phrase in English, and decide that they should be translated. The situation is even worse in JSX because the names of tags are not a fixed list as they are in HTML. Components can have any name, and even translators who are familiar with HTML tags are confused as to what is translatable and what is not. Our example above has an added complication: the value of the "title" attribute of the "a" tag is actual localizable text, but the values of the other attributes are not, which is even more confusing to the translators. Again, they are not programmers.
But that is okay. They are amazing linguists, and this library helps to hide these complications from them. This library hides the contents of such tags from the translators and lets them translate with minimal syntax getting in the way. The sentence above would be easier for translators to translate if it were something like this:
There are <c0>50 files</c0> in the <c1>Simple Markdown</c1> system.
where:
c0 = <a href="http://url" title="localizable title">
c1 = <span class="copyright">
Translators can learn this simple XML syntax quickly, and don't need to know the intricacies of any programming or markup language. They can focus on the linguistic part of the translation and only have to make sure that the corresponding portion of the translation is surrounded by the XML tags "c0" and "c1". (The "c" stands for the word "component" -- XML tags have to start with a letter.)
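The substitution described above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: `hideTags` is a hypothetical helper, not part of the message-accumulator API, and it assumes a flat snippet with well-nested tags and no self-closing elements.

```javascript
// Hypothetical helper (not the library's API): replaces each tag pair in a
// snippet with numbered <cN> components and records the original open tags.
function hideTags(html) {
  const mapping = []; // mapping[n] holds the original open tag for component n
  const stack = [];   // indices of the components that are currently open
  const text = html.replace(/<\/?[^>]+>/g, (tag) => {
    if (tag.startsWith("</")) {
      // A close tag closes the most recently opened component.
      return `</c${stack.pop()}>`;
    }
    const n = mapping.length;
    mapping.push(tag);
    stack.push(n);
    return `<c${n}>`;
  });
  return { text, mapping };
}

const source =
  'There are <a href="http://url" title="localizable title">50 files</a> ' +
  'in the <span class="copyright">Simple Markdown</span> system.';

const hidden = hideTags(source);
console.log(hidden.text);
// There are <c0>50 files</c0> in the <c1>Simple Markdown</c1> system.
console.log(hidden.mapping);
```

The `mapping` array plays the role of the "where:" table above: entry 0 is the "a" tag and entry 1 is the "span" tag.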
Translating this type of message has many advantages:
Now in many human languages, grammar is different than in English, so it is entirely possible that the order of the components turns out different for a translated string than in English. Also, the nesting of those components may change. We need to allow the translators the freedom to do what is right for the grammar of their target language. That means we need to be able to decompose a translated string back into a tree of syntax nodes that can easily be transformed back into the source programming language again, whether that is HTML, JSX, or even Markdown. This is accomplished by reapplying the mapping between the components and the original tag text to the appropriate parts of the translation.
Consider this translation of our example above into German:
In den <c1>Simple Markdown</c1> System, gibt es <c0>50 Dateien</c0>.
Note that the order of the components is indeed reversed from English -- c1 comes before c0. Ideally, we would like to decompose this into this tree:
root
    "In den "
    c1
        "Simple Markdown"
    " System, gibt es "
    c0
        "50 Dateien"
    "."
From there you can easily reapply the mapping c1 = <span class="copyright"> and c0 = <a href="http://url" title="localizable title">, plus the appropriate close tags of course, to reconstruct the HTML into nicely translated HTML:
In den <span class="copyright">Simple Markdown</span> System, gibt es <a href="http://url" title="localizable title">50 Dateien</a>.
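The round trip for the German example can be sketched as follows. The functions `decompose` and `render` and the shape of the `mapping` array are hypothetical names chosen for this illustration; they are not the message-accumulator API, and the sketch assumes that components are written exactly as `<cN>` and `</cN>`.

```javascript
// Hypothetical helpers for illustration only.

// Parse a string containing <cN>...</cN> components into a tree of nodes.
function decompose(translated) {
  const root = { type: "root", children: [] };
  const stack = [root];
  // Splitting on a capturing group keeps the component tags as tokens.
  for (const token of translated.split(/(<\/?c\d+>)/)) {
    if (token === "") continue;
    const current = stack[stack.length - 1];
    const match = /^<(\/?)c(\d+)>$/.exec(token);
    if (!match) {
      current.children.push({ type: "text", value: token });
    } else if (match[1] === "") {
      const node = { type: "component", index: Number(match[2]), children: [] };
      current.children.push(node);
      stack.push(node);
    } else {
      // A close tag must match the component that is currently open.
      if (current.index !== Number(match[2])) {
        throw new Error("Mismatched close tag: " + token);
      }
      stack.pop();
    }
  }
  if (stack.length !== 1) throw new Error("Unclosed component");
  return root;
}

// Turn the tree back into markup by reapplying the component mapping.
function render(node, mapping) {
  if (node.type === "text") return node.value;
  const inner = node.children.map((child) => render(child, mapping)).join("");
  if (node.type === "root") return inner;
  const tag = mapping[node.index];
  return tag.open + inner + tag.close;
}

const mapping = [
  { open: '<a href="http://url" title="localizable title">', close: "</a>" },
  { open: '<span class="copyright">', close: "</span>" },
];

const tree = decompose(
  "In den <c1>Simple Markdown</c1> System, gibt es <c0>50 Dateien</c0>."
);
const html = render(tree, mapping);
console.log(html);
```

Because the parser checks that every close tag matches the open component, a translation in which the tags were accidentally mismatched is rejected instead of being silently turned into broken HTML.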
In many cases, the caller of this message accumulator class will have an abstract syntax tree (AST) in memory which is the result of parsing the original English source file with a standard parser. In this case, "c0" and "c1" would map to particular nodes in that tree instead of to snippets of text containing the HTML tags. The caller's AST can be modified for the translation by reusing existing AST nodes in the place of the components.
The goal of the message-accumulator class is to help the caller accumulate a localizable unit (a "message") while traversing the AST of the source file, as well as to be able to decompose a translated string back into a tree that can be easily transformed into AST nodes again.
The MessageAccumulator class has four main methods to accumulate a string:
Step 1. Use your parser to generate an abstract syntax tree (AST) that represents the file.
Step 2. Walk the AST, accumulating text as appropriate using addText and pushing contexts for any nodes that do not mark a break in the text. For example, if your HTML parser has some text followed by the "b" tag, then that "b" tag should not cause a break in the text. The code should push a new context and continue to accumulate more text.
Step 3. At some point, the parser will eventually come to a node in the AST that marks a break in the translatable message. (Or it will come to the end of the file!) For example in HTML, you might encounter a <div> tag. When this happens, the current value of the message accumulator is the translatable string. The code can retrieve the string using the getString method, and this string can be sent into the localization process. Typically, the code will then create a new MessageAccumulator instance for the next piece of text.
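The accumulation steps above can be illustrated with a small stand-in class. `SketchAccumulator` is hypothetical and is not the library's implementation: the names addText, push, and getString come from the text above, while `pop` is an assumed counterpart to push, and the real class may differ in details.

```javascript
// A stand-in written for illustration; not the real MessageAccumulator.
class SketchAccumulator {
  constructor() {
    this.root = { children: [] };
    this.stack = [this.root]; // the innermost open context is last
    this.count = 0;           // the next component number to hand out
  }

  // Append text to the innermost open context.
  addText(text) {
    this.stack[this.stack.length - 1].children.push({ text });
  }

  // Open a new context, optionally remembering the caller's AST node.
  push(astNode) {
    const node = { index: this.count++, astNode, children: [] };
    this.stack[this.stack.length - 1].children.push(node);
    this.stack.push(node);
  }

  // Close the innermost context and hand back its AST node (assumed method).
  pop() {
    if (this.stack.length === 1) throw new Error("No open context to pop");
    return this.stack.pop().astNode;
  }

  // Render the accumulated message with <cN> components.
  getString() {
    const render = (node) =>
      node.children
        .map((child) =>
          "text" in child
            ? child.text
            : `<c${child.index}>${render(child)}</c${child.index}>`
        )
        .join("");
    return render(this.root);
  }
}

// Simulates walking the inline nodes of the example sentence.
const acc = new SketchAccumulator();
acc.addText("There are ");
acc.push({ tagName: "a" });    // an inline tag: no break in the text
acc.addText("50 files");
acc.pop();
acc.addText(" in the ");
acc.push({ tagName: "span" });
acc.addText("Simple Markdown");
acc.pop();
acc.addText(" system.");
console.log(acc.getString());
// There are <c0>50 files</c0> in the <c1>Simple Markdown</c1> system.
```

When the walk reaches a node that breaks the text, such as a div, the caller would read the string and start over with a fresh instance.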
At some point, the translations of all the strings will be done, and the localized file can be reconstructed.
To do this, the code starts with the source file and the set of translations from your localization system, in the form of resource files or a translation server.
Step 1. The source file is reparsed and re-walked as above, but this time, you keep track of nodes in the AST by pushing them into your contexts. This decorates the nodes in the accumulator with the AST nodes. Doing this creates a mapping between contexts and the AST nodes that they represent. The push() method takes a parameter that is your AST node.
Step 2. When the code walks the nodes and hits a node that ends the translatable text, it can then apply the translation. The result of getString() gives the translatable source, and the translation of that is looked up in the translation system. Then, the code will create a new MessageAccumulator from that translated string plus the current MessageAccumulator containing the source string. This will apply the mapping from context to AST node appropriately to the translated MessageAccumulator.
Step 3. Walk the new translated MessageAccumulator again, converting the MessageAccumulator nodes into AST nodes. Then, replace the AST nodes with these new ones.
Step 4. When all of the text is translated, convert the AST back into text again to reconstruct your translated file.
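As a rough sketch of steps 2 and 3, the snippet below looks up a translation for a source string and rebuilds AST children that reuse the source nodes. Everything here is an assumption made for illustration: the `translateToAst` function, the shape of the AST nodes, and the in-memory `translations` table stand in for whatever parser and localization system the caller actually uses.

```javascript
// Hypothetical data: AST nodes recorded by component index during the
// re-walk, and a translation table keyed by the source string.
const sourceString =
  "There are <c0>50 files</c0> in the <c1>Simple Markdown</c1> system.";
const astNodes = [
  { type: "element", tagName: "a" },    // c0
  { type: "element", tagName: "span" }, // c1
];
const translations = {
  [sourceString]:
    "In den <c1>Simple Markdown</c1> System, gibt es <c0>50 Dateien</c0>.",
};

// Hypothetical helper: builds translated AST children from the source nodes.
function translateToAst(source, nodes, table) {
  // Fall back to the source text when no translation is available.
  const translated = table[source] ?? source;
  const root = { children: [] };
  const stack = [root];
  for (const token of translated.split(/(<\/?c\d+>)/)) {
    if (token === "") continue;
    const parent = stack[stack.length - 1];
    const match = /^<(\/?)c(\d+)>$/.exec(token);
    if (!match) {
      parent.children.push({ type: "text", value: token });
    } else if (match[1] === "") {
      // Copy the source AST node for this component and give it new children.
      const node = { ...nodes[Number(match[2])], children: [] };
      parent.children.push(node);
      stack.push(node);
    } else {
      stack.pop();
    }
  }
  return root.children;
}

const newChildren = translateToAst(sourceString, astNodes, translations);
console.log(newChildren.map((node) => node.tagName || node.value));
```

The returned children would then replace the original children of the enclosing node in the caller's AST before the tree is serialized back to text.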
FAQs
The npm package message-accumulator receives a total of 4,752 weekly downloads. As such, message-accumulator popularity was classified as popular.
We found that message-accumulator demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.