Transifex Native is a full end-to-end, cloud-based localization stack for modern apps.
Transifex Native SDK: i18n DOM library
A utility library for managing the localization of generic HTML documents or fragments.
Given a document object as input, it:
- Applies string segmentation
- Extracts strings to be pushed to a Transifex Native project
- Can be combined with @transifex/native to localize HTML
Related packages:
Learn more about Transifex Native in the Transifex Developer Hub.
Quick starting guide
Install the library using:
npm install @transifex/dom --save
Webpack
import { TxNativeDOM } from '@transifex/dom';
const txdom = new TxNativeDOM();
Node.js
const { TxNativeDOM } = require('@transifex/dom');
const txdom = new TxNativeDOM();
Browser
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/@transifex/dom/dist/browser.dom.min.js"></script>
<script type="text/javascript">
const TxNativeDOM = TransifexDOM.TxNativeDOM;
const txdom = new TxNativeDOM();
</script>
API
Initialize
By default, TxNativeDOM initializes with sane defaults that affect the way strings are detected in the HTML, such as which tags or classes to ignore and which attributes to treat as text.
The constructor supports some additional customization.
const txdom = new TxNativeDOM({
ignoreTags: Array(String),
ignoreClass: Array(String),
parseAttr: Array(String),
variablesParser: Function,
});
Attach / detach DOM
Connect or disconnect a DOM with the library.
txdom.attachDOM(document);
txdom.attachDOM(document, root);
txdom.detachDOM(document);
txdom.detachDOM(root);
Translate DOM
Translate the DOM using a t function, or reset it to the source language.
let locale = 'fr';
const t = (key) => {
return 'translation';
}
txdom.toLanguage(locale, t);
txdom.toSource();
Pseudo translate DOM
For debugging purposes, you may use the built-in pseudo translation mode.
txdom.pseudoTranslate();
txdom.toSource();
Get source strings
Get a list of detected strings for localization. The JSON format is compatible with Transifex Native.
txdom.getStringsJSON();
To append some tags to all exported strings do:
txdom.getStringsJSON({
tags: ['global-tag1', 'global-tag2'],
});
To add occurrence information to all exported strings do:
txdom.getStringsJSON({
occurrences: ['file.js', 'https://example.com/home'],
});
Use cases
Transifex Native DOM works in the browser using the window.document DOM, or within Node.js using a DOM emulator such as jsdom, happy-dom, or linkedom.
The following examples use jsdom, but a document node is all that is required.
const { JSDOM } = require('jsdom');
const { createNativeInstance } = require('@transifex/native');
const { TxNativeDOM } = require('@transifex/dom');
const tx = createNativeInstance({
token: 'token',
secret: 'secret',
});
const txdom = new TxNativeDOM();
const jsdom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
txdom.attachDOM(jsdom.window.document);
const stringsJSON = txdom.getStringsJSON();
await tx.pushSource(stringsJSON);
Translate HTML document
const { JSDOM } = require('jsdom');
const { createNativeInstance } = require('@transifex/native');
const { TxNativeDOM } = require('@transifex/dom');
const tx = createNativeInstance({
token: 'token',
});
const txdom = new TxNativeDOM();
const jsdom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
txdom.attachDOM(jsdom.window.document);
await tx.setCurrentLocale('fr');
txdom.toLanguage(tx.getCurrentLocale(), (key) => {
return tx.cache.get(key, tx.getCurrentLocale());
});
const translatedHTML = jsdom.serialize();
How string segmentation works
This is how the HTML content is segmented:
Block HTML tags
Examples of block tags are: DIV, P, H1, TABLE, UL, OL, etc.
When the content of a block tag is a combination of plain text and inline elements such as SPAN, all the content is considered a single segment.
HTML:
<div>
<p>This is a paragraph</p>
<p>This is a paragraph with <span>inline element</span></p>
</div>
Segments:
"This is a paragraph"
"This is a paragraph with <span>inline element</span>"
Plain text
When the content of a block tag is NOT a combination of plain text and a tag, only the plain text content is extracted.
HTML:
<div>
<p>
<span>My span text</span>
<span>Another span text</span>
</p>
</div>
Segments:
"My span text"
"Another span text"
CSS Data Binding On The Angular or React Framework
CSS classes may also be used for data binding in the Angular or React framework. In that case, the text inside the inline element is controlled directly by the framework, as opposed to being modified through HTML (e.g. in the case of jQuery). Because of this, when a block tag is a combination of plain text and inline elements such as SPAN that use a data-binding CSS class from the Angular/React framework, the text of each part needs to be evaluated separately. This results in the creation of multiple segments.
HTML:
<div>
<p>This is a paragraph</p>
<p>This is a paragraph with <span class="AngularReact">an inline element</span></p>
</div>
Segments:
"This is a paragraph"
"This is a paragraph with"
"an inline element"
Page title
HTML:
<title>My title</title>
Segments:
"My title"
Anchor titles
HTML:
<a title="My title">..</a>
Segments:
"My title"
Image titles and alt text
HTML:
<img title="My title" alt="My alt text"/>
Segments:
"My title"
"My alt text"
Input values and placeholders
Input values are only detected for inputs of type button, reset, or submit.
Textarea placeholders
HTML:
<textarea placeholder="My placeholder text">
Segments:
"My placeholder text"
Meta keywords and descriptions
HTML:
<meta name="keywords" content="tag1, tag2, tag3">
<meta name="description" content="My page description">
<meta name="title" content="My page title" >
<meta property="og:title" content="Localization Platform for Translating Digital Content | Transifex">
<meta property="og:description" content="Integrate with Transifex to manage the creation of multilingual websites and app content. Order translations, see translation progress, and tools like TM.">
Segments:
"tag1, tag2, tag3"
"My page description"
"My page title"
"Localization Platform for Translating Digital Content | Transifex"
"Integrate with Transifex to manage the creation of multilingual websites and app content. Order translations, see translation progress, and tools like TM."
Input elements of "image" type
HTML:
<input type="image" alt="Submit">
Segments:
"Submit"
SVG elements
SVG tags may contain some nested TEXT tags which are parsed and their strings extracted, but there is no MARKING in the UI for these elements. However, when you mouse over these elements, the options for the strings are shown (ignore string, follow link, etc.).
Elements that are ignored: script, style, link, iframe, noscript, canvas, audio, video, code.
Social widgets such as Facebook and Twitter that have tags with class names facebook_container and twitter_container are also ignored.
How to handle non-translatable content
You can manually define a block or node as non-translatable by adding a notranslate class.
For example:
<div class="notranslate">This content will not be translated</div>
Marking attributes for translation
Apart from the attributes that are automatically detected for translation, you can define custom attributes for translation using the tx-attrs="attr1, attr2,..." attribute.
Before:
HTML:
<span title="My title" data-content="My data content">
Segments: Nothing detected
After:
HTML:
<span title="My title" data-content="My data"
tx-attrs="title, data-content">
Segments:
"My title"
"My data"
How to tag strings in the source language
You can automatically tag source strings by using the tx-tags="tag1, tag2,..." attribute.
These tags propagate to child elements as well.
For example:
<div tx-tags="marketing">...</div>
How to handle inline block variables
To define variables or placeholders within a block that shouldn't be translated, use class="notranslate" in the variable nodes or encapsulate them inside var tags.
For example:
HTML:
Hi, you are visitor <span class="notranslate">142</span>
Hi, you are visitor <var>341</var>
Segments:
"Hi, you are visitor {{0}}"
How to handle URLs as translatable content
When images <img> or links <a> appear within a segment, their URLs are handled by default as non-translatable content (i.e. variables).
Translating images
To translate an image you should treat its URL as translatable text. To do so, use the special directive tx-content="translate_urls" to enable this functionality for a node and its children.
Before:
HTML:
<div>
<img src="/uploads/smiley.jpg" alt="Smiley face" width="42" height="42">
</div>
Segments:
"<img src="{{0}}" alt="Smiley face" width="42" height="42">"
After:
HTML:
<div tx-content="translate_urls">
<img src="/uploads/smiley.jpg" alt="Smiley face" width="42" height="42">
</div>
Segments:
"<img src="/uploads/smiley.jpg" alt="Smiley face" width="42" height="42">"
Translating links
To translate a link you should treat its URL as translatable text. To do so, use the special directive tx-content="translate_urls" to enable this functionality for a node and its children.
Before:
HTML:
<div>
Click to go to the <a href="/features">features</a> page
</div>
Segments:
"Click to go to the <a href="{{0}}">features</a> page"
After:
HTML:
<div tx-content="translate_urls">
Click to go to the <a href="/features">features</a> page
</div>
Segments:
"Click to go to the <a href="/features">features</a> page"
Tip: To treat ALL URLs as translatable content within a page, add tx-content="translate_urls" to the opening BODY tag.
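The difference between the two URL modes can be illustrated with a toy function. This is only a sketch of the observable result shown in the Before/After examples above, not the library's implementation; `urlsToPlaceholders` is a hypothetical helper, and numbering placeholders in order of appearance is an assumption.

```javascript
// Toy illustration: replace src/href values with numbered placeholders
// unless URLs are marked as translatable. Not part of @transifex/dom.
function urlsToPlaceholders(html, translateUrls = false) {
  if (translateUrls) {
    // With translate_urls, the URL stays in the segment as translatable text
    return html;
  }
  let index = 0;
  return html.replace(/\b(src|href)="([^"]*)"/g, (match, attr) => {
    const placeholder = `{{${index}}}`;
    index += 1;
    return `${attr}="${placeholder}"`;
  });
}

const html = '<img src="/uploads/smiley.jpg" alt="Smiley face">';
const asVariables = urlsToPlaceholders(html);
const asText = urlsToPlaceholders(html, true);
console.log(asVariables);
console.log(asText);
```

In the default mode the translator never sees the URL, so it cannot be changed by mistake; with translate_urls each locale can point to its own image or page.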
How to define custom variables
If your content uses custom patterns that should not be translated, you can add custom rules that treat the matching text as variables within a string segment.
For example:
const txdom = new TxNativeDOM({
variablesParser: (text, fn) => {
text = text.replace(/s-href="([^"]*)"/g, (a, b) => {
return a.replace(b, fn(b));
});
return text;
},
});
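To see what such a parser does to a piece of text, it can be run on its own with a stand-in callback. The sketch below reuses the parser from the example above; `toPlaceholder` is a hypothetical replacement for the callback the library passes in, and it assumes that callback returns a {{0}}-style placeholder like the ones shown in the segments earlier in this document.

```javascript
// The same parser as in the example above, extracted into a variable
const variablesParser = (text, fn) => {
  return text.replace(/s-href="([^"]*)"/g, (match, value) => {
    return match.replace(value, fn(value));
  });
};

// Hypothetical stand-in for the callback supplied by the library:
// it records each value and returns a numbered placeholder.
const seen = [];
const toPlaceholder = (value) => {
  seen.push(value);
  return `{{${seen.length - 1}}}`;
};

const parsed = variablesParser('<a s-href="/pricing">Pricing</a>', toPlaceholder);
console.log(parsed);
console.log(seen);
```

Only the value captured by the group is handed to the callback, so the attribute name and the surrounding markup stay in the segment.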
How to fine tune translatable content
For even finer control over how strings are detected, use the tx-content HTML attribute, which can contain the following values:
- exclude: marks a node and its children to be excluded from string detection
- include: marks a node and its children within an exclude block to be included in string detection
- block: marks a node and its children to be detected as a single string
- notranslate_urls: marks a node and its children to handle URLs as variables (default)
- translate_urls: marks a node and its children so that URLs are translated
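Because exclude and include apply to a node and its children, the effective state of a node depends on its ancestors. The toy function below sketches one way to reason about it, assuming that the nearest exclude or include value wins. `isDetected` is a hypothetical helper for illustration and is not part of the library.

```javascript
// Toy illustration: decide whether a node takes part in string detection,
// given the tx-content values from the outermost ancestor down to the node.
// Entries without a tx-content attribute are passed as undefined.
function isDetected(chain) {
  let detected = true; // nodes are detected unless told otherwise
  for (const value of chain) {
    if (value === 'exclude') detected = false;
    if (value === 'include') detected = true;
  }
  return detected;
}

// <div tx-content="exclude"><p>First text</p></div>
const plainChild = isDetected(['exclude', undefined]);
// <div tx-content="exclude"><p tx-content="include">Second text</p></div>
const includedChild = isDetected(['exclude', 'include']);
console.log(plainChild, includedChild);
```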
Include/exclude example.
Before:
HTML:
<div>
<p>First text</p>
<p>Second text</p>
<p>Third text</p>
</div>
Segments:
"First text"
"Second text"
"Third text"
After:
HTML:
<div tx-content="exclude">
<p>First text</p>
<p tx-content="include">Second text</p>
<p>Third text</p>
</div>
Segments:
"Second text"
Block example.
Before:
HTML:
<div>
<h1>A header</h1>
<p>A paragraph</p>
</div>
Segments:
"A header"
"A paragraph"
After:
HTML:
<div tx-content="block">
<h1>A header</h1>
<p>A paragraph</p>
</div>
Segments:
"<h1>A header</h1><p>A paragraph</p>"
Note: Strings that match the following regular expression are ignored:
^( |\s|\d|[-/:-?~@#!"^_`.,[\]])*$
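A quick way to check which strings such a pattern skips is to test a few samples against it. The sketch below writes the pattern as a JavaScript literal with the slash and the closing square bracket escaped, which is an assumption about the intended pattern; the sample strings are made up for illustration.

```javascript
// Pattern from the note above, as a JavaScript regular expression literal.
// Escaping "/" and "]" inside the character class is an assumption.
const IGNORED = /^( |\s|\d|[-\/:-?~@#!"^_`.,[\]])*$/;

const samples = ['2024', '12:30', '- / -', 'Hello', 'v2', '100%'];

// Strings made up only of whitespace, digits and the listed punctuation match
const ignored = samples.filter((s) => IGNORED.test(s));
const kept = samples.filter((s) => !IGNORED.test(s));

console.log(ignored);
console.log(kept);
```

Anything that contains at least one letter, or a symbol outside the listed set such as the percent sign, does not match and is treated as a translatable string.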
License
Licensed under Apache License 2.0, see LICENSE file.