Analyse Media File
This library helps streamline getting information about, and subtitles for, backups of DVDs and Blu-rays.
It uses only string methods, so that it is compatible with both browser and desktop applications, to build search query data for various media API backends.
The library is written in TypeScript and includes types that are easily converted to TMDB, OMDB and IMDB search queries, as well as media hash coding compatible with the OpenSubtitles API.
Analysing Functions
makeHash
Returns a promise of a 16-character hex string that can be used in various media APIs to search for content about a media file. Most commonly it is used to find subtitles via https://opensubtitles.org when your DVD backup did not include subtitles in your preferred language.
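As an illustration of how such a hash can be computed with string methods only, here is a minimal sketch of the publicly documented OpenSubtitles scheme: the file size plus the sum of the first and last 64K chunks, read as 64-bit little-endian words, modulo 2^64. The names `sketchMediaHash` and `addWords` are hypothetical, the inputs are assumed to be binary strings (one character per byte), and this is not necessarily how `makeHash` is implemented.

```typescript
// Hypothetical sketch, not the library's makeHash.
// Adds every complete 8-byte little-endian word of `chunk` into `acc`
// (8 bytes, least significant first). Trailing bytes that do not fill a
// whole word are ignored, and the carry out of the top byte is dropped,
// which gives arithmetic modulo 2^64.
function addWords(acc: number[], chunk: string): void {
  for (let i = 0; i + 8 <= chunk.length; i += 8) {
    let carry = 0;
    for (let j = 0; j < 8; j++) {
      const sum = acc[j] + (chunk.charCodeAt(i + j) & 0xff) + carry;
      acc[j] = sum & 0xff;
      carry = sum >> 8;
    }
  }
}

function sketchMediaHash(size: number, head: string, tail: string): string {
  // Start from the file size, split into 8 little-endian bytes.
  const acc: number[] = [];
  let rest = size;
  for (let j = 0; j < 8; j++) {
    acc.push(rest % 256);
    rest = Math.floor(rest / 256);
  }
  addWords(acc, head);
  addWords(acc, tail);
  // Print the most significant byte first, two hex digits per byte.
  let hex = '';
  for (let j = 7; j >= 0; j--) {
    hex += ('0' + acc[j].toString(16)).slice(-2);
  }
  return hex;
}
```

Working on a byte array with manual carry avoids 64-bit integer types, which keeps the sketch usable in any JavaScript environment.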
analyseFilePath
Returns a structured output of the included type AnalysedMedia, which is a union of AnalysedMovie, AnalysedTVShow or a string.
The string contains just the guessed title, and is returned when the name and path formatting used by your backup program could not be identified.
AnalysedMovie is structured as follows:
export interface AnalysedMovie {
type: 'movie';
year?: number;
name: string;
}
name is the guessed name, based on common patterns from backup programs, and year is the guessed release year.
AnalysedTVShow is structured as follows:
export interface AnalysedTVShow {
type: 'tv';
name: string;
season?: number;
episodes?: number[];
year?: number;
}
name is the guessed name, based on common patterns from backup programs. season and episodes are the guessed season and episodes; when multiple episodes are encoded in the same file, they appear as multiple entries in the episodes array. year is the year of the first air date.
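Because both interfaces carry a literal type field, the union can be narrowed with ordinary TypeScript checks. The sketch below repeats the two interfaces so that it is self-contained; `describeMedia` and `pad2` are hypothetical helpers for illustration and are not part of the library.

```typescript
interface AnalysedMovie {
  type: 'movie';
  year?: number;
  name: string;
}
interface AnalysedTVShow {
  type: 'tv';
  name: string;
  season?: number;
  episodes?: number[];
  year?: number;
}
type AnalysedMedia = AnalysedMovie | AnalysedTVShow | string;

// Hypothetical helper: zero-pads a number to at least two digits.
function pad2(n: number): string {
  return (n < 10 ? '0' : '') + n;
}

// Hypothetical helper: builds a short label for any AnalysedMedia value.
function describeMedia(media: AnalysedMedia): string {
  if (typeof media === 'string') {
    // Only a guessed title is available.
    return 'Unrecognised: ' + media;
  }
  if (media.type === 'movie') {
    return media.year !== undefined ? media.name + ' (' + media.year + ')' : media.name;
  }
  // From here on TypeScript knows media is an AnalysedTVShow.
  const season = media.season !== undefined ? ' S' + pad2(media.season) : '';
  const episodes = (media.episodes || []).map((e) => 'E' + pad2(e)).join('');
  return media.name + season + episodes;
}
```

The string case is checked first, since a plain string has no type field to switch on.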
Usage
This library works in the browser or in Node, and relies on your own boilerplate code to read the file content and file path.
Browser
Using FileWithPath (e.g. from React-dropzone), search data is obtained as follows:
const [analysed, setAnalysed] = useState<AnalysedMedia[]>([]);
const onDrop = useCallback((files) => {
setAnalysed(files.map((file) => analyseFilePath(file.path)));
}, []);
const {...} = useDropzone({onDrop, accept: 'video/*'});
...
Getting the media hash takes a promise-based FileReader wrapper, like:
const HASH_CHUNK_SIZE = 65536; // 64 * 1024, the chunk size defined by the media hash
const [hashes, setHashes] = useState<string[]>([]);
//Simple promise wrapper for FileReader
const readBlock = useCallback((file: File, block: number): Promise<string> => {
return new Promise<string>((resolve, reject): void => {
const reader = new FileReader();
reader.onload = (event) => {
if (event.target !== null) {
resolve(event.target.result as string);
} else {
reject(event);
}
};
reader.onerror = (error) => {
reject(error);
};
if (block < 0) {
reader.readAsBinaryString(file.slice(block));
} else {
reader.readAsBinaryString(file.slice(0, block));
}
});
}, []);
const onDrop = useCallback((files) => {
//makeHash uses file size and the first and last 64K chunk of the file.
Promise.all(files.map((file) => makeHash(file.size, readBlock(file, HASH_CHUNK_SIZE), readBlock(file, -HASH_CHUNK_SIZE))))
.then(setHashes);
}, [readBlock]);
const {...} = useDropzone({onDrop, accept: 'video/*'});
...
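In Node, the same two chunks can be read with the built-in fs module instead of FileReader. The following is a minimal sketch under stated assumptions: `readBlockSync` is a hypothetical helper, not part of the library, and it assumes the hash input is a binary string with one character per byte, which is what `readAsBinaryString` produces in the browser example.

```typescript
import * as fs from 'fs';

const HASH_CHUNK_SIZE = 65536; // 64 * 1024

// Hypothetical helper: reads the first `block` bytes of a file when `block`
// is positive, or the last `-block` bytes when it is negative, and returns
// them as a binary (latin1) string.
function readBlockSync(filePath: string, block: number): string {
  const fd = fs.openSync(filePath, 'r');
  try {
    const size = fs.fstatSync(fd).size;
    // Never ask for more bytes than the file contains.
    const length = Math.min(Math.abs(block), size);
    const position = block < 0 ? size - length : 0;
    const buffer = Buffer.alloc(length);
    const bytesRead = fs.readSync(fd, buffer, 0, length, position);
    return buffer.toString('latin1', 0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}
```

The file size comes from the same `fstatSync` call, so the size and the two chunks, `readBlockSync(path, HASH_CHUNK_SIZE)` and `readBlockSync(path, -HASH_CHUNK_SIZE)`, are all available to pass on. If makeHash expects promises, as the browser example suggests, the strings can be wrapped in `Promise.resolve`.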
Aside from these functions, a few helper functions are included: isAnalysedMovie and isAnalysedTVShow identify which kind of AnalysedMedia a value is, isExtras and isSample tell whether the file is likely an extra or a sample, and isSameRelease tells whether two instances of AnalysedMedia are likely the same movie or TV show (season and episode are excluded from the comparison).
These helpers can cut down on the number of API requests, for example by grouping requests for TV shows and fetching individual episodes only as needed, or by not requesting subtitles for sample files.
API Functions
These are API functions to search TMDB, OMDB and OpenSubtitles. They are separated into mapper functions, which map AnalysedMedia into queries for these APIs, and search, get and find methods, which return Axios-compatible request config objects.
There are also mapper functions that map TMDB and OMDB results into a subset of data called Media, a union of the Movie and TVShow types:
export interface MediaInfo {
plot?: string;
images: Record<string, string>;
}
export interface Movie {
type: 'movie';
imdbId?: string;
tmdbId?: number;
title: string;
release?: string;
mediaInfo: MediaInfo;
}
export interface TVShow {
type: 'tv';
imdbId?: string;
tmdbId?: number;
name: string;
firstAirDate?: string;
mediaInfo: MediaInfo;
}
To use these, AnalysedMedia objects can be mapped to search queries and passed to the search functions to create an Axios-compatible request object, like this:
Axios.request(searchTmdb(mapTmdbQuery(media), this.apiKey))
.then((response) => response.data)
This will return a TmdbSearchResponse object of the following kind:
export interface TmdbSearchResponse {
page: number;
total_results: number;
total_pages: number;
results: Array<TmdbMovieResult|TmdbTVShowResult>;
}
The release and firstAirDate fields are formatted as yyyy-mm-dd.
There is a specialised mergeMedia function that takes two Media objects and returns the combined result. This can be used to fetch from both TMDB and OMDB and merge the results into a single object.
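To give a feel for what combining two results can involve, here is one plausible merge strategy for the Movie type, under the assumption that the first argument wins and the second only fills gaps. `mergeMovieSketch` is a hypothetical function and its rules are not confirmed to match mergeMedia; the interfaces are repeated so that the sketch is self-contained.

```typescript
interface MediaInfo {
  plot?: string;
  images: Record<string, string>;
}
interface Movie {
  type: 'movie';
  imdbId?: string;
  tmdbId?: number;
  title: string;
  release?: string;
  mediaInfo: MediaInfo;
}

// Hypothetical merge: fields from `primary` are kept, and `secondary`
// supplies any optional field that `primary` lacks.
function mergeMovieSketch(primary: Movie, secondary: Movie): Movie {
  return {
    type: 'movie',
    imdbId: primary.imdbId ?? secondary.imdbId,
    tmdbId: primary.tmdbId ?? secondary.tmdbId,
    title: primary.title,
    release: primary.release ?? secondary.release,
    mediaInfo: {
      plot: primary.mediaInfo.plot ?? secondary.mediaInfo.plot,
      // Image keys present in both objects resolve to the primary value.
      images: { ...secondary.mediaInfo.images, ...primary.mediaInfo.images },
    },
  };
}
```

Using `??` instead of object spread for the optional fields means an explicit `undefined` in the primary object does not overwrite a value found in the secondary one.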
Future development
Further methods would include looking for metadata inside files, as some backup programs put useful information there, including checking for existing subtitles before querying https://opensubtitles.org