github.com/Financial-Times/concept-search-api

Concept Search API


API for searching concepts in an Amazon Elasticsearch cluster.

Warning: The AWS SDK for Go does not currently support the Elasticsearch data plane API, but the Signer has been exposed since v1.2.0.

The approach taken to access AES (Amazon Elasticsearch Service):

How to run

Building the project requires Go 1.17 or newer.

git clone https://github.com/Financial-Times/concept-search-api.git
cd concept-search-api 
go build
./concept-search-api --aws-access-key="{access key}" --aws-secret-access-key="{secret key}"

It is also possible to provide the Elasticsearch endpoint, the port you expect the app to run on, the Elasticsearch index on which the search is performed and the maximum number of returned results.

Other parameters:                               
--port                           Port to listen on (env $PORT) (default "8080")
--aws-access-key                 AWS ACCESS KEY (env $AWS_ACCESS_KEY_ID)
--aws-secret-access-key          AWS SECRET ACCESS KEY (env $AWS_SECRET_ACCESS_KEY)
--elasticsearch-endpoint         AES endpoint (env $ELASTICSEARCH_ENDPOINT) (default "http://localhost:9200")
--auth                           Authentication method for ES cluster (aws or none) (env $AUTH) (default "none")
--elasticsearch-default-index    Elasticsearch default index (env $ELASTICSEARCH_DEFAULT_INDEX) (default "concepts")
--elasticsearch-extended-index   Elasticsearch extended index (env $ELASTICSEARCH_EXTENDED_SEARCH_INDEX) (default "all-concepts")
--api-yml                        Location of the API Swagger YML file. (env $API_YML) (default "./api.yml")
--search-result-limit            The maximum number of search results returned (excluding the search with the `ids` parameter or the searches used for autocomplete) (env $RESULT_LIMIT) (default 50)
--max-ids-limit                  The maximum number of uuids allowed as search input for the `ids` parameter (env $MAX_IDS_LIMIT) (default 1000)
--autocomplete-result-limit      The maximum number of autocomplete results returned (env $AUTOCOMPLETE_LIMIT) (default 10)
--elasticsearch-trace            Whether to log ElasticSearch HTTP requests and responses (env $ELASTICSEARCH_TRACE) (defaults false)
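
The environment variables and defaults listed above can be illustrated with a minimal sketch. It is not the service's actual configuration code: the `Config` struct, its field names and the helper functions are hypothetical, and only four of the parameters are covered.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds a few of the documented parameters.
// The struct and its field names are hypothetical, chosen for illustration.
type Config struct {
	Port              string
	ElasticsearchURL  string
	Auth              string
	SearchResultLimit int
}

// stringOr returns the value of the variable key, or def when it is unset or empty.
func stringOr(getenv func(string) string, key, def string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return def
}

// intOr parses the variable key as an integer and falls back to def
// when it is unset or not a valid number.
func intOr(getenv func(string) string, key string, def int) int {
	if n, err := strconv.Atoi(getenv(key)); err == nil {
		return n
	}
	return def
}

// LoadConfig reads the documented environment variables and applies the
// documented defaults. The lookup function is injected so it can be stubbed.
func LoadConfig(getenv func(string) string) Config {
	return Config{
		Port:              stringOr(getenv, "PORT", "8080"),
		ElasticsearchURL:  stringOr(getenv, "ELASTICSEARCH_ENDPOINT", "http://localhost:9200"),
		Auth:              stringOr(getenv, "AUTH", "none"),
		SearchResultLimit: intOr(getenv, "RESULT_LIMIT", 50),
	}
}

func main() {
	cfg := LoadConfig(os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as an argument keeps the sketch testable without touching the real process environment.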

How to test

  • Unit tests only: go test -mod=readonly -race ./...
  • Unit and integration tests:
    docker-compose -f docker-compose-tests.yml up -d --build && \
    docker logs -f test-runner && \
    docker-compose -f docker-compose-tests.yml down -v
    

To run the full suite of integration tests, you must have a running instance of Elasticsearch. By default the tests look for it at http://localhost:9200. To use a different instance, specify its URL as in the example below:

export ELASTICSEARCH_TEST_URL=http://localhost:9200

Available DATA endpoints:

POST /concept/search

This endpoint is used for searching concepts. The payload is a JSON object with a field called term, whose value is the search criterion. For example, searching for FOO looks like this:

curl -XPOST {concept-search-api-url}/concept/search -d '{"term":"FOO"}'

The matching concepts are returned ordered by the strength of their match. However, the actual score is not included.

To include the score you need to add the query parameter include_score with the value true. If the parameter has a value other than true the score will not be included. The score is a field that appears in each concept alongside the data that represents the actual concept. For example searching for FOO with scoring looks like this:

curl -XPOST {concept-search-api-url}/concept/search?include_score=true -d '{"term":"FOO"}'

By default, the endpoint only retrieves results with TME or Smartlogic authority. To extend the search domain, add the query parameter searchAllAuthorities with the value true. The search then returns results from TME, Smartlogic, Factset or any other authority, as well as results with no authority. NB: The filtering is implemented by applying a filter on the concept alias in the Elasticsearch index.

curl -XPOST {concept-search-api-url}/concept/search?searchAllAuthorities=true -d '{"term":"FOO"}'

By default, the endpoint returns only non-deprecated concepts. To include deprecated concepts as well, add the query parameter include_deprecated with the value true.

curl -XPOST {concept-search-api-url}/concept/search?include_deprecated=true -d '{"term":"FOO"}'
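
For callers that prefer Go over curl, the three optional query parameters above can be combined as in the following sketch. `SearchOptions` and `NewSearchRequest` are hypothetical names, the base URL is a placeholder, and the `Content-Type` header is an assumption (the curl examples do not set it). The request is only built, not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// SearchOptions maps to the optional query parameters of POST /concept/search.
// The type itself is hypothetical; only the parameter names come from the README.
type SearchOptions struct {
	IncludeScore         bool
	SearchAllAuthorities bool
	IncludeDeprecated    bool
}

// NewSearchRequest builds (but does not send) a search request for term.
func NewSearchRequest(baseURL, term string, opts SearchOptions) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"term": term})
	if err != nil {
		return nil, err
	}

	// Only parameters set to true are added; any other value disables them anyway.
	q := url.Values{}
	if opts.IncludeScore {
		q.Set("include_score", "true")
	}
	if opts.SearchAllAuthorities {
		q.Set("searchAllAuthorities", "true")
	}
	if opts.IncludeDeprecated {
		q.Set("include_deprecated", "true")
	}

	endpoint := strings.TrimRight(baseURL, "/") + "/concept/search"
	if encoded := q.Encode(); encoded != "" {
		endpoint += "?" + encoded
	}

	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Assumption: the service accepts a JSON content type.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholder base URL; replace with the real service address.
	req, err := NewSearchRequest("http://localhost:8080", "FOO",
		SearchOptions{IncludeScore: true, IncludeDeprecated: true})
	if err != nil {
		fmt.Println("building request:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```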

Exact matches are preferred over partial ones. An example of search results with both include_score and include_deprecated set to true looks like this:

[
  {
    "id": "http://api.ft.com/things/d79f6383-9271-3a03-aacd-5ce8e57d6f5e",
    "uuid": "d79f6383-9271-3a03-aacd-5ce8e57d6f5e",
    "apiUrl": "http://api.ft.com/organisations/d79f6383-9271-3a03-aacd-5ce8e57d6f5e",
    "prefLabel": "FOO LLC",
    "types": [
      "http://www.ft.com/ontology/core/Thing",
      "http://www.ft.com/ontology/concept/Concept",
      "http://www.ft.com/ontology/organisation/Organisation"
    ],
    "directType": "http://www.ft.com/ontology/organisation/Organisation",
    "aliases": [
      "FOO LLC",
      "FOO"
    ],
    "score": 10.117536,
    "isDeprecated": true
  },
  {
    "id": "http://api.ft.com/things/87c69c2c-ad53-3888-9958-835098db4dae",
    "uuid": "87c69c2c-ad53-3888-9958-835098db4dae",
    "apiUrl": "http://api.ft.com/organisations/87c69c2c-ad53-3888-9958-835098db4dae",
    "prefLabel": "FOO International",
    "types": [
      "http://www.ft.com/ontology/core/Thing",
      "http://www.ft.com/ontology/concept/Concept",
      "http://www.ft.com/ontology/organisation/Organisation"
    ],
    "directType": "http://www.ft.com/ontology/organisation/Organisation",
    "aliases": [
      "FOO International",
      "FOO INTERNATIONAL"
    ],
    "score": 2.8585405
  }
]

If no results are found, a 404 - Not Found response is returned. If the payload of the search request does not follow the structure described above, a 400 - Bad Request is returned. If the search fails for reasons outside the caller's control, a 500 - Internal Server Error is returned.

GET /concepts

This endpoint is used for typeahead style queries for concepts. The request has several query parameters, of which only the type is required - here is a basic Genres example:

curl {concept-search-api-url}/concepts?type=http://www.ft.com/ontology/Genre

Optional query parameters:

  • To activate search mode, send the mode parameter with one of the values described in the table below, and the q parameter with the search query.

    Mode          Description
    mode=search   Optimized for time-sensitive types such as topics and people.
                  curl "{concept-search-api-url}/concepts?type=http://www.ft.com/ontology/organisation/Organisation&mode=search&q=FOO"
    mode=text     Optimized for types that are not time-sensitive. Uses full-text ES queries. Note: requests are currently possible only if either the organisation or the public company type is supplied.
                  curl "{concept-search-api-url}/concepts?type=http://www.ft.com/ontology/organisation/Organisation&type=http://www.ft.com/ontology/company/PublicCompany&mode=text&q=FOO"
  • The boost parameter can be specified when search mode is active, but it is currently supported only for authors.

    E.g. the following request will return results with "isFTAuthor": true

    curl "{concept-search-api-url}/concepts?type=http://www.ft.com/ontology/person/Person&mode=search&q=FOO&boost=authors"
    
  • The searchAllAuthorities parameter can be used to extend the search domain. The search then returns results from TME, Smartlogic, Factset or any other authority, as well as results with no authority.

    curl "{concept-search-api-url}/concepts?type=http://www.ft.com/ontology/Genre&searchAllAuthorities=true"
    
  • The include_deprecated parameter can be used to include deprecated concepts in the search results.

    curl "{concept-search-api-url}/concepts?type=http://www.ft.com/ontology/Genre&include_deprecated=true"
    

Please see the Swagger YML for more details.

Available HEALTH endpoints:

GET /__health

Provides the standard FT output indicating the connectivity and the cluster's health.

GET /__health-details

Provides a detailed health status of the ES cluster. The response matches that of elasticsearch-endpoint/_cluster/health. It returns 503 if the service is currently unavailable and cannot connect to Elasticsearch.

GET /__gtg

Returns 200 if the application is healthy, or 503 Service Unavailable if it is unhealthy.


Package last updated on 09 May 2024
