CORS Proxy
This is a simple CORS proxy that was originally developed for the anychan web application.
Based on cors-anywhere with some changes.
Use
Create a new folder. Go into it. Initialize a new Node.js project in it:
npm init
Call your project any name. Answer the questions it asks.
After it finishes setting up the project, install the cors-proxy-node dependency:
npm install cors-proxy-node --save
Create a new file index.js:
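// index.js uses ES module syntax. If Node.js rejects the import statement,
// add "type": "module" to package.json or rename the file to index.mjs.
import corsProxy from 'cors-proxy-node'

// Start the proxy on host 0.0.0.0, port 8080.
corsProxy({
  host: '0.0.0.0',
  port: 8080
})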
In package.json, add a new script called start:
"scripts": {
  "start": "node index.js"
}
Start the proxy using the command:
npm start
To proxy a URL through the CORS proxy, one could send an HTTP request to:
/<url>
/?url=<encodeURIComponent(url)>
For example, if host is set to "0.0.0.0" and port is set to 8080, then to proxy the https://google.com URL through the CORS proxy, one could send an HTTP request to:
http://my-cors-proxy.com:8080/https://google.com
http://my-cors-proxy.com:8080/?url=https%3A%2F%2Fgoogle.com
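The two URL forms above could be produced by a small helper like the one below. This is only a sketch: buildProxyUrls is a hypothetical name that is not part of cors-proxy-node, and my-cors-proxy.com is the same placeholder domain used in the example above.

```javascript
// Hypothetical helper, not part of cors-proxy-node.
// Builds both request URL forms for a given proxy base URL.
function buildProxyUrls(proxyBaseUrl, targetUrl) {
  // Drop trailing slashes so the result doesn't contain "//" after the host.
  const base = proxyBaseUrl.replace(/\/+$/, '')
  return {
    // Path form: the target URL is appended as-is.
    path: base + '/' + targetUrl,
    // Query form: the target URL is percent-encoded.
    query: base + '/?url=' + encodeURIComponent(targetUrl)
  }
}

const urls = buildProxyUrls('http://my-cors-proxy.com:8080', 'https://google.com')
console.log(urls.path)  // http://my-cors-proxy.com:8080/https://google.com
console.log(urls.query) // http://my-cors-proxy.com:8080/?url=https%3A%2F%2Fgoogle.com
```

The query form is probably the safer choice when the target URL has its own query string, because the whole target URL is encoded into a single parameter.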
Configuration
Configuration should be specified in the config.json file.
-
host: string
— The hostname to listen on. The simplest value that always works is "0.0.0.0"
which means "listen on all possible host names for this host". This parameter is ignored when the HOST environment variable is set.
-
port: number
— The port to listen on. Example: 8080
. This parameter is ignored when the PORT environment variable is set.
-
fromOriginWhitelist?: string[]
— An explicit "whitelist" of allowed HTTP origins to accept proxy requests from. If this configuration parameter is specified then only those HTTP origins will be allowed to send HTTP requests to this proxy server. Otherwise, all incoming HTTP requests are allowed, regardless of the HTTP origin they came from.
-
toOriginWhitelist?: string[]
— An explicit "whitelist" of allowed HTTP origins to accept proxy requests towards. If this configuration parameter is specified then any incoming HTTP requests towards those destination origins are allowed, regardless of the fromOriginWhitelist
setting.
-
cookies?: boolean
— Set to true
to enable cookies. Cookies are disabled by default. Enabling cookies requires setting both fromOriginWhitelist
and shareCookiesBetweenOriginsInFromOriginWhitelist
parameters. Enabling cookies is required when calling fetch()
with credentials: "include"
parameter.
-
shareCookiesBetweenOriginsInFromOriginWhitelist?: boolean
— An explicit "opt-in" flag that is required to be set to true
when enabling cookies. The only purpose of this flag is to make it explicit that, when enabled, cookies are shared between all origins in fromOriginWhitelist
because not everyone realizes that. I myself didn't realize it.
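As an illustration of the cookie-related rules above, here is a sketch of a configuration object with cookies enabled, plus a small check of the stated requirement. The checkCookieConfig function and the https://my-website.com origin are assumptions made for this example; the check is not the library's own validation.

```javascript
// Hypothetical check, for illustration only.
// Returns a list of problems with the cookie-related parameters.
function checkCookieConfig(config) {
  const problems = []
  if (config.cookies) {
    // Enabling cookies requires an explicit list of allowed request origins.
    if (!Array.isArray(config.fromOriginWhitelist)) {
      problems.push('cookies requires fromOriginWhitelist')
    }
    // ...and the explicit opt-in flag.
    if (config.shareCookiesBetweenOriginsInFromOriginWhitelist !== true) {
      problems.push('cookies requires shareCookiesBetweenOriginsInFromOriginWhitelist: true')
    }
  }
  return problems
}

// Example configuration with cookies enabled (the origin is a placeholder).
const config = {
  host: '0.0.0.0',
  port: 8080,
  fromOriginWhitelist: ['https://my-website.com'],
  cookies: true,
  shareCookiesBetweenOriginsInFromOriginWhitelist: true
}

console.log(checkCookieConfig(config).length) // 0
```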
x-cookie
Web browsers don't allow client-side JavaScript code to set the value of the cookie header of an HTTP request. To work around that, there's an x-cookie header: if specified, the contents of the x-cookie request header will be appended to the cookie request header using "; " as a separator. This is a way to add any additional cookies to a proxied HTTP request.
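The merge described above can be sketched as follows. The function name mergeCookieHeaders is hypothetical, and the proxy's actual code may differ, for example in how it treats empty values.

```javascript
// Illustrative sketch of the described behavior, not the proxy's actual code.
// Appends the x-cookie value to the cookie value using "; " as a separator.
function mergeCookieHeaders(cookie, xCookie) {
  if (!xCookie) return cookie   // nothing to append
  if (!cookie) return xCookie   // no existing cookie header (assumed behavior)
  return cookie + '; ' + xCookie
}

console.log(mergeCookieHeaders('a=1', 'b=2; c=3')) // a=1; b=2; c=3
```

On the client side, this means passing something like headers: { 'x-cookie': 'b=2; c=3' } to fetch() when sending the request to the proxy.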
x-set-cookies
Web browsers don't expose set-cookie headers of an HTTP response to client-side JavaScript code. To work around that limitation and see exactly which cookies have been set by the server, one could pass an HTTP request header called x-set-cookies with value true. In that case, the HTTP response will contain a header called x-set-cookies whose value is a stringified JSON array of the values of all set-cookie headers, if there were any in the server's response.
Trivia: There can be several set-cookie
headers in a given HTTP response: one for each cookie. That's how it's defined in the HTTP specification.
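On the client, reading the x-set-cookies response header could look roughly like this. The helper name readSetCookies is hypothetical, and treating a missing header as "no cookies" is an assumption. A Map stands in for response.headers so that the example runs without a network.

```javascript
// Hypothetical client-side helper.
// `headers` is anything with a get(name) method, such as response.headers from fetch().
function readSetCookies(headers) {
  const value = headers.get('x-set-cookies')
  // Assumption: a missing header means there were no cookies to report.
  if (!value) return []
  // The value is a stringified JSON array of set-cookie header values.
  return JSON.parse(value)
}

// Stand-in for response.headers.
const exampleHeaders = new Map([
  ['x-set-cookies', JSON.stringify(['a=1; Path=/', 'b=2; HttpOnly'])]
])

const setCookies = readSetCookies(exampleHeaders)
console.log(setCookies.length) // 2
console.log(setCookies[0])     // a=1; Path=/
```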
x-redirect-status
When specified, replaces a 30x status in the HTTP response with the value of this header. This makes it possible to bypass the weird behavior of the fetch()
function: otherwise, when it receives HTTP response status 302
in CORS mode, it doesn't allow the application to look into the response details and instead sets response.status
to 0
and response.headers
to empty headers. Issue. Replacing response status 302
with something else like 200
allows a developer to bypass that weird behavior and examine the status and headers of the response.
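A request that uses this header could be prepared roughly as shown below. This is a sketch under assumptions: buildRedirectAwareRequest is a hypothetical helper, 200 is just the example replacement status mentioned above, and x-follow-redirect: false is included on the assumption that the redirect response itself is what the application wants to inspect.

```javascript
// Hypothetical helper: builds the arguments for fetch() without sending anything.
function buildRedirectAwareRequest(proxyBaseUrl, targetUrl) {
  return {
    url: proxyBaseUrl + '/?url=' + encodeURIComponent(targetUrl),
    init: {
      headers: {
        // Ask the proxy to report a redirect with status 200 instead of 30x.
        'x-redirect-status': '200',
        // Assumption: stop the proxy from following the redirect on its own.
        'x-follow-redirect': 'false'
      }
    }
  }
}

const request = buildRedirectAwareRequest('http://my-cors-proxy.com:8080', 'https://google.com')
console.log(request.url) // http://my-cors-proxy.com:8080/?url=https%3A%2F%2Fgoogle.com

// Usage in a browser (not executed here):
//   const response = await fetch(request.url, request.init)
//   response.status is then expected to be 200, and response.headers to be readable.
```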
x-follow-redirect
Redirects are automatically followed unless the request header x-follow-redirect is explicitly set to false.
When automatically "following" a chain of redirects, the proxy must concatenate all set-cookie response headers in the chain and output the result in the set-cookie header of the final response.
x-set-cookies
See the description of the x-set-cookies
request header.
x-redirect-status
When the x-redirect-status request header is passed to override a redirect status and a redirect occurs, the proxy adds an x-redirect-status header to the response with the value of the original response status (before the override).
x-redirect-n
For debugging purposes, each followed redirect results in the addition of an x-redirect-n response header, where n starts at 1. The value of each such header consists of the redirect status code and the redirect URL separated by a space.
After 5 redirects, redirects are not followed any more. The redirect response is sent back to the browser, which can choose to follow the redirect (handled automatically by the browser).
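The x-redirect-n headers could be collected into a list on the client along these lines. The helper name readRedirectChain and the example.com URLs are made up for this sketch, and a Map stands in for response.headers so that the example runs without a network.

```javascript
// Hypothetical helper.
// `headers` is anything with a get(name) method, such as response.headers from fetch().
function readRedirectChain(headers) {
  const chain = []
  // Headers are numbered from 1; stop at the first missing one.
  for (let n = 1; headers.get('x-redirect-' + n); n++) {
    const value = headers.get('x-redirect-' + n)
    // The value is "<status> <url>": split at the first space only.
    const separatorIndex = value.indexOf(' ')
    chain.push({
      status: Number(value.slice(0, separatorIndex)),
      url: value.slice(separatorIndex + 1)
    })
  }
  return chain
}

// Stand-in for response.headers after two followed redirects.
const redirectHeaders = new Map([
  ['x-redirect-1', '301 https://example.com/a'],
  ['x-redirect-2', '302 https://example.com/b']
])

const chain = readRedirectChain(redirectHeaders)
console.log(chain.length)    // 2
console.log(chain[1].status) // 302
console.log(chain[1].url)    // https://example.com/b
```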
x-request-url
The requested URL.
x-final-url
The final URL, after following all redirects.
Hosting
An example of setting up a free CORS proxy at Vercel
Original article
- Create a repo on GitLab or GitHub with the contents of the proxy folder.
- Create a vercel.json file in the repo. It sets up Vercel hosting for the repo:
{
  "version": 2,
  "name": "nodejs-mysql",
  "builds": [
    { "src": "index.js", "use": "@vercel/node" }
  ],
  "routes": [
    { "src": "/(.*)", "dest": "/index.js" }
  ]
}
- Push the changes to the repo.
- Log in to Vercel using your GitLab or GitHub account.
- Click "Add New" → "Project".
- Choose "GitLab" or "GitHub".
- Find the repo in the list and click the "Import" button next to it.
- After the project has been deployed, Vercel will show the website URL for it. Use it as the proxy server URL. Example: https://my-cors-proxy.vercel.app?url={urlEncoded}
Restrictions
To prevent the use of the proxy for casual browsing, the proxy requires one of the following request headers to be present:
Stats
There's a basic "stats" page available at the /stats URL. It displays a list of the most recent requests to the proxy: date, time, a hash of the user's subnet IP address (in the form of a single Unicode character) and the proxied URL.
When running in a containerized environment like Vercel, the proxy instance might be stopped when it doesn't receive incoming HTTP requests and then started again when an incoming HTTP request arrives. Any stats will naturally be cleared during such a restart.
<iframe/>
When proxying the contents of an <iframe/>, use the url query parameter approach and also set the iframe query parameter to a non-empty value:
http://my-cors-proxy.com:8080/?url=https%3A%2F%2Fgoogle.com&iframe=✓
This also requires adding the proxy domain itself to the fromOriginWhitelist configuration parameter. The reason is that when loading an <iframe/>, the web browser will set the Origin HTTP request header value to the "origin" part of the URL specified in the src attribute of the <iframe/>, which is a proxied URL, so its domain will be the proxy domain.
An additional optional URL query parameter that can be specified in this case is transforms: an optional list of "transformations" that would be applied to the received HTML response content. When provided, the value should be the result of calling JSON.stringify() on a list of transform objects having the shape:
{
  target: "content",
  regExp?: boolean,
  searchFor: string,
  replaceWith: string
}
Example:
[
  {
    target: "content",
    searchFor: "'/cdn-cgi/",
    replaceWith: "'https://website.com/cdn-cgi/",
  }
]
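Putting the url, iframe and transforms query parameters together, the src of an <iframe/> could be built roughly like this. buildIframeProxyUrl is a hypothetical helper that is not part of cors-proxy-node, and my-cors-proxy.com is the placeholder domain from the earlier examples.

```javascript
// Hypothetical helper, not part of cors-proxy-node.
// Builds a proxied URL suitable for the src attribute of an <iframe/>.
function buildIframeProxyUrl(proxyBaseUrl, targetUrl, transforms) {
  const query = [
    'url=' + encodeURIComponent(targetUrl),
    // Any non-empty value enables the iframe mode.
    'iframe=' + encodeURIComponent('✓')
  ]
  if (transforms) {
    // The transforms list is passed as a JSON string.
    query.push('transforms=' + encodeURIComponent(JSON.stringify(transforms)))
  }
  return proxyBaseUrl + '/?' + query.join('&')
}

const iframeTransforms = [
  {
    target: 'content',
    searchFor: "'/cdn-cgi/",
    replaceWith: "'https://website.com/cdn-cgi/"
  }
]

const iframeSrc = buildIframeProxyUrl('http://my-cors-proxy.com:8080', 'https://google.com', iframeTransforms)
console.log(iframeSrc)
```

Percent-encoding every parameter value keeps characters such as quotes, slashes and ampersands inside the JSON from breaking the query string.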
In summary:
- It replaces the Content-Security-Policy HTTP response header with frame-ancestors *; to allow the page to be embedded on any 3rd-party website.
- It applies any transforms, when provided, to the received HTML response content.
- For example, such transforms should convert any relative URLs found in the HTTP response to absolute ones. Otherwise, when the page inside the <iframe/> sends additional HTTP requests for "resource" files at "relative" URLs (like /scripts/some-script.js), those "resources" won't be found because those "relative" URLs would be resolved against the domain of the proxy server itself rather than the domain of the website being proxied.
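To make the effect of a transform more concrete, here is a guess at how a list of transforms might be applied to HTML content. The exact semantics are not spelled out above, so this sketch assumes that searchFor is a literal string unless regExp is true, in which case it is treated as a regular expression applied globally. The function name applyTransforms is hypothetical.

```javascript
// Illustrative sketch under assumptions; the proxy's actual implementation may differ.
function applyTransforms(content, transforms) {
  let result = content
  for (const transform of transforms) {
    // Only the "content" target is described in this document.
    if (transform.target !== 'content') continue
    if (transform.regExp) {
      // Assumption: searchFor holds a regular expression source, applied globally.
      result = result.replace(new RegExp(transform.searchFor, 'g'), transform.replaceWith)
    } else {
      // Literal replacement of every occurrence.
      result = result.split(transform.searchFor).join(transform.replaceWith)
    }
  }
  return result
}

const html = "<script src='/cdn-cgi/x.js'></script>"
const transformed = applyTransforms(html, [
  {
    target: 'content',
    searchFor: "'/cdn-cgi/",
    replaceWith: "'https://website.com/cdn-cgi/"
  }
])
console.log(transformed) // <script src='https://website.com/cdn-cgi/x.js'></script>
```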