Searx metasearch engine
update as of 2020
Somehow, just having the Searx engine running prevented me from visiting Duckduckgo.org for half a day, while everything else still worked fine. They are probably blocking my IP address because it issues too many "robot" requests, as Searx does.
Another shortcoming of Searx is that you cannot natively set your instance as the default search engine in your browser (since you cannot customize the domain in search plugins). This often kept me from using my service at all. Therefore I didn't look into the DuckDuckGo blocking issue any further, but I am thankful for any hints.
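As a side note, Searx instances serve an OpenSearch description (at `/opensearch.xml`), which some browsers can use to register a custom search engine. A minimal, hand-written sketch for an instance like mine might look as follows (the `ShortName` and description are illustrative, only the domain is taken from this post):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <!-- Illustrative description for a self-hosted instance;
       adjust the domain to your own. -->
  <ShortName>My Searx</ShortName>
  <Description>Self-hosted Searx metasearch</Description>
  <InputEncoding>UTF-8</InputEncoding>
  <Url type="text/html"
       template="https://search.datensch.eu/search?q={searchTerms}"/>
</OpenSearchDescription>
```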
Because of the issues stated above, I shut my Searx instance down a month ago.
original post
Today I set up a new web service, reachable under the subdomain search.datensch.eu.
It is Searx, a metasearch engine that proxies a bunch of other search engines while taking care of your privacy.
As I host most things in Docker, I looked for instructions to install Searx and found the referenced Docker documentation, which already ships its own `docker-compose.yml`.
There is already an Nginx server running on my machine, so we want to create a new configuration in `sites-available` to reverse proxy the docker container's port to our domain.
```nginx
server {
    listen 443 ssl http2;
    server_name domain.eu;

    location / {
        proxy_pass http://localhost:8002/; # note the trailing slash
    }
}
```
So let's dive into it!
Inspecting the git-repository
Of course one could just copy the project and run it, but that would be very naive, as it could run things we don't want.
So we will first take a deeper look into the repository to see what we need from it.
There are a few scripts and a systemd template file, but those basically just wrap `docker-compose up` and `docker-compose down`, so in my case they are not really of interest.
The Readme suggests installing to `/usr/local`, but you can put it wherever you store your docker-compose files.
You can delete the scripts if you want to. We just need:
- `docker-compose.yml`
- `rules.json`
- `Caddyfile`
- `.env`
The docker-compose file consists of a few services working together:
- Caddy - a reverse proxy, quite similar to Nginx
  - it comes preconfigured for our use case with Searx. You could also move the settings from the included `Caddyfile` (mostly header configuration) into your own `search.domain.eu.conf` Nginx configuration file and remove the Caddy container, but I just left it as it is for now
  - Caddy accepts requests and forwards them to Filtron (or to Searx-checker if the `/status` route is called)
- Filtron - filtering rules for bot and abuse protection
  - here we have the `rules.json`, which filters client connections by inspecting the headers
  - we will delete the rule named `block Connection:close`, as our requests, which are proxied through another Nginx server, would otherwise get blocked
  - if no rule blocks a request, it is forwarded to Searx
- Searx - the basic Docker image without anything else
  - we could also run this image bare, but since public Searx instances seem to get a lot of traffic, I thought it would be a better idea to use the provided protection modules too
  - Searx relies on Morty
- Morty - a content proxy
  - it rewrites malicious HTML tags and attributes and replaces external resource references to prevent third-party information leaks
  - so this is about privacy for search queries
- Searx-checker - checks availability and configuration of search engines
  - as Searx is a metasearch engine, you can configure which search engines should be available
  - Searx-checker periodically checks their availability from your Searx instance
  - the results are available at `search.domain.eu/status`
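For context, Filtron rules are JSON objects combining filters with actions. The `block Connection:close` rule we delete from `rules.json` looks roughly like this (a sketch from memory; the exact rule in the repository may differ slightly):

```json
[
  {
    "name": "block Connection:close",
    "filters": ["Header:Connection=close"],
    "stop": true,
    "actions": [
      { "name": "block", "params": { "message": "Blocked" } }
    ]
  }
]
```

Deleting this object from the array is enough; the remaining rules stay untouched.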
As I said, you could also just run the Searx image alone, but I didn't feel brave enough, as I am hosting the instance on the public web and care a lot about protection. I will see whether those blockers are actually needed.
configuring the .env
You also have to configure the given `.env` file for docker-compose:
As we want to host Searx on the domain search.domain.eu, we might think that `SEARX_HOSTNAME` should be changed. However, that would only allow requests coming from your domain to reach the reverse proxy. As we put another Nginx reverse proxy (proxying to localhost) in front of Caddy, we should just leave it at `localhost` and set `SEARX_PROTOCOL` to http, as our outer reverse proxy takes care of HTTPS traffic.
You should of course create a new random `MORTY_KEY` and a random password for Filtron using the commands given in the Readme.
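As a sketch of what that amounts to (the variable names come from the `.env` shipped with the repository, but the file contents written here are illustrative stand-ins, not the exact file):

```shell
# Write an illustrative .env, then fill MORTY_KEY and FILTRON_PASSWORD
# with random values.
cat > .env <<'EOF'
SEARX_HOSTNAME=localhost
SEARX_PROTOCOL=http
MORTY_KEY=ReplaceWithARealKey
FILTRON_USER=admin
FILTRON_PASSWORD=ReplaceWithARealPassword
EOF

# hex output avoids characters that would confuse sed's replacement string
MORTY_KEY="$(openssl rand -hex 24)"
FILTRON_PASSWORD="$(openssl rand -hex 16)"
sed -i "s|^MORTY_KEY=.*|MORTY_KEY=${MORTY_KEY}|" .env
sed -i "s|^FILTRON_PASSWORD=.*|FILTRON_PASSWORD=${FILTRON_PASSWORD}|" .env
grep -E '^(MORTY_KEY|FILTRON_PASSWORD)=' .env
```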
configuring the docker-compose.yml
In the given `docker-compose.yml`, the caddy container runs with `network_mode: host`, which prevents it from starting because port 80 is already in use (surprise). Changing the mapped port does not help, as the container is running in host mode.
As we do not want this behavior (and honestly there is no need for it), we remove the line.
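With `network_mode: host` removed, the caddy service falls back to an ordinary bridge-network port mapping, as in this excerpt from my final compose file:

```yaml
caddy:
  container_name: caddy
  image: abiosoft/caddy:1.0.3-no-stats
  # network_mode: host   # removed - clashes with the nginx already on port 80
  ports:
    - 8002:80   # nginx reverse proxies to this port
    - 8001:443
```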
But now the `127.0.0.1` occurrences in the `Caddyfile` point inside the caddy container and can no longer be resolved correctly.
To fix this, we replace the occurrences of `127.0.0.1` in the `Caddyfile` with the hostnames of the respective docker containers (this is generally more docker-ish and should be preferred anyway).
As Docker resolves container hostnames on the compose network, we can drop the hard-coded IPs and use the corresponding container hostnames instead.
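The replacement can be done with a quick `sed`. The `Caddyfile` lines below are illustrative stand-ins, not the exact contents of the repository's file; the hostnames and ports match the compose file at the end of this post:

```shell
# Illustrative Caddyfile fragment pointing at loopback addresses
cat > Caddyfile <<'EOF'
proxy / 127.0.0.1:4040
proxy /morty 127.0.0.1:3000
EOF

# Swap loopback addresses for the container hostnames from docker-compose.yml
sed -i \
  -e 's|127\.0\.0\.1:4040|filtron:4040|g' \
  -e 's|127\.0\.0\.1:3000|morty:3000|g' \
  Caddyfile
cat Caddyfile
```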
get it running
After those steps, all you have to do is run `docker-compose up` and enjoy your instance of Searx.
conclusion
In my opinion, there is a lot of room for improvement in the given docker-compose file from GitHub.
I am considering removing Caddy and searx-checker, as I don't need them (and running Searx alone would probably work too), but all in all it is quite a good setup that was easy to configure.
I have included my docker-compose file below for reference.
my docker-compose.yml
```yaml
version: "3.9"

services:
  caddy:
    container_name: caddy
    image: abiosoft/caddy:1.0.3-no-stats
    ports:
      - 8002:80
      - 8001:443
    command: -log stdout -host ${SEARX_HOSTNAME} -conf /etc/Caddyfile
    volumes:
      - ./Caddyfile:/etc/Caddyfile:rw
      - ./caddy:/root/.caddy:rw
      - ./srv:/srv:rw
      - ./searx-checker:/srv/searx-checker:rw
    environment:
      - SEARX_HOSTNAME=${SEARX_HOSTNAME}
      - SEARX_PROTOCOL=${SEARX_PROTOCOL:-}
      - SEARX_TLS=${SEARX_TLS:-}
      - FILTRON_USER=${FILTRON_USER}
      - FILTRON_PASSWORD=${FILTRON_PASSWORD}
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
      - DAC_OVERRIDE

  filtron:
    container_name: filtron
    image: dalf/filtron
    hostname: filtron
    restart: always
    ports:
      - 4040:4040
      - 4041:4041
    command: -listen filtron:4040 -api filtron:4041 -target searx:8080
    volumes:
      - ./rules.json:/etc/filtron/rules.json:rw
    read_only: true
    cap_drop:
      - ALL

  searx:
    container_name: searx
    image: searx/searx:latest
    hostname: searx
    restart: always
    command: ${SEARX_COMMAND:-}
    volumes:
      - ./searx:/etc/searx:rw
    environment:
      - BIND_ADDRESS=searx:8080
      - BASE_URL=https://${SEARX_HOSTNAME}/
      - MORTY_URL=https://${SEARX_HOSTNAME}/morty/
      - MORTY_KEY=${MORTY_KEY}
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
      - DAC_OVERRIDE

  morty:
    container_name: morty
    image: dalf/morty
    hostname: morty
    restart: always
    ports:
      - 3022:3000
    command: -listen morty:3000 -timeout 6 -ipv6
    environment:
      - MORTY_KEY=${MORTY_KEY}
    logging:
      driver: none
    read_only: true
    cap_drop:
      - ALL

  searx-checker:
    container_name: searx-checker
    image: searx/searx-checker
    hostname: searx-checker
    restart: always
    command: -cron -o html/data/status.json http://searx:8080
    volumes:
      - ./searx-checker:/usr/local/searx-checker/html/data:rw
```