Searx metasearch engine

update as of 2020

Somehow, just having the Searx instance running meant I could not visit DuckDuckGo for the second half of a day, while everything else still worked fine. They are probably blocking my IP address because it issues too many "robot" requests, as Searx does.

Another shortcoming of Searx is that you cannot natively set your instance as the default search engine in your browser (you cannot customize the domain in search plugins). This often kept me from using my service at all. Therefore I did not look into the DuckDuckGo blocking issue any further, but I am thankful for any hints.

Because of the issues stated above, I shut my Searx instance down a month ago.

original post

Today I set up a new web service, reachable under the subdomain search.datensch.eu.

It is Searx, a metasearch engine that proxies a bunch of other search engines while taking care of your privacy.

As I host most things in Docker, I looked for instructions to install Searx and found the referenced Docker documentation, which already ships its own docker-compose.yml.

There is already an Nginx server running on my machine, so we create a new configuration in sites-available to reverse proxy the docker container's port to our domain.

server {
    listen 443 ssl http2;
    server_name search.domain.eu;

    # ssl_certificate / ssl_certificate_key go here (or in an included snippet)

    location / {
        proxy_pass http://localhost:8002/; # note the trailing slash
        # pass the original request details on to the container behind us
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

So let's dive into it!

Inspecting the git-repository

Of course one could just copy the project and run it, but that would be naive, as it could run things we don't want. So we first take a deeper look into the repository to see what we actually need from it. There are a few scripts and a systemd template file, but those basically just wrap docker-compose up and docker-compose down, so in my case they are not really of interest.

The readme suggests installing to /usr/local, but you can put it wherever you store your docker-compose files. You can delete the scripts if you want to. We just need:

  • docker-compose.yml
  • rules.json
  • Caddyfile
  • .env

The docker-compose file consists of a few services working together:

  1. Caddy - a reverse proxy, quite similar to Nginx
  • it is preconfigured for our use case with Searx. You could also rewrite the included Caddyfile to move its settings (mostly header configuration) into your own search.domain.eu.conf Nginx configuration file and remove the Caddy container, but I just left it as it is for now.
  • Caddy accepts requests and passes them to filtron (or to searx-checker if the /status route is called)
  2. Filtron - filtering rules for bot and abuse protection
  • here we have rules.json, which filters client connections by inspecting the headers
  • here we will delete the rule named block Connection:close, because our requests, which are proxied through another Nginx server, would get blocked otherwise
  • if no rule blocks a request, it is sent on to Searx
  3. Searx - the basic Docker image without anything else
  • we could also run this image bare, but as public Searx instances seem to get a lot of traffic, I thought it would be a better idea to use the provided protection modules too
  • Searx relies on Morty
  4. Morty - a content proxy
  • it rewrites malicious HTML tags and attributes and replaces external resource references to prevent third-party information leaks
  • so this one is about privacy for search queries
  5. Searx-checker - checks availability and configuration of search engines
  • as Searx is a metasearch engine, you can configure which search engines should be available
  • searx-checker periodically checks their availability from your Searx instance
  • results are at search.domain.eu/status
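For reference, the rule to delete from rules.json looked roughly like this in my copy (the field values are from memory and may differ between versions of the repository, so check your own file):

```json
{
    "name": "block Connection:close",
    "filters": ["Header:Connection=close"],
    "limit": 0,
    "stop": true,
    "actions": [{"name": "block"}]
}
```

Deleting that object from the rules array is enough; filtron reads the file on startup.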

As I said, you could also just run the Searx image alone, but I didn't feel brave enough, as I am hosting the instance on the public internet and care a lot about protection. I will see whether those blockers are actually needed.

configuring the .env

You also have to configure the given .env file for docker-compose:

As we want to host Searx on the domain search.domain.eu, one could think the SEARX_HOSTNAME should be changed accordingly. But that would only allow requests coming from that domain to reach the reverse proxy. Since we put another Nginx reverse proxy (proxying to localhost) in front of Caddy, we should just leave it at localhost and set SEARX_PROTOCOL to http, as our outer reverse proxy takes care of the HTTPS traffic.
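With those choices, the relevant part of the .env ends up looking roughly like this (variable names taken from the compose file; your copy may contain additional entries):

```
SEARX_HOSTNAME=localhost
SEARX_PROTOCOL=http
# MORTY_KEY and FILTRON_PASSWORD still need to be filled in with random values
```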

You should of course generate a new random MORTY_KEY and a random password for filtron using the commands stated in the readme.
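A minimal sketch of what those commands boil down to, assuming openssl is installed (the readme's exact invocations may differ):

```shell
# generate a random Morty key and a random filtron password;
# paste the printed values into the corresponding .env entries
MORTY_KEY="$(openssl rand -base64 33)"
FILTRON_PASSWORD="$(openssl rand -base64 16)"
echo "MORTY_KEY=${MORTY_KEY}"
echo "FILTRON_PASSWORD=${FILTRON_PASSWORD}"
```

Any source of sufficiently long random strings works here; openssl is just the most commonly available one.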

configuring the docker-compose.yml

In the given docker-compose.yml the caddy container runs with network_mode: host, which prevents it from starting because port 80 is already in use (surprise). Changing the mapped port does not help, as the container is running in host mode. Since we do not want this behavior (and honestly there is no need for it), we remove that line. But now the 127.0.0.1 occurrences point inside the caddy container and can no longer be resolved correctly.

To resolve this, we replace the occurrences of 127.0.0.1 in the Caddyfile with the hostnames of the corresponding docker containers (this is generally more docker-ish and should be preferred).

As Docker resolves container hostnames to IPs anyway, we can also remove the hard-coded IPs in the docker-compose.yml and replace them with the corresponding container hostnames.
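For example, the proxy line in the Caddyfile changes roughly like this (the exact directive syntax depends on the Caddy 1.x version shipped in the compose file; filtron listens on port 4040 per its compose command):

```
# before: proxy / 127.0.0.1:4040
# after, using the container hostname:
proxy / filtron:4040
```

The same replacement applies to every other 127.0.0.1 occurrence, such as the searx-checker status route.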

get it running

After those steps, all you have to do is run docker-compose up and enjoy your instance of Searx.

conclusion

In my opinion there is a lot of room for improvement in the docker-compose file provided on GitHub.

I myself am considering removing Caddy and searx-checker, as I don't need them (and running only Searx would probably work too), but all in all it is quite a good setup that was easy to configure.

I included my docker-compose.yml for reference below.

my docker-compose.yml
version: "3.9"
services:
  caddy:
    container_name: caddy
    image: abiosoft/caddy:1.0.3-no-stats
    ports:
      - 8002:80
      - 8001:443
    command: -log stdout -host ${SEARX_HOSTNAME} -conf /etc/Caddyfile
    volumes:
      - ./Caddyfile:/etc/Caddyfile:rw
      - ./caddy:/root/.caddy:rw
      - ./srv:/srv:rw
      - ./searx-checker:/srv/searx-checker:rw
    environment:
      - SEARX_HOSTNAME=${SEARX_HOSTNAME}
      - SEARX_PROTOCOL=${SEARX_PROTOCOL:-}
      - SEARX_TLS=${SEARX_TLS:-}
      - FILTRON_USER=${FILTRON_USER}
      - FILTRON_PASSWORD=${FILTRON_PASSWORD}
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
      - DAC_OVERRIDE
  filtron:
    container_name: filtron
    image: dalf/filtron
    hostname: filtron
    restart: always
    ports:
      - 4040:4040
      - 4041:4041
    command: -listen filtron:4040 -api filtron:4041 -target searx:8080
    volumes:
      - ./rules.json:/etc/filtron/rules.json:rw
    read_only: true
    cap_drop:
      - ALL
  searx:
    container_name: searx
    image: searx/searx:latest
    hostname: searx
    restart: always
    command: ${SEARX_COMMAND:-}
    volumes:
      - ./searx:/etc/searx:rw
    environment:
      - BIND_ADDRESS=searx:8080
      - BASE_URL=https://${SEARX_HOSTNAME}/
      - MORTY_URL=https://${SEARX_HOSTNAME}/morty/
      - MORTY_KEY=${MORTY_KEY}
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETGID
      - SETUID
      - DAC_OVERRIDE
  morty:
    container_name: morty
    image: dalf/morty
    hostname: morty
    restart: always
    ports:
      - 3022:3000
    command: -listen morty:3000 -timeout 6 -ipv6
    environment:
      - MORTY_KEY=${MORTY_KEY}
    logging:
      driver: none
    read_only: true
    cap_drop:
      - ALL
  searx-checker:
    container_name: searx-checker
    image: searx/searx-checker
    hostname: searx-checker
    restart: always
    command: -cron -o html/data/status.json http://searx:8080
    volumes:
      - ./searx-checker:/usr/local/searx-checker/html/data:rw