Table of Contents
- Overview
- Configure Nginx Ingress Controller Custom Log Format
- Configure Kibana for Nginx Custom Log Format
- Deploy Fluent Bit on Kubernetes
- Performing Quick Log Analytics
- Summary
Overview
In the previous post, I covered the high-level concepts of Fluentd, ElasticSearch and Kibana, and how to quickly deploy ElasticSearch and Kibana using Docker Compose on a Raspberry Pi 4. In this post, we are going to look at how to deploy and configure Fluent Bit (a sub-project of Fluentd) to capture the Nginx Ingress Controller logs on Kubernetes and stream the formatted logs to ElasticSearch. We can then perform log analytics using Kibana.
By the way, do I need to mention my Kubernetes cluster is running on RPI4! If you missed those details, you can find out more from my previous posts on how to install and configure Kubernetes on RPI4.
One of my objectives in analysing the logs is to understand braindose.blog web traffic and to capture the real visitor IPs for further analytics and threat prevention.
For my Kubernetes cluster environment, I have configured HAProxy in front of the Nginx Ingress Controller. With SSL pass-through configured at HAProxy, I am not able to capture the real IP addresses from the encrypted traffic. Reconfiguring HAProxy for SSL termination is not an option for me, and I do not wish to go through the nightmare of changing things that have been working fine at the moment. If it is not broken, do not fix it!
So the only practical solution is to capture and analyse the Nginx Ingress Controller logs.
Configure Nginx Ingress Controller Custom Log Format
Since I need to capture the real visitors' IP addresses, I need to capture the X-Forwarded-For HTTP header in the Nginx Ingress Controller log. To do this, I need to change the configuration in the Nginx Ingress Controller configmap.
This is done by first issuing the following command to edit the configmap.
kubectl edit cm ingress-nginx-controller -n ingress-nginx
We proceed to modify the configmap to include the following Nginx configuration under the data stanza.
Note that the custom log format is almost the same as the default log format, except that we add the $http_x_forwarded_for field to insert the X-Forwarded-For HTTP header into the log entries. This gives us the actual visitor IP address. In the following example, we insert it as the first field in the log entries.
The settings in the example below are derived from the Nginx module ngx_http_realip_module. Please do not forget to enter the actual IP of your web proxy in the set-real-ip-from variable. In this case it is my HAProxy IP address.
data:
  # omitted lines before ...
  log-format-upstream: $http_x_forwarded_for $remote_addr - $remote_user [$time_local]
    "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length
    $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr
    $upstream_response_length $upstream_response_time $upstream_status $req_id
  proxy-protocol: "True"
  real-ip-header: proxy_protocol
  set-real-ip-from: <haproxy-ip-address>
  # omitted lines after ...
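Note that proxy-protocol and real-ip-header: proxy_protocol only work if HAProxy actually sends the PROXY protocol header on its backend connections. As a sketch (the backend and server names below are illustrative, not from my actual setup), the relevant HAProxy server line carries the send-proxy-v2 option:

backend nginx-ingress
    mode tcp
    # send-proxy-v2 makes HAProxy prepend the PROXY protocol v2 header,
    # which carries the original client IP to Nginx
    server k8s-node-1 <node-ip>:443 send-proxy-v2 check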
The following shows an example of the custom log format based on the above configuration, where 70.50.160.124 is the real visitor IP address.
70.50.160.124 10.244.7.1 - user [15/Jul/2022:16:44:56 +0000] "GET /ocs/v2.php/apps/user_status/api/v1/user_status?format=json HTTP/2.0" 200 150 "-" "Mozilla/5.0 (Macintosh) mirall/3.5.2git (build 10815) (Nextcloud, osx-21.5.0 ClientArchitecture: arm64 OsArchitecture: arm64)" 409 0.890 [nextcloud-9090] [] 10.244.4.26:80 150 0.888 200 4d389b27-4e6d-4091-849c-f05b14deb1e0
Once we have these real IP addresses, we will use them to identify our website visitors' locations using the Fluent Bit GeoIP2 filter. We will cover this in more detail later.
Configure Kibana for Nginx Custom Log Format
As mentioned, we are going to use the Fluent Bit GeoIP2 filter to translate the IP addresses into geolocation coordinates. Before we proceed to capture and stream the Nginx Ingress Controller log to ElasticSearch, we need to configure the ElasticSearch index to convert the coordinates (latitude and longitude as float numbers) into the geo_point data type that Kibana understands. We can do this via the field mapping provided by Index Templates in Kibana.
Let’s proceed to create the Kibana Index Template.
First, log on to your Kibana web console, browse to Stack Management and select Index Management.
Click on the Create Template button.
Enter the Name for the template and the Index patterns. The index pattern must match your Nginx ElasticSearch index names. In my environment, the indices created start with nginx, so the pattern is nginx*.
Click Next on the wizard until you reach the Mappings page.
On the Mappings page, we need to create a field mapping that maps the coordinates from the log into the Geo-point (geo_point) field type. The field name must be the same as the log field that we are going to configure later; in this example, we are using coordinates as the field name.
The following shows the details of the field mapping configuration.
We also need to enter the Index Aliases on the next screen.
Proceed to the next screen to review the changes and click the Create template button to complete it.
This is what you will have on your Index Management screen after a successful Index Template creation.
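If you prefer the API over the wizard, an equivalent index template can also be created from the Kibana Dev Tools console with a request along these lines. The template name below is illustrative, and the alias can be added under template.aliases in the same request:

PUT _index_template/nginx-template
{
  "index_patterns": ["nginx*"],
  "template": {
    "mappings": {
      "properties": {
        "coordinates": {
          "type": "geo_point"
        }
      }
    }
  }
}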
We can now proceed to configure and deploy Fluent Bit onto Kubernetes in the next section.
Deploy Fluent Bit on Kubernetes
In this section, we will configure a number of Fluent Bit plugins: input, parsers, filters and outputs. I will skip the explanation of the Kubernetes Namespace, ServiceAccount and ClusterRole configuration in this post; you can refer to these details in the YAML file.
Fluent Bit Configuration
We need to provide the various Fluent Bit plugin configurations as a configmap.
First, we define the main configuration as follows. The main configuration includes the additional configuration files for the input, parsers, filters and outputs.
fluent-bit.conf: |
  [SERVICE]
      Flush         5
      Log_Level     info
      Daemon        off
      Parsers_File  parsers.conf
      HTTP_Server   off
      HTTP_Listen   0.0.0.0
      HTTP_Port     2020
  @INCLUDE input-kubernetes.conf
  @INCLUDE filter-kubernetes.conf
  @INCLUDE filter-geoip2.conf
  @INCLUDE filter-nest.conf
  @INCLUDE filter-record-modifier.conf
  @INCLUDE output-elasticsearch.conf
  #@INCLUDE output-stdout.conf # uncomment this to view output in stdout & comment out the es output
In the above, the parsers are configured in the file named parsers.conf via the property named Parsers_File.
parsers.conf: |
  [PARSER]
      # https://rubular.com/r/V3W1DWyv5uFCfh
      Name         k8s-nginx-ingress
      Format       regex
      Regex        ^(?<real_client_ip>[^ ]*) (?<host>[^ ]*) - (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)" (?<request_length>[^ ]*) (?<request_time>[^ ]*) \[(?<proxy_upstream_name>[^ ]*)\] (\[(?<proxy_alternative_upstream_name>[^ ]*)\] )?(?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) (?<reg_id>[^ ]*).*$
      #Regex       ^(?<host>[^ ]*) - (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)" (?<request_length>[^ ]*) (?<request_time>[^ ]*) \[(?<proxy_upstream_name>[^ ]*)\] (\[(?<proxy_alternative_upstream_name>[^ ]*)\] )?(?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) (?<reg_id>[^ ]*).*$
      Time_Key     time
      Time_Format  %d/%b/%Y:%H:%M:%S %z
  [PARSER]
      # http://rubular.com/r/tjUt3Awgg4
      Name         cri
      Format       regex
      # XXX: modified from upstream: s/message/log/
      Regex        ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
      Time_Key     time
      Time_Format  %Y-%m-%dT%H:%M:%S.%L%z
  [PARSER]
      Name         catchall
      Format       regex
      Regex        ^(?<message>.*)$
parsers.conf defines the following parsers:
- k8s-nginx-ingress – A regex parser for the custom Nginx Ingress Controller log format. This parser is invoked for Pods annotated with "fluentbit.io/parser": "k8s-nginx-ingress".
- cri – A regex parser for the CRI log format, since my Kubernetes cluster uses the containerd container runtime (see the example after this list).
- catchall – A simple regex parser used as a tweak in the Kubernetes filter to make sure logs from Kubernetes and Nginx can be parsed into a combined log entry. The reported issue and solution are described on GitHub here.
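For illustration (the timestamp and request below are made up, and the Nginx entry is shortened), a containerd CRI log line looks like this:

2022-07-15T16:44:56.123456789Z stdout F 70.50.160.124 10.244.7.1 - user [15/Jul/2022:16:44:56 +0000] "GET / HTTP/2.0" 200 150 ...

The cri parser splits it into the time, stream (stdout), logtag (F) and log fields; the Kubernetes filter then applies the k8s-nginx-ingress parser to the log field via Merge_Log.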
The following defines the Fluent Bit input using the tail plugin. Here we use the cri parser defined previously to properly parse the Nginx Ingress Controller logs located by the Path variable.
input-kubernetes.conf: |
  [INPUT]
      Name              tail
      Tag               nginx.*
      Path              /var/log/containers/ingress-nginx-controller*.log
      Parser            cri
      DB                /var/log/flb_kube.db
      Mem_Buf_Limit     5MB
      Skip_Long_Lines   On
      Refresh_Interval  10
The following is the definition of the Kubernetes filter. We need this filter so that we can parse the Nginx log using the k8s-nginx-ingress parser that we defined earlier. Using this filter, we can also capture Pod information such as labels and other metadata. More details on this later.
filter-kubernetes.conf: |
  [FILTER]
      Name                 kubernetes
      Match                nginx.*
      Kube_URL             https://kubernetes.default.svc:443
      Kube_CA_File         /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      Kube_Token_File      /var/run/secrets/kubernetes.io/serviceaccount/token
      Kube_Tag_Prefix      kube.var.log.containers.
      Merge_Log            On
      K8S-Logging.Parser   On
      K8S-Logging.Exclude  On
      Merge_Parser         catchall
      Keep_Log             Off
      # I purposely turn these off because I do not need them
      Labels               Off
      Annotations          Off
The following is the GeoIP2 filter, which maps an IP address to country, city, latitude and longitude.
filter-geoip2.conf: |
  [FILTER]
      Name        geoip2
      Match       nginx.*
      # GEOIP2_DB is defined as an environment variable in the DaemonSet
      Database    ${GEOIP2_DB}
      Lookup_key  real_client_ip
      Record      country_name real_client_ip %{country.names.en}
      Record      country_code real_client_ip %{country.iso_code}
      Record      city real_client_ip %{city.names.en}
      Record      coord.lat real_client_ip %{location.latitude}
      Record      coord.lon real_client_ip %{location.longitude}
To create the data structure that the Kibana Index Template we created earlier can convert into the geo_point data type, we need to place the latitude and longitude under a field named coordinates. This is achieved by nesting these fields under the coordinates field using the Nest filter below.
filter-nest.conf: |
  [FILTER]
      Name           nest
      Match          nginx.*
      Operation      nest
      Wildcard       coord.*
      Nest_under     coordinates
      Remove_prefix  coord.
The converted Fluent Bit JSON log format will look similar to the following. Note that it is mandatory to use the names lat and lon (the default names required by the field mapping). These are defined in filter-geoip2.conf as coord.lat and coord.lon, and the coord. prefix is removed using Remove_prefix.
"coordinates"=>{"lat"=>2.993500, "lon"=>101.745000}
Note: I tried to use a Kibana Runtime Field to transform the lat and lon fields into geo_point, but it does not seem to be supported by Kibana Maps. Here is my reported issue in the Kibana GitHub.
To reduce the log size sent over to ElasticSearch, I decided to remove all unnecessary details from the log. I am using the record_modifier filter below to remove the kubernetes, stream, logtag and reg_id fields, which are non-essential for my log analytics.
filter-record-modifier.conf: |
  [FILTER]
      Name        record_modifier
      Match       nginx.*
      Remove_key  kubernetes
      Remove_key  stream
      Remove_key  logtag
      Remove_key  reg_id
Finally, I define two output plugins here.
output-elasticsearch.conf defines the ElasticSearch output plugin: where and how to send the log. ${ELASTICSEARCH_HOST} and ${ELASTICSEARCH_PORT} are environment variables defined in the DaemonSet configuration, which we will cover later.
output-stdout.conf is used when I need to troubleshoot or test the configurations. I can comment and uncomment it in fluent-bit.conf.
Note that we need to configure Suppress_Type_Name to overcome an error that we would otherwise encounter with ElasticSearch v8.
output-stdout.conf: |
  [OUTPUT]
      Name   stdout
      Match  *
output-elasticsearch.conf: |
  [OUTPUT]
      Name                es
      Match               nginx.*
      Host                ${ELASTICSEARCH_HOST}
      Port                ${ELASTICSEARCH_PORT}
      Logstash_Prefix     nginx-k8s
      Logstash_Format     On
      Replace_Dots        On
      Retry_Limit         False
      # Turn off TLS verification for the self-signed certificate
      tls.verify          off
      tls                 on
      HTTP_User           elastic
      HTTP_Passwd         <password>
      Trace_Error         On
      Trace_Output        Off
      Suppress_Type_Name  On
Note: You should use a Kubernetes Secret for the HTTP_Passwd in the above configuration. You can pass the secret as an environment variable in the Kubernetes YAML config.
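For example, here is a minimal sketch; the Secret name, namespace and variable name are my own illustrative choices, not from the final YAML:

# Create the secret once in the namespace where Fluent Bit runs
kubectl create secret generic elasticsearch-credentials -n <fluent-bit-namespace> \
  --from-literal=password='<password>'

Then expose it to the Fluent Bit container in the DaemonSet. Since Fluent Bit expands ${...} environment variables in its configuration files, HTTP_Passwd ${ELASTICSEARCH_PASSWORD} can then replace the hard-coded password:

env:
  - name: ELASTICSEARCH_PASSWORD
    valueFrom:
      secretKeyRef:
        name: elasticsearch-credentials
        key: password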
Warning: elastic is a superuser and should not be used for the Kibana and ElasticSearch integration. There are a number of built-in roles that you can use to create a dedicated user for this integration purpose.
You may notice we never defined how to invoke the k8s-nginx-ingress parser. This is where we need to annotate the Nginx Ingress Controller Pods. Instead of doing this at the Pod level, we define the annotation on the Pod template of the Kubernetes Deployment. We can do this with the following command.
The annotation specifically instructs the Fluent Bit Kubernetes filter to use the k8s-nginx-ingress parser for the Nginx Ingress Controller Pods.
kubectl patch deployment ingress-nginx-controller -n ingress-nginx --patch '{"spec": { "template": { "metadata": {"annotations": {"fluentbit.io/parser": "k8s-nginx-ingress" }}}}}'
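You can verify the annotation has been applied to the Pod template with:

kubectl get deployment ingress-nginx-controller -n ingress-nginx \
  -o jsonpath='{.spec.template.metadata.annotations}'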
Deploying Fluent Bit to Kubernetes
In this section, we are looking at how to deploy Fluent Bit as a DaemonSet on Kubernetes. A DaemonSet ensures that a Fluent Bit Pod is deployed onto each node in your Kubernetes cluster.
I have designated "infra" nodes for the Nginx Ingress Controller, and since I only need to capture the Nginx logs for now, I only need to deploy the Fluent Bit Pods onto those "infra" nodes.
In fact, I do not strictly need a DaemonSet: I could deploy Fluent Bit as a standard Pod deployment and apply nodeAffinity to ensure the Fluent Bit Pods are only scheduled onto the "infra" nodes.
However, let's stick with the DaemonSet for now. I believe a DaemonSet is more future proof in case I need to capture additional logs beyond Nginx later.
Let’s look at some of the YAML configuration details.
The following are some of the environment variables and volumes that we need to configure for Fluent Bit.
env:
  - name: ELASTICSEARCH_HOST
    value: "elasticsearch.internal"
  - name: ELASTICSEARCH_PORT
    value: "9200"
  - name: GEOIP2_DB
    value: "/geoip2-db/GeoLite2-City.mmdb"
volumeMounts:
  - name: varlog
    mountPath: /var/log
  - name: varlibdockercontainers
    mountPath: /var/lib/docker/containers
    readOnly: true
  - name: geoip2-db
    mountPath: /geoip2-db
    readOnly: true
  - name: fluent-bit-config
    mountPath: /fluent-bit/etc/
From the above, you may notice we need to configure the location of the GeoIP2 database file via the GEOIP2_DB variable. This is the free GeoLite2 database file that you can download from the MaxMind official website. I am using a persistent volume (PV) here so that I only need to copy this file into the PV once.
I can only copy the database file after the PV is created during Pod initialization, so expect the Pod to fail for a short moment until the file is copied into the PV.
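For completeness, the corresponding volumes section looks roughly like the following sketch; the PVC and ConfigMap names are assumptions, so refer to the complete YAML on GitHub for the actual definitions:

volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers
  - name: geoip2-db
    persistentVolumeClaim:
      claimName: geoip2-db-pvc   # assumed PVC name
  - name: fluent-bit-config
    configMap:
      name: fluent-bit-config   # assumed ConfigMap name

The database file can then be copied into the PV via any Pod that mounts it read-write, for example:

kubectl cp GeoLite2-City.mmdb <namespace>/<pod-name>:/geoip2-db/GeoLite2-City.mmdb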
As mentioned earlier, I want to deploy the DaemonSet onto the designated "infra" nodes, so the following tolerations and nodeAffinity definitions do the trick.
tolerations:
  - key: type
    operator: Equal
    value: infra
    effect: NoSchedule
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node/type
              operator: In
              values:
                - infra
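For these to match, the "infra" nodes must carry the corresponding label and taint. Assuming they are not already set, they can be applied like this (the node name is illustrative):

kubectl label node <node-name> node/type=infra
kubectl taint node <node-name> type=infra:NoSchedule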
You can refer to the complete YAML file on GitHub here.
Let’s proceed to deploy Fluent Bit using the following command.
kubectl apply -f nginx-fluent-bit.yaml
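Once applied, you can confirm that the Fluent Bit Pods are scheduled only onto the "infra" nodes; the namespace below is whatever you defined in the YAML:

kubectl get pods -n <namespace> -o wide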
The following is sample console output when the stdout output is used to verify that the logs are captured and parsed as we expect.
[0] nginx.var.log.containers.ingress-nginx-controller-774657884f-zd9kp_ingress-nginx_controller-c245264bb26545c58a93df547373156c65d8e0ce792e5b359413556956e746c8.log: [1657969851.000000000, {"real_client_ip"=>"60.45.151.147", "host"=>"10.244.6.1", "user"=>"-", "method"=>"POST", "path"=>"/admin/admin-ajax.php", "code"=>"200", "size"=>"47", "referer"=>"https://braindose.blog/admin/network/plugins.php?plugin_status=inactive", "agent"=>"Mozilla/xxx (Macintosh; Intel Mac OS X) AppleWebKit/xxx (KHTML, like Gecko) Version/xxx Safari/xxx", "request_length"=>"1307", "request_time"=>"1.158", "proxy_upstream_name"=>"wordpress-9081", "upstream_addr"=>"10.244.5.16:80", "upstream_response_length"=>"47", "upstream_response_time"=>"1.164", "upstream_status"=>"200", "reg_id"=>"6df3629d2c607b43eb4f5fd26d27f243", "country_name"=>"Malaysia", "country_code"=>"MY", "city"=>"Somewhere in MY", "coordinates"=>{"lat"=>2.983400, "lon"=>101.785400}}]
Performing Quick Log Analytics
Once Fluent Bit is collecting the logs into ElasticSearch, you can proceed to view and analyze them in Kibana.
First, let’s create a data view for the Nginx indices that are created by the Fluent Bit output plugin. The example below shows multiple indices already present over a few days in my environment.
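You can also confirm the indices exist from the Kibana Dev Tools console with a quick query:

GET _cat/indices/nginx*?v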
The following shows the logs view on the Kibana Discover page.
This is the Maps view of all visitors from all over the world to my braindose.blog over 2.5 days.
Summary
We went through how to quickly deploy ElasticSearch and Kibana using Docker Compose in the previous post. In this post, we went through the details of how to configure and deploy Fluent Bit on Kubernetes to capture and analyse the Nginx Ingress Controller logs. I have only covered a small portion of what the Elastic Stack can do here, and there are many other Elastic Stack capabilities yet to explore. However, I hope this post provides a good start for your log analytics journey with the Elastic Stack.