You can use mysqlbinlog to read binary log files directly and apply them to the local MySQL server. You can also read binary logs from a remote server by using the --read-from-remote-server option. To read remote binary logs, the connection parameter options can be given to indicate how to connect to the server. These options are --host, --password, --port, --protocol, --socket, and --user.
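A minimal sketch of both cases is shown below; the host, user, and binary log file name are placeholders for your own environment:

    # Apply a local binary log to the local server.
    mysqlbinlog binlog.000123 | mysql -u root -p

    # Read a binary log from a remote server (prompts for the password),
    # then replay it against the local server.
    mysqlbinlog --read-from-remote-server \
        --host=example-host --port=3306 \
        --user=admin --password \
        binlog.000123 | mysql -u root -p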
Why Analyze Raw MySQL Query Logs
These are just some of the examples of what you can find out by querying raw slow query logs. They contain a ton of information about query execution (especially in Percona Server for MySQL) that allows you to use them both for performance analysis and some security and auditing purposes.
pt-query-digest analyzes MySQL queries from slow, general, and binary log files. It can also analyze queries from SHOW PROCESSLIST and MySQL protocol data from tcpdump. By default, queries are grouped by fingerprint and reported in descending order of query time (i.e., the slowest queries first). If no FILES are given, the tool reads STDIN. The optional DSN is used for certain options like --since and --until.
pt-query-digest is a sophisticated but easy-to-use tool for analyzing MySQL queries. It can analyze queries from MySQL slow, general, and binary logs. (Binary logs must first be converted to text; see --type.) It can also use SHOW PROCESSLIST and MySQL protocol data from tcpdump. By default, the tool reports which queries are the slowest, and therefore the most important to optimize. More complex and custom-tailored reports can be created by using options like --group-by, --filter, and --embedded-attributes.
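For example, the simplest invocation just points the tool at a slow query log, and protocol traffic captured with tcpdump can be analyzed via --type tcpdump. The file paths are examples, and the capture options are the ones commonly suggested for pt-query-digest; adjust both for your environment:

    # Summarize the slowest queries from a slow query log.
    pt-query-digest /var/log/mysql/mysql-slow.log

    # Capture MySQL protocol traffic, then analyze it instead of a log file.
    tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt
    pt-query-digest --type tcpdump mysql.tcp.txt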
Query analysis is a best practice that should be done frequently. To make this easier, pt-query-digest has two features: query review (--review) and query history (--history). When the --review option is used, all unique queries are saved to a database. When the tool is run again with --review, queries marked as reviewed in the database are not printed in the report. This highlights new queries that need to be reviewed. When the --history option is used, query metrics (query time, lock time, etc.) for each unique query are saved to the database. Each time the tool is run with --history, more historical data is saved, which can be used to trend and analyze query performance over time.
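A sketch of both workflows is below. It assumes a recent pt-query-digest that accepts DSNs for --review and --history; the host, database, and table names in the DSNs are examples you would replace with your own:

    # Save unique query fingerprints to a review table.
    pt-query-digest --review h=localhost,D=percona,t=query_review \
        /var/log/mysql/mysql-slow.log

    # Also store per-run metrics so query performance can be trended over time.
    pt-query-digest --review h=localhost,D=percona,t=query_review \
        --history h=localhost,D=percona,t=query_history \
        /var/log/mysql/mysql-slow.log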
There are a number of tools for loading MySQL slow query logs into a variety of data stores. For example, you can find posts showing how to do it with Logstash. While very flexible, these solutions have always looked too complicated and limited in functionality to me.
By far the best solution for parsing and loading MySQL slow query logs (among the multiple log types it supports) is Charity Majors' Honeytail. It is a great self-contained tool written in Go that has excellent documentation and is very easy to get started with. The only catch is that it is designed to work only with the SaaS log monitoring platform Honeycomb.io.
You need a tool to sift through the slow query log to get those statistics, and Percona has just the tool for it: pt-query-digest. This tool has many other tricks up its sleeve, but for this post, I just want to cover how it helps me analyze and summarize slow query logs so I can quickly dig into the worst queries that might be bringing down my production application, Drupal site, or other PHP-based website.
In addition to auditing DDL operations, EventLog Analyzer monitors and analyzes DML actions, such as select, insert, delete, and update, that were executed in your database. EventLog Analyzer's exhaustive reports detail the query executed, when it occurred, and the number of times it was executed.
Collect, parse, and analyze Apache web server logs efficiently with EventLog Analyzer. Enhance your network security with in-depth analytical reports and receive alerts immediately via email or SMS when a security threat is detected on your Apache server.
MySQL log files are a security administrator's best friend. Whether it is an unintentional error, a security breach, or a system crash, logs can provide answers. MySQL has several log types that provide insights into different occurrences on the MySQL server. The error log and the query log are the most important ones that should be added to your monitoring list.
To start collecting MySQL query logs, open a terminal and edit the /etc/scalyr-agent-2/agent.json file. Then add the DBMS username and password in the monitors section as follows:
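A minimal sketch of that section is below. It uses the Scalyr agent's built-in MySQL monitor module, but the exact field names can differ between agent versions, so check the documentation for yours; the credentials shown are placeholders:

    "monitors": [
      {
        "module": "scalyr_agent.builtin_monitors.mysql_monitor",
        "database_username": "scalyr_user",
        "database_password": "example_password"
      }
    ]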
There are a number of similar tools available already that can parse logs and put them into some data store, for example Logstash, Honeycomb, ApacheLogsParser, etc. So why another one? ClickTail not only uses ClickHouse as the data store, which makes log data analysis blazingly fast, but it also normalizes text data from the log files, such as SQL queries or URLs, in the respective log formats. Normalizing means replacing the actual queries/URLs with a pattern that can be used in a filter or group-by clause afterward.
It is often the case that web applications face suspicious activities due to various reasons, such as a kid scanning a website using an automated vulnerability scanner or a person trying to fuzz a parameter for SQL Injection, etc. In many such cases, logs on the webserver have to be analyzed to figure out what is going on. If it is a serious case, it may require a forensic investigation.
The general query log logs established client connections and statements received from clients. As mentioned earlier, by default these are not enabled since they reduce performance. We can enable them right from the MySQL terminal or we can edit the MySQL configuration file as shown below.
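For example, to enable the general query log dynamically from the MySQL prompt (the file path is an example):

    -- Enable the general query log at runtime; no restart required.
    SET GLOBAL general_log_file = '/var/log/mysql/general.log';
    SET GLOBAL general_log = 'ON';

To make the setting persistent across restarts, add general_log=1 and general_log_file=... under the [mysqld] section of the configuration file instead.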
Oftentimes, the root cause of slowdowns, crashes, or other unexpected behavior in MySQL can be determined by analyzing its error logs. On Ubuntu systems, the default location for the MySQL error log is /var/log/mysql/error.log. In many cases, the error logs are most easily read with the less program, a command-line utility that allows you to view files but not edit them:
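    # View the error log read-only; press q to quit, / to search.
    sudo less /var/log/mysql/error.log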
A common application is to use CloudTrail logs to analyze operational activity for security and compliance. For information about a detailed example, see the AWS Big Data Blog post, Analyze security, compliance, and operational activity using AWS CloudTrail and Amazon Athena.
You can create a non-partitioned Athena table for querying CloudTrail logs directly from the CloudTrail console. Creating an Athena table from the CloudTrail console requires that you be logged in with a role that has sufficient permissions to create tables in Athena.
Because CloudTrail logs have a known structure whose partition scheme you can specify in advance, you can reduce query runtime and automate partition management by using the Athena partition projection feature. Partition projection automatically adds new partitions as new data is added. This removes the need for you to manually add partitions by using ALTER TABLE ADD PARTITION.
The following example CREATE TABLE statement automatically uses partition projection on CloudTrail logs from a specified date until the present for a single AWS Region. In the LOCATION and storage.location.template clauses, replace the bucket, account-id, and aws-region placeholders with your own values, using the same values in both clauses. For projection.timestamp.range, replace 2020/01/01 with the starting date that you want to use. After you run the query successfully, you can query the table. You do not have to run ALTER TABLE ADD PARTITION to load the partitions.
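An abridged sketch of such a statement is shown below. The column list is shortened for readability, and the SerDe and full CloudTrail schema should be taken from the Athena documentation for CloudTrail; the bucket, account-id, and aws-region values are placeholders:

    -- Abridged sketch: additional CloudTrail columns (resources, vpcendpointid, etc.)
    -- are omitted; see the Athena documentation for the full schema.
    CREATE EXTERNAL TABLE cloudtrail_logs_projected (
        eventversion STRING,
        useridentity STRUCT<
            type: STRING,
            principalid: STRING,
            arn: STRING,
            accountid: STRING,
            invokedby: STRING>,
        eventtime STRING,
        eventsource STRING,
        eventname STRING,
        awsregion STRING,
        sourceipaddress STRING,
        useragent STRING,
        errorcode STRING,
        errormessage STRING,
        requestparameters STRING,
        responseelements STRING,
        requestid STRING,
        eventid STRING,
        eventtype STRING
    )
    PARTITIONED BY (`timestamp` STRING)
    ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
    STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION 's3://bucket/AWSLogs/account-id/CloudTrail/aws-region/'
    TBLPROPERTIES (
        'projection.enabled' = 'true',
        'projection.timestamp.type' = 'date',
        'projection.timestamp.range' = '2020/01/01,NOW',
        'projection.timestamp.format' = 'yyyy/MM/dd',
        'projection.timestamp.interval' = '1',
        'projection.timestamp.interval.unit' = 'DAYS',
        'storage.location.template' = 's3://bucket/AWSLogs/account-id/CloudTrail/aws-region/${timestamp}'
    );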
The following example shows a portion of a query that returns all anonymous (unsigned) requests from the table created for CloudTrail event logs. This query selects those requests where useridentity.accountid is anonymous, and useridentity.arn is not specified:
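A hedged sketch of that query is below, written against the partition-projection table created above. The exact accountid literal for anonymous callers can vary by event source, so inspect a few of your own events before relying on it:

    -- Events made without signed credentials (anonymous callers).
    SELECT eventtime, eventname, eventsource, sourceipaddress, useragent
    FROM cloudtrail_logs_projected
    WHERE useridentity.accountid = 'anonymous'
      AND (useridentity.arn IS NULL OR useridentity.arn = '')
    LIMIT 100;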
Before querying the logs, verify that your logs table looks the same as the one in Creating a table for CloudTrail logs in Athena using manual partitioning. If it is not the first table, delete the existing table using the following command: DROP TABLE cloudtrail_logs.
In order to analyze every query in your list, create and run a script. The following is an example Python script that assumes a sharded database with 4 shards. You can adjust this script to match your individual requirements.
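The script below is a hypothetical sketch of that idea, not the original script: the shard hosts, credentials, and query list are placeholders, and it simply runs EXPLAIN for each query against each of the four shards using the mysql-connector-python package.

    # Hypothetical sketch: run EXPLAIN for each query on every shard.
    # Shard hosts, credentials, and the query list are placeholders.
    import mysql.connector

    SHARDS = [
        {"host": "shard1.example.com", "user": "analyst", "password": "secret", "database": "app"},
        {"host": "shard2.example.com", "user": "analyst", "password": "secret", "database": "app"},
        {"host": "shard3.example.com", "user": "analyst", "password": "secret", "database": "app"},
        {"host": "shard4.example.com", "user": "analyst", "password": "secret", "database": "app"},
    ]

    QUERIES = [
        "SELECT * FROM orders WHERE customer_id = 42",
        "SELECT COUNT(*) FROM sessions WHERE created_at > NOW() - INTERVAL 1 DAY",
    ]

    for shard in SHARDS:
        conn = mysql.connector.connect(**shard)
        cursor = conn.cursor()
        for query in QUERIES:
            # EXPLAIN shows the execution plan without running the query itself.
            cursor.execute("EXPLAIN " + query)
            print(f"-- {shard['host']}: {query}")
            for row in cursor.fetchall():
                print(row)
        cursor.close()
        conn.close()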
However, if you only want a record of queries that change data, it might be better to use the binary log instead. One important difference is that the binary log only logs a query when the transaction is committed by the server, but the general query log logs a query immediately when it is received by the server.
Another way to configure the general query log filename is to set the log-basename option, which configures MariaDB to use a common prefix for all log files (e.g. general query log, slow query log, error log, binary logs, etc.). The general query log filename will be built by adding a .log extension to this prefix. This option cannot be set dynamically. It can be set in a server option group in an option file prior to starting up the server. For example:
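A server option group along these lines (the mariadb prefix is an example; with it, the general query log would be written to mariadb.log):

    [mariadb]
    log-basename=mariadb
    general-log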
The general query log can either be written to a file on disk, or it can be written to the general_log table in the mysql database. To choose the general query log output destination, set the log_output system variable.
The general query log can also be written to the general_log table in the mysql database by setting the log_output system variable to TABLE. This setting can be changed dynamically with SET GLOBAL. For example:
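    -- Send the general query log to the mysql.general_log table and enable it.
    SET GLOBAL log_output = 'TABLE';
    SET GLOBAL general_log = 'ON';

    -- The log can then be queried like any other table.
    SELECT event_time, user_host, argument
    FROM mysql.general_log
    ORDER BY event_time DESC
    LIMIT 10;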
Next, include the following information in mysql.d/conf.yaml to configure the Agent to collect MySQL error, general, and slow query logs, based on the file paths specified in your MySQL configuration file:
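The snippet below is a sketch of that configuration; the log file paths are examples and must match the ones your MySQL configuration actually writes to:

    logs:
      - type: file
        path: /var/log/mysql/mysql_error.log
        source: mysql
        service: mysql
      - type: file
        path: /var/log/mysql/mysql.log
        source: mysql
        service: mysql
      - type: file
        path: /var/log/mysql/mysql_slow.log
        source: mysql
        service: mysql
        log_processing_rules:
          - type: multi_line
            name: new_slow_query_log_entry
            pattern: "# Time:"

Log collection also has to be enabled globally (logs_enabled: true in the Agent's main datadog.yaml) for these entries to take effect.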
In this example, we defined a custom log processing rule that instructs Datadog to report each multi-line slow query log as a single entry, rather than spreading it across multiple entries. Datadog scans your raw logs for the specified pattern and aggregates all subsequent lines into a single log message until it encounters the pattern again.