For a much better use of disk space and memory, I use the TokuDB storage engine (available in MariaDB and Percona Server for MySQL), which compresses the data on disk and in memory. Its use of fractal tree indexes instead of classical B-tree ones gives very good results on huge tables (with millions of rows).
I use it for the "history", "history_uint", "trends" and "trends_uint" tables, which on my main Zabbix server have more than 316 million, 430 million, 14 million and 51 million rows respectively.
I don't remember the exact figures, but using TokuDB's default zlib compression made these tables between 3 and 10 times smaller than InnoDB and at least 1.5 times smaller than InnoDB with "row_format=compressed" (which was also way slower).
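As a sketch of the conversion, assuming a standard Zabbix schema named "zabbix" (run it during a maintenance window, the table copy can take hours on tables this size):

    ALTER TABLE history ENGINE=TokuDB ROW_FORMAT=TOKUDB_ZLIB;

    -- check the resulting on-disk sizes from information_schema
    -- ('zabbix' is the assumed schema name, adapt it to yours)
    SELECT table_name, table_rows,
           ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
    FROM information_schema.tables
    WHERE table_schema = 'zabbix'
      AND table_name IN ('history', 'history_uint', 'trends', 'trends_uint');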
The other available compression types and the full list of TokuDB system variables are described in the TokuDB documentation.
Please note that the "tokudb_cache_size" variable, the equivalent of "innodb_buffer_pool_size" for InnoDB, has a default value of 50% of the system memory, which is rarely optimal.
I also use these specific settings (put together in the my.cnf sketch after this list):
- tokudb_directio=1 # bypasses the kernel file cache, equivalent to "innodb_flush_method=O_DIRECT" for InnoDB
- tokudb_commit_sync=0 # commits transactions to disk asynchronously instead of after each transaction; limits the IO but adds the risk of losing, in case of a crash, the transaction(s) committed since the last flush, equivalent to "innodb_flush_log_at_trx_commit=2" for InnoDB
- tokudb_fsync_log_period=1000 # when "tokudb_commit_sync" is set to 0, forces fsync() to be called at least every X milliseconds, equivalent to "innodb_flush_log_at_timeout=1" for InnoDB
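Put together, the relevant part of a my.cnf could look like this (a sketch: the cache size is a placeholder to adapt to your own RAM, the rest matches the settings above):

    [mysqld]
    # TokuDB tuning for the Zabbix history/trends tables
    tokudb_cache_size       = 8G     # placeholder value; defaults to 50% of system memory
    tokudb_directio         = 1      # bypass the kernel file cache (like innodb_flush_method=O_DIRECT)
    tokudb_commit_sync      = 0      # asynchronous commits: fewer IOs, may lose the last transactions on a crash
    tokudb_fsync_log_period = 1000   # with tokudb_commit_sync=0, fsync() the log at least every 1000 ms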
--
As for the database that keeps growing, don't forget to:
- Check the "Housekeeping" parameters, if you dont manually clean the database (by using table partitions and/or scripts/events), the internal housekeeping should be active
- Most of the templates check values too often; for example, most users don't need to check the bandwidth or the load of a server every 30 seconds, once per minute (or even less often) could suffice. There is also no need to retrieve the total memory capacity every minute or the disk capacity of a volume every hour, as these are unlikely to change in that window (for these values, prefer to use a short history length)
- Don't collect values that are not useful or that are redundant; for example, there is no need to retrieve all 3 load average values: only retrieve the 1 minute load average and, if needed, put a trigger on the last X minutes of this value. Likewise, don't retrieve the number of processes on the system if it's of no use, and getting the free space of a volume both in % and in bytes might be redundant
- Don't put the "history" setting too high on items as every values retrieved will be kept for that duration (use trends if you don't need the precision after a certain time) : Keeping an history for 1 month on an item checked every minutes is about 45000 values, if you check this value on 25 systems, you end up with ~1,1 million values stored on the database after that month ; if for example you check the bandwidth usage, the usual way is to get separately the incoming and outgoing traffic, thus doubling the number of values to retrieve the bandwidth usage for an interface on a system and there could be more than a network interface (and don't forget to disable the loopback interface (lo) monitoring as its usually not giving any interesting informations)
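For the server-side housekeeper, the relevant zabbix_server.conf parameters look like this (a sketch showing the default values; the retention periods themselves are set on the items or in the frontend housekeeping settings):

    # How often the housekeeper runs to delete outdated history/trends values, in hours
    HousekeepingFrequency=1
    # Maximum number of rows deleted per table in one housekeeper cycle (0 = no limit)
    MaxHousekeeperDelete=5000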