I’m currently working on a database which stores, literally, a million new data records every day. They come from several hundred weather stations spread around the world, and they need to be processed in real time. That’s a huge amount to manage with a simple – but powerful – MySQL server, so I’ve had to design and implement a few tricks in order to cope.
In this post I’d like to talk about the three main techniques we’ve used to build our high-performance database, and the reasons for choosing these solutions.
First of all, MySQL InnoDB tables start degrading past a few million records (7–10M on my current server, though I haven’t tested further), so my first thought was: we need to divide if we want to conquer. After some thinking, I decided to split the big table into small chunks, using a separate table for each station. That made the whole problem more affordable, and the tables small enough not to become scary: the new problem was how to store about 5M records per year per table in our database.
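The table-per-station split can be sketched roughly like this. The table naming scheme and the column names below are assumptions for illustration, not our actual schema:

```python
# Sketch of the per-station split: the station id decides which table a
# record lands in, so no single InnoDB table grows past a few million rows.
# Table name pattern (readings_<id>) and columns (ts, temperature) are
# hypothetical examples, not the real schema.

def station_table(station_id: int) -> str:
    """Return the per-station table name, e.g. readings_00042."""
    return f"readings_{station_id:05d}"

def insert_sql(station_id: int, ts: str, temperature: float) -> str:
    """Build an INSERT aimed at the station's own table."""
    table = station_table(station_id)
    return (f"INSERT INTO {table} (ts, temperature) "
            f"VALUES ('{ts}', {temperature})")

print(insert_sql(42, "2011-06-01 12:00:00", 21.5))
```

The key point is that the routing is deterministic: any part of the system can compute the destination table from the station id alone, without a lookup.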
Of course, some logic was needed in both the application controller and the data input server to place records in the right table. Additionally, tables often get fragmented, so some cron jobs were needed to keep the database running smoothly. But there was still the problem of data growth. Since records were separated by station, the scalability problem was solved: more tables can be added, or even more servers, each holding a range of stations, with a hash function to work out which server a given station is hosted on. Once again, we were talking about several million records per year per table, on a standard MySQL 5.1 server (the main problem in a start-up tends to be funding, so we didn’t have anything better).
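The station-to-server mapping mentioned above could look something like this minimal sketch; the host names, shard count, and a plain modulo hash are all assumptions for the example:

```python
# Hedged sketch of the scale-out idea: a stable hash maps each station to
# one of N shard servers, so every client computes the same placement
# without any central lookup. Host names below are invented.

SHARDS = ["db1.example.com", "db2.example.com", "db3.example.com"]

def shard_for(station_id: int) -> str:
    """Pick the MySQL server hosting this station's tables.

    A plain modulo over the shard list is the simplest stable hash:
    deterministic, and uniform if station ids are roughly sequential.
    """
    return SHARDS[station_id % len(SHARDS)]

print(shard_for(42))
```

A modulo hash does force re-mapping stations when servers are added; a consistent-hashing scheme would reduce that churn, at the cost of more complexity than a small deployment needs.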