Some tips for those who still administer Drupal7-based sites
A bit of history: Drupal at my workplace (and in Debian)
My main day-to-day responsibility in my workplace is, and has been for 20 years, to take care of the network infrastructure for UNAM’s Economics Research Institute. One of the most visible parts of this responsibility is to ensure we have a working Web presence, and that it caters for the needs of our academic community.
I joined the Institute in January 2005. Back then, our designer pushed static versions of our webpage, completely built in her computer. This was standard practice at the time, and lasted through some redesigns, but I soon started advocating for the adoption of a Content Management System. After evaluating some alternatives, I recommended adopting Drupal. It took us quite a bit to do the change: even though I clearly recall starting work toward adopting it as early as 2006, according to the Internet Archive, we switched to a Drupal-backed site around June 2010. We started using it somewhere in the version 6’s lifecycle.
As for my Debian work, by late 2012 I started getting involved in the
maintenance of the drupal7
package,
and by April 2013 I became its primary maintainer. I kept the drupal7
package
up to date in Debian until ≈2018; the supported build methods for Drupal 8 are
not compatible with Debian (mainly, bundling third-party libraries and updating
them without coordination with the rest of the ecosystem), so towards the end of
2016, I announced I would not package Drupal 8 for
Debian.
By March 2016, we migrated our main page to Drupal 7. By then, we already had several other sites for our academics’ projects, but my narrative follows our main Web site. I did manage to migrate several Drupal 6 (D6) sites to Drupal 7 (D7); it was quite involved process, never transparent to the user, and we did have the backlash of long downtimes (or partial downtimes, with sites half-available only) with many of our users. For our main site, we took the opportunity to do a complete redesign and deployed a fully new site.
You might note that March 2016 is after the release of D8 (November 2015). I don’t recall many of the specifics for this decision, but if I’m not mistaken, building the new site was a several months long process — not only for the technical work of setting it up, but for the legwork of getting all of the needed information from the different areas that need to be represented in the Institute. Not only that: Drupal sites often include tens of contributed themes and modules; the technological shift the project underwent between its 7 and 8 releases was too deep, and modules took a long time (if at all — many themes and modules were outright dumped) to become available for the new release.
Naturally, the Drupal Foundation wanted to evolve and deprecate the old codebase. But the pain to migrate from D7 to D8 is too big, and many sites have remained under version 7 — Eight years after D8’s release, almost 40% of Drupal installs are for version 7, and a similar proportion runs a currently-supported release (10 or 11). And while the Drupal Foundation made a great job at providing very-long-term support for D7, I understand the burden is becoming too much, so close to a year ago (and after pushing several times the D7, they finally announced support will finish this upcoming January 5.
Drupal 7 must go!
I found the following usage graphs quite interesting: the usage statistics for all Drupal versions follows a very positive slope, peaking around 2014 during the best years of D7, and somewhat stagnating afterwards, staying since 2015 at the 25000–28000 sites mark (I’m very tempted to copy the graphs, but builtwith’s terms of use are very clear in not allowing it). There is a sharp drop in the last year — I attribute it to the people that are leaving D7 for other technologies after its end-of-life announcement. This becomes clearer looking only at D7’s usage statistics: D7 peaks at ≈15000 installs in 2016 stays there for close to 5 years, and has a sharp drop to under 7500 sites in the span of one year.
D8 has a more “regular” rise, peak and fall peaking at ~8500 between 2020 and 2021, and down to close to 2500 for some months already; D9 has a very brief peak of almost 9000 sites in 2023 and is now close to half of it. Currently, the Drupal king appears to be D10, still on a positive slope and with over 9000 sites. Drupal 11 is still just a blip in builtwith’s radar, with… 3 registered sites as of September 2024 :-Þ
After writing this last paragraph, I came across the statistics found in the Drupal webpage; the methodology for acquiring its data is completely different: while builtwith’s methodology is their trade secret, you can read more about how Drupal’s data is gathered (and agree or disagree with it 😉, but at least you have a page detailing 12 years so far of reported data, producing the following graph (which can be shared under the CC BY-SA license 😃):
This graph is disgregated into minor versions, and I don’t want to come up with yet another graph for it 😉 but it supports (most of) the narrative I presented above… although I do miss the recent drop builtwith reported in D7’s numbers!
And what about Backdrop?
During the D8 release cycle, a group of Drupal developers were not happy with the depth of the architectural changes that were being adopted, particularly the transition to the Symfony PHP component framework, and forked the D7 codebase to create the Backdrop CMS, a modern version of Drupal, without dropping the known and tested architecture it had. The Backdrop developers keep working closely together with the Drupal community, and although its usage numbers are way smaller than Drupal’s, seems to be sustainable and lively. Of course, as I presented their numbers in the previous section, you can see Backdrop’s numbers in builtwith… are way, way lower.
I have found it to be a very warm and welcoming community, eager to receive new
members. And, thanks to its contributed D2B Migrate
module, I found it is quite easy
to migrate a live site from Drupal 7 to Backdrop.
Migration by playbook!
So… Well, I’m an academic. And (if it’s not obvious to you after reading so far 😉), one of the things I must do in my job is to write. So I decided to write an article to invite my colleagues to consider Backdrop for their D7 sites in Cuadernos Técnicos Universitarios de la DGTIC, a young journal in our university for showcasing technical academical work. And now that my article got accepted and published, I’m happy to share it with you — of course, if you can read Spanish 😉 But anyway…
Given I have several sites to migrate, and that I’m trying to get my colleagues to follow suite, I decided to automatize the migration by writing an Ansible playbook to do the heavy lifting. Of course, the playbook’s users will probably need to tweak it a bit to their personal needs. I’m also far from an Ansible expert, so I’m sure there is ample room fo improvement in my style.
But it works. Quite well, I must add.
But with this size of database…
I did stumble across a big pebble, though. I am working on the migration of one
of my users’ sites, and found that its database is… huge. I checked the
mysqldump
output, and it got me close to 3GB of data. And given the
D2B_migrate
is meant to work via a Web interface (my playbook works around it
by using a client I wrote with Perl’s
WWW::Mechanize), I repeatedly
stumbled with PHP’s maximum POST size, maximum upload size, maximum memory
size…
I asked for help in Backdrop’s Zulip chat site, and my attention was taken off fixing PHP to something more obvious: Why is the database so large? So I took a quick look at the database (or rather: my first look was at the database server’s filesystem usage). MariaDB stores each table as a separate file on disk, so I looked for the nine largest tables:
# ls -lhS|head
total 3.8G
-rw-rw---- 1 mysql mysql 2.4G Dec 10 12:09 accesslog.ibd
-rw-rw---- 1 mysql mysql 224M Dec 2 16:43 search_index.ibd
-rw-rw---- 1 mysql mysql 220M Dec 10 12:09 watchdog.ibd
-rw-rw---- 1 mysql mysql 148M Dec 6 14:45 cache_field.ibd
-rw-rw---- 1 mysql mysql 92M Dec 9 05:08 aggregator_item.ibd
-rw-rw---- 1 mysql mysql 80M Dec 10 12:15 cache_path.ibd
-rw-rw---- 1 mysql mysql 72M Dec 2 16:39 search_dataset.ibd
-rw-rw---- 1 mysql mysql 68M Dec 2 13:16 field_revision_field_idea_principal_articulo.ibd
-rw-rw---- 1 mysql mysql 60M Dec 9 13:19 cache_menu.ibd
A single table, the access log, is over 2.4GB long. The three following tables
are, cache tables. I can perfectly live without their data in our new site! But
I don’t want to touch the slightest bit of this site until I’m satisfied with
the migration process, so I found a way to exclude those tables in a
non-destructive way: given D2B_migrate
works with a mysqldump
output, and
given that mysqldump
locks each table before starting to modify it and unlocks
it after its job is done, I can just do the following:
$ perl -e '$output = 1; while (<>) { $output=0 if /^LOCK TABLES `(accesslog|search_index|watchdog|cache_field|cache_path)`/; $output=1 if /^UNLOCK TABLES/; print if $output}' < /tmp/d7_backup.sql > /tmp/d7_backup.eviscerated.sql; ls -hl /tmp/d7_backup.sql /tmp/d7_backup.eviscerated.sql
-rw-rw-r-- 1 gwolf gwolf 216M Dec 10 12:22 /tmp/d7_backup.eviscerated.sql
-rw------- 1 gwolf gwolf 2.1G Dec 6 18:14 /tmp/d7_backup.sql
Five seconds later, I’m done! The database is now a tenth of its size, and
D2B_migrate
is happy to take it. And I’m a big step closer to finishing my
reliance on (this bit of) legacy code for my highly-visible sites 😃