Archive for the 'open source' Category

Xml/sql dumps and MediaWiki-Vagrant, two great tastes that taste great together?

Problems, problems, problems

Recently a colleague asked me to look over a patchset in gerrit that would add some new functionality to a set of weekly dumps, and, as is my wont, I asked if he’d tested the patch. The answer was, “Well, no, just parts of it”. When I dug into the issue a little deeper, it turned out that the reason the script hadn’t been tested is that there was no easy way to do so!

Enter MediaWiki-Vagrant. [1] This lets you set up a virtual machine on your laptop with the latest and greatest version of MediaWiki. By the simple application of puppet roles, you can add multiple wikis and your own skeletal copy of Wikidata for testing. This seemed like the perfect place to add a dumps role.

Adam Wight started working on such a role in April of 2017. [2] We’re all busy people so it took a while, but finally a few weeks ago the role was merged. It lets the user do a basic test of the xml/sql dumps against whatever’s in the master branch of MediaWiki. But it doesn’t allow my colleague to test his script changes. That, it turns out, is complicated.

So, without further ado, here is what I did in order to get tests up and running in a setup that permits xml/sql dumps to run, as well as tests of ‘miscellaneous’ dump scripts such as category dumps or the ‘Wikidata weeklies’.

MediaWiki-Vagrant on Fedora

Fedora is my distro of choice, so there was special prep work for the installation of MediaWiki-Vagrant.

1. I needed libvirt and lxc; I got these by

dnf install vagrant vagrant-libvirt vagrant-lxc vagrant-lxc-doc \
   lxc-libs lxc lxc-templates lxc-extra nfs-utils redir

2. added myself to the /etc/sudoers file:

meeee ALL=(ALL) ALL

3. edited /etc/lxc/default.conf to change the network bridge from lxcbr0 to virbr0

4. fixing up the firewall:

firewall-cmd --permanent --zone public --add-port=20048/udp
firewall-cmd --permanent --zone public --add-port=111/udp
firewall-cmd --permanent --zone public --add-port=2049/udp
firewall-cmd --permanent --zone public --add-service mountd
firewall-cmd --permanent --zone public --add-service rpc-bind
firewall-cmd --permanent --zone public --add-service nfs
firewall-cmd --reload

and checking that nfs was indeed in the list by:

firewall-cmd --list-all

5. set up udp for nfs v3, which is the vagrant default but is turned off by default in Fedora; this was done by editing /etc/sysconfig/nfs and changing this line




then restarting the service:

service nfs-server restart

Installing MediaWiki-Vagrant

This was slightly different than the instructions [3], since I’m using the lxc provider.

git clone --recursive
cd vagrant
vagrant config --required (I just left the name blank at the prompt)
vagrant up --provider lxc --provision

Provisioning the Wikidata role

The Wikidata role needs some special handling. [4] But it needs even more specialness than the docs say. There’s an issue with the Wikibase extension’s composer setup that we need to work around. [5] Here are all the steps involved.

vagrant git-update
vagrant ssh

These steps are all done from within the VM:

sudo apt-get update
sudo apt-get upgrade
composer selfupdate --update-keys (and enter the keys from
composer config --global process-timeout 9600

Get off the vm, and then:

vagrant roles enable wikidata
vagrant provision

This last step fails badly: it can’t find a certain class, and so everything breaks. Edit mediawiki/composer.local.json and add the line


to the merge-plugin include stanza at the end of the file. Now you can rerun composer and the failed steps:

vagrant ssh
cd /vagrant/mediawiki
rm composer.lock
composer update --lock
sudo apachectl restart
sudo -u www-data sh -c \
    'cd /vagrant/mediawiki; /usr/local/bin/foreachwiki update.php --quick --doshared'

Import some data!

At this point the installation was working but there was only the Main Page in Wikidatawiki. I needed to get some data in there.

I grabbed the first so many pages from one of the wikidata dumps (~170 pages), put them in an xml file, added the closing tag for the root element on the end, and put that in srv/wikidata_pages.xml.
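The trimming step could be sketched roughly like this in Python. It is only an illustration and not the procedure actually used here: the function name `first_pages` is made up, the closing tag is passed in by the caller rather than hard-coded, and the sketch assumes that each page ends on a line containing `</page>` and that the dump holds more pages than are asked for.

```python
def first_pages(lines, max_pages, closing_tag):
    """Keep everything up to the end of the first max_pages pages, then
    append closing_tag so the truncated XML is well-formed again.

    Assumes (hypothetically) that every page ends on a line containing
    '</page>' and that the input holds more than max_pages pages.
    """
    kept = []
    pages_seen = 0
    for line in lines:
        kept.append(line)
        if "</page>" in line:
            pages_seen += 1
            if pages_seen >= max_pages:
                break
    kept.append(closing_tag + "\n")
    return kept


# Tiny in-memory stand-in for a dump file; the root tag name is an assumption.
sample = [
    "<mediawiki>\n",
    "<page>\n", "first\n", "</page>\n",
    "<page>\n", "second\n", "</page>\n",
    "<page>\n", "third\n", "</page>\n",
    "</mediawiki>\n",
]
trimmed = first_pages(sample, 2, "</mediawiki>")
print("".join(trimmed), end="")
```

For a real dump, `lines` could be an open file object, so the whole file never has to be read into memory.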

Next I needed to enable entity imports, which is done by creating the file  /vagrant/settings.d/wikis/wikidatawiki/settings.d/puppet-managed/10-Wikidata-entities.php with the contents:

  $wgWBRepoSettings['allowEntityImport'] = true;

Next came the import:

cd /vagrant
cat /vagrant/srv/wikidata_pages.xml | sudo -u www-data mwscript importDump.php \
    --wiki=wikidatawiki --uploads --debug --report 10

This took a lot longer than expected (30 minutes for about 170 pages) but did eventually complete without errors. Then some rebuilds:

sudo -u www-data mwscript rebuildrecentchanges.php --wiki=wikidatawiki
sudo -u www-data mwscript initSiteStats.php --wiki=wikidatawiki

Provisioning the dumps role

At last I could cherry-pick my gerrit change [6]. But because by default I’m using nfs on linux for the mount of /vagrant inside the VM,  I needed to add some tweaks that let puppet create some directories in /vagrant/srv owned by the dumps user.

In /vagrant, I created the file Vagrantfile-extra.rb with the following contents:

mwv = MediaWikiVagrant::Environment.new(File.expand_path('..', __FILE__))
settings = mwv.load_settings

Vagrant.configure('2') do |config|
  if settings[:nfs_shares]
    root_share_options = { id: 'vagrant-root' }
    root_share_options[:type] = :nfs
    root_share_options[:mount_options] = ['noatime', 'rsize=32767', 'wsize=32767', 'async']
    root_share_options[:mount_options] << 'fsc' if settings[:nfs_cache]
    root_share_options[:mount_options] << 'vers=3' if settings[:nfs_force_v3]
    root_share_options[:linux__nfs_options] = ['no_root_squash', 'no_subtree_check', 'rw', 'async']
    config.nfs.map_uid = Process.uid
    config.nfs.map_gid = Process.gid
    config.vm.synced_folder '.', '/vagrant', root_share_options
  end
end

Then I needed to restart the VM so that the freshly nfs-mounted share would permit chown and chmod from within it:

vagrant halt
vagrant up --provider lxc --provision

After that, I was able to enable the dumps role:

vagrant roles enable dumps
vagrant provision

Wikidata dump scripts setup

Next I had to get all the scripts needed for testing, by doing the following:

  • copy into /usr/local/bin:,, [7]
  • copy into /usr/local/etc: dcatconfig.json [7]
  • copy a fresh clone of operations-dumps-dcat into /usr/local/share [8]

And finally, I had to fix up a bunch of values in the dump scripts that are meant for large production wikis.

dumpNameToMinSize=(["all"]=`expr 2350 / $shards` ["truthy"]=`expr 1400 / $shards`)


if [ $fileSize -lt `expr 20 / $shards` ]; then



and as root, clean up some cruft that has the wrong permissions:

rm -rf /var/cache/mediawiki/*

Running dumps!

For xml/sql dumps:

su - dumpsgen
cd /vagrant/srv/dumps/xmldumps-backup
python --configfile /vagrant/srv/dumps/confs/wikidump.conf.dumps [name_of_wiki_here]

Some wikis available for ‘name_of_wiki_here’ are: enwiki, wikidatawiki, ruwiki, zhwiki, among others.

For wikidata json and rdf dumps:

su - dumpsgen
mkdir /vagrant/srv/dumps/output/otherdumps/wikidata/
/usr/local/bin/ all ttl
/usr/local/bin/ truthy nt

See how easy that was? 😛 😛

But seriously, folks, we are working on making testing all dumps easy, or at least easier. This brings us one step closer.

Next steps

It’s a nuisance to edit the scripts and change the number of shards and so on; these are being turned into configurable values. A special configuration file will be added to the dumps role that all ‘miscellaneous dumps’ can use for these sorts of values.

It’s annoying to have to copy in the scripts from the puppet repo before testing. We’re considering creating a separate repository operations/dumps/otherdumps which would contain all of these scripts; then a simple ‘git clone’ right from the dumps role itself would add the scripts to the VM.

There are multiple symlinks of the directory containing the php wrapper MWScript.php to different locations, because several scripts expect the layout of the mediawiki and related repos to be the way it’s set up in production. The location should be configurable in all scripts so that it can be passed in on the command line for testing, and the extra symlinks removed from the dumps role.

The composer workaround will eventually be unnecessary once Wikibase has been fixed up to work with composer the way many MediaWiki extensions do. That’s on folks’ radar already.

The xml file of pages to import into wikidata could be provided in the dumps role and entity imports configured, though the import itself might still be left for the user because it takes so long.

Once the above fixes are in, we’ll probably be starting to move to kubernetes and docker for all testing. 😀


Thanks to: Adam Wight for all the work on the initial dumps role, Stas Malyshev for the composer solution and for being a guinea pig, and the creators and maintainers of MediaWiki-Vagrant for making this all possible.


Docker and Salt Redux

Recently I was digging into salt innards again; that meant it was time to dust off the old docker salt-cluster script and shoehorn a few more features in there.

NaCl up close and personal.

There are some couples that you just know ought to get themselves to a relationship counselor asap. Docker and SSHD fall smack dab into that category. [1]  When I was trying to get my base images for the various Ubuntu distros set up, I ran into issues with selinux, auditd and changed default config options for root, among others. The quickest way to deal with all these annoyances is to turn off selinux on the docker host and comment the heck out of a bunch of things in various pam configs and the sshd config.

The great thing about Docker though is that once you have your docker build files tested and have created your base images from those, starting up containers is relatively quick. If you need a configuration of several containers from different images with different things installed you can script that up and then with one command you bring up your test or development environment in almost no time.

Using this setup, I was able to test multiple combinations of salt and master versions on Ubuntu distros, bringing them up in a minute and then throwing them away when done, with no more concern than for tossing a bunch of temp files. I was also able to model our production cluster (running lucid, precise and trusty) with the two versions of salt in play, upgrade it, and poke at salt behavior after the upgrade.

A good dev-ops is a lazy dev-ops, or maybe it’s the other way around. Anyways, I can be as lazy as the best of ’em, and so when it came to setting up and testing the stock redis returner on these various salt and ubuntu versions, that needed to be scriptified too; changing salt configs on the fly is a drag to repeat manually. Expect, ssh, cp and docker ps are your best friends for something like this. [2]

In the course of getting the redis stuff to work, I ran across some annoying salt behavior, so before you run into it too, I’ll explain it here and maybe save you some aggravation.

The procedure for setting up the redis returner included the following bit:

– update the salt master config with the redis returner details
– restart the master
– copy the update script to the minions via salt

This failed more often than not, on trusty with 2014.1.10. After these steps, the master would be seen to be running, the minions were running, a ping of all the minions came back showing them all responsive, and yet… no script copy.

The first and most obvious thing is that the salt master restart returns right away, but the master is not yet ready to work. It has to read keys and spawn worker threads, and each of those has to load a pile of modules, etc.  On my 8-core desktop, for 25 workers this could take up to 10 seconds.

Salt works on a pub/sub model [3], using ZMQ as the underlying transport mechanism for this. There’s no ack from the client; if the client gets the message, it runs the job if it’s one of the targets, and returns the results. If the client happens to be disconnected, it won’t get the message. Now salt minions do reconnect if their connection goes away but this takes time.

Salt (via ZMQ) also encrypts all messages. Upon restart, the master generates a new AES key, but the minions don’t learn about this til they receive their first message, typically with some job to run. They will try to use the key they had lying round from a minute ago to decrypt, fail, and then be forced to try again. But this retry takes time. And while the job will eventually be run and the results sent back to the master, your waiting script may have long since given up and gone away.

With the default salt config, the minion reconnect can take up to 5 seconds. And the minion re-auth retry can take up to 60 seconds. Why so long? Because in a production setting, if you restart the master and thousands of minions all try to connect at once, the thundering herd will kill you. So the 5 seconds is an upper limit, and each minion will wait a random amount of time up to that upper limit before reconnect. Likewise the 60 seconds is an upper limit for re-authentication. [4]
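To make the thundering herd point concrete, here is a toy model in Python. It is not salt’s actual code; the function names are made up for illustration, and the 5-second ceiling is simply the default upper limit mentioned above.

```python
import random


def reconnect_delays(minion_count, max_delay=5.0, seed=None):
    """Pick one random wait per minion, each between 0 and max_delay seconds."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, max_delay) for _ in range(minion_count)]


def busiest_second(delays):
    """Count how many reconnects land in the most crowded one-second slot."""
    slots = {}
    for delay in delays:
        slots[int(delay)] = slots.get(int(delay), 0) + 1
    return max(slots.values())


delays = reconnect_delays(1000)
# Without any random wait, all 1000 minions would reconnect at the same moment.
print("without jitter, reconnects in the first second:", len(delays))
print("with jitter, reconnects in the busiest second:", busiest_second(delays))
```

The price of spreading the load this way is exactly the delay described here: any single minion may be unreachable for up to the full upper limit.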

This means that after a master restart, it’s best to wait at least 15 seconds before running any job, 10 for master setup and 5 for the salt minion reconnect. This ensures that the salt minion will actually receive the job. (And after a minion restart, it’s best to wait at least 5 seconds before giving it any work to do, for the same reason.)

Then be sure to run your salt command with a nice long timeout, longer than 60 seconds. This ensures that the re-auth and the job run will get done, and the results returned to the master, before your salt command times out and gives up.
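In a test script, those two rules of thumb might look something like the following Python sketch. The constants are just the numbers from this post (measured on my setup, not universal), the helper names are invented for the example, and the 30-second margin is an arbitrary assumption. The only salt-specific piece is the CLI’s --timeout option.

```python
# Numbers taken from the discussion above; they will differ on other hardware.
MASTER_SETUP_SECS = 10      # master startup with 25 workers on an 8-core desktop
MINION_RECONNECT_SECS = 5   # default upper limit for minion reconnect
REAUTH_MAX_SECS = 60        # default upper limit for minion re-auth


def settle_time(master_restarted=True):
    """Seconds to wait after a restart before sending any work."""
    if master_restarted:
        return MASTER_SETUP_SECS + MINION_RECONNECT_SECS
    return MINION_RECONNECT_SECS


def salt_command(target, function, args=(), margin=30):
    """Build a salt CLI call whose timeout exceeds the re-auth upper limit.

    margin is a hypothetical allowance for the job itself to run.
    """
    timeout = REAUTH_MAX_SECS + margin
    return ["salt", "--timeout", str(timeout), target, function, *args]


print("wait after master restart:", settle_time(), "seconds")
print("wait after minion restart:", settle_time(master_restarted=False), "seconds")
print("command:", " ".join(salt_command("*", "test.ping")))
```

A script would sleep for `settle_time()` seconds after the restart and then hand the command list to something like `subprocess.run`.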

Now the truly annoying bit is that, in the name of perfect forward secrecy, an admittedly worthy goal, the salt master will regenerate its key after 24 hours of use, with the default config. And that means that if you happen to run a job within a few seconds of that regen, whenever it happens, you will hit this issue. Maybe it will be a puppet run that sets a grain, or some other automated task that trips the bug. Solution? Make sure all your scripts check for job returns allowing for the possibility that the minion had to re-auth.
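One way a script could check for job returns while allowing for a re-auth is to keep polling until a deadline rather than giving up on the first empty answer. Here is a rough Python sketch of that idea; `fetch_returns` is a hypothetical callable standing in for however your script looks up results (a returner database, the job cache, and so on), and the 90-second default is an assumption chosen to exceed the 60-second re-auth limit.

```python
import time


def wait_for_returns(fetch_returns, expected, deadline_secs=90.0, poll_secs=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll until every expected minion has returned or the deadline passes.

    fetch_returns is a hypothetical callable that returns the ids of the
    minions whose results have arrived so far. Returns a pair of sets:
    (minions seen, minions still missing).
    """
    expected = set(expected)
    give_up_at = clock() + deadline_secs
    seen = set()
    while True:
        seen |= set(fetch_returns()) & expected
        if seen == expected or clock() >= give_up_at:
            return seen, expected - seen
        sleep(poll_secs)


# Fake result source for demonstration: the second minion answers late,
# as it would after having to re-auth.
polls = {"count": 0}


def fake_fetch():
    polls["count"] += 1
    if polls["count"] < 3:
        return {"minion1"}
    return {"minion1", "minion2"}


seen, missing = wait_for_returns(fake_fetch, ["minion1", "minion2"],
                                 deadline_secs=5, poll_secs=0)
print("seen:", sorted(seen), "missing:", sorted(missing))
```

Anything left in the missing set once the deadline has passed is worth reporting loudly, since by then a slow re-auth is no longer a plausible excuse.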

Tune in next time for more docker-salt-zmq fun!

[1] Docker ssh issues on github
[2] Redis returner config automation
[3] ZMQ pub/sub docs
[4] Minion re-auth config and Running Salt at scale

Ditching gnome 3 for kde 4

I finally made the switch. I’ve been a long time fan of gnome, critical of kde memory bloat, and not fond of the lack of integration that has haunted kde and its apps for years. But I finally made the switch.

I have an Nvidia graphics card in this three and a half year old laptop on which I run the Nvidia proprietary drivers. Let’s not kid ourselves; in many cases the open source drivers aren’t up to snuff, and this card and laptop is one of those cases. I’m talking about regular use for watching videos, doing my development work and so on, not games, not exotic uses of blender or what have you, nothing out of the ordinary.

Gnome shell has been a memory hog since its inception, with leaks that force the shell to die a horrible death or hang in odd ways after a few days of uptime. Maybe this is caused by interaction with the Nvidia drivers, and maybe not, but it’s a drag.

Nonetheless, it was a drag I was willing to put up with, in the name of ‘use the current technologies, they’ll stabilize eventually’. No, no they won’t. With the latest upgrade to Fedora 20, I noticed a bizarre mouse pointer bug which goes something like this:

Type… typetypetype… woops mouse pointer is gone. Huh, where is it? Try alt-shift-tab to see the window switcher. Ah *whew*, I can at least switch to another window, and now the pointer is back.

Only, that alt-shift-tab trick didn’t always work the first time, and sometimes it didn’t work at all. I was forced often enough to hard power off the laptop (no alternate consoles to switch into, and the system was in hard lockup doing something disk-intensive, who knows what… maybe swapping to death).

After the last round of package updates I started seeing lockups multiple times a day. The bug reporter, on the few times gnome shell would actually segfault, refused to report the bug because it was a dupe and what was I doing using those proprietary drivers anyways.

Usability has a bunch of factors in there, but the most basic is the ability to use the system without lockup. So… kde 4.11. Five days later I have had no mouse pointer issues, no lockups, no OOM, no swapping. I miss my global emacs key bindings, I couldn’t get gnome terminal to work right because of the random shrinking terminal bug, and the world clock isn’t exactly the way I’d like it but I’ll live with that. Goodbye gnome 3, if you see gnome 4 around some day, my door will be open.