How to open XML files in LibreOffice from the WHO ICTRP

The WHO provides a “Search Portal” for the International Clinical Trials Registry Platform at the following address:

Try it out! Do a search for “ixabepilone AND endometrial” for example. This will return a table of 4 clinical trials (or it did on 2016 Sept 21) with a red button at the top right that says “Export results to XML.” If you click this button (then “I agree,” then “Export all trials to XML”) your browser will save an XML file with the results in it.

Using LibreOffice to parse these data

LibreOffice Calc XML import

LibreOffice has a great tool for importing XML information, like the sort you just downloaded from the WHO database.

Open LibreOffice Calc, then click on Data > XML Source …

But there’s a problem

If you navigate to where your browser saved the XML file and choose it as the “Source File,” you’ll have a problem. The “Map to Document” field won’t be populated by elements from the XML file as expected.

I reported the bug to the LibreOffice people, but in the meantime …

Here’s a workaround

Open up the XML file in a plain text editor (Notepad, Atom, Scratch, etc.).

The first line will look like this:

<?xml version='1.0' encoding='UTF-8' ?><Trials_downloaded_from_ICTRP>

Just change that line so that it looks like this:

<?xml version="1.0" encoding="UTF-8" ?><Trials_downloaded_from_ICTRP>

The only difference here is that the single quotes are replaced with double quotes. LibreOffice will read the file as expected now!
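If you have a lot of these files, the same edit can be scripted. Here is a minimal Python sketch of the quote swap, assuming the XML declaration is the very first thing in the file. The function name `fix_xml_declaration` is my own invention for illustration, and it only touches the declaration, not the rest of the file.

```python
import re


def fix_xml_declaration(text: str) -> str:
    """Swap single quotes for double quotes inside the leading XML declaration only."""
    # Find the <?xml ... ?> declaration at the very start of the text.
    match = re.match(r"\s*<\?xml[^>]*\?>", text)
    if match is None:
        # No declaration found: return the text untouched.
        return text
    declaration = match.group(0)
    # Rewrite 'value' as "value" within the declaration.
    fixed = re.sub(r"'([^']*)'", r'"\1"', declaration)
    return fixed + text[match.end():]
```

To use it, read the downloaded file into a string, pass it through `fix_xml_declaration`, and write the result back out before opening it in LibreOffice Calc.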

Click where it says “Trial” under “Trials_downloaded_from_ICTRP” in the “Map to Document” field, choose A1 as the “Mapped cell,” and click the Import button!

You can now read through WHO ICTRP files on LibreOffice Calc.
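If you would rather skip the spreadsheet import altogether, here is a rough Python sketch that flattens the export into CSV using only the standard library. The only structure it assumes is what the import dialog shows: a `Trials_downloaded_from_ICTRP` root containing `Trial` elements. It copies whatever child elements each `Trial` has, so no field names are hard-coded. The function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def trials_to_rows(xml_bytes: bytes) -> list:
    """Turn each <Trial> element into a dict of {child tag: child text}."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for trial in root.iter("Trial"):
        rows.append({child.tag: (child.text or "").strip() for child in trial})
    return rows


def rows_to_csv(rows: list) -> str:
    """Write the rows as CSV text, using every tag seen as a column."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # missing fields are left blank
    return buffer.getvalue()
```

Read the file in binary mode (`open(path, "rb").read()`) and pass the bytes to `trials_to_rows`. A standard XML parser should accept the single-quoted declaration as it is, so the workaround above is not needed on this route.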

Unboxing elementary OS Loki

On Friday, the latest version of elementary OS was released: version 0.4, Loki. I’ve been using elementary since Freya public beta 2, and it’s been a great general purpose working machine.

So I backed up all my files, downloaded the disc image and re-formatted my computer. Here are my impressions and notes:

  • The installer worked really well. No surprises. It was straightforward, fast, and it worked the first time. I just restarted my computer with the USB installer drive in it, and it worked. On the first try.
  • For reference, to install the previous version, Freya, I had to use another computer to download wifi drivers so that I could get the computer on the internet. It took a few tries. Not this time!
  • Also, the previous version had a really hard time with rebooting. I had to fiddle around in GRUB settings to make it work. This time it worked right out of the box.
  • Loki is designed to be simple. This means that certain features, like adding software repositories, are disabled. Hence, right away I installed: software-properties-common, gdebi and elementary-tweaks, of course.
    • gdebi was necessary to install RStudio, Vocal, etc.
    • Single-click to open files by default in the file manager? Really?
  • I needed to follow the instructions on the following page to get R running:
  • I needed to install Dropbox from here:
  • This version of Loki is based on Ubuntu 16.04 rather than 14.04, which fixed a lot of the problems that I forgot I had. For example, this allowed me to upgrade to the newest version of LibreOffice. LibreOffice was working mostly fine before, except for a weird graphical glitch with the Zotero plugin (the buttons’ colours were inverted).
  • Now that I’m on the topic, installing LibreOffice from the AppCenter didn’t work. I had to un-install and then install from the command line. That was weird.
  • The system tray is much better now. There were a few apps that just never appeared up there despite my best efforts to fix them.
  • I don’t actually care for the new “AppCenter” that comes with Loki very much. Some of the software that’s in there doesn’t “just work.” (LibreOffice for example didn’t work until I installed it using apt-get in the Terminal.)
  • Adobe Digital Editions still works perfectly through Wine, so I can still get all my library books on my Kobo that way.
  • I installed Deja Dup from the command line, and then followed these instructions to hook it up to the file manager:
  • The Music app is very iTunes-like. And by that I mean that it is what iTunes used to be like before it became bloated and unusable. It’s a simple, single-purpose app that opens from a cold start in less than a second on my computer. There’s no store. It doesn’t sync my phone. All it does is index the music files in my ~/Music folder, and allows me to play them.
  • Actually, come to think of it, that’s what I like about elementary OS as a whole. It’s Mac-like, but without a lot of the really annoying bloaty stuff that makes it hard to actually get work done on a Mac.
    • How many times do I have to tell my computer, “No, I don’t have or want an Apple ID”?
    • “No, I don’t want this update to GarageBand.” Stop asking me.


This is the best ever!

We're calling it "Two Queens Honey"
We assembled our apiary with … Pride (haha it’s a gay joke)

I am the most hipster now

I am unreasonably excited about this and have been for months.

As of last Sunday, I am the most hipster ever. This has been a project that’s a long time coming. I’m sure that my colleagues are tired of hearing me talking about it: A few weeks back we bought an apiary, assembled it, painted the top like a Pride flag, and then on the 29th, we went to pick up our bees!

On the night that they arrived, it rained fairly intensely, and hearing this, Alain and I ran to the back porch to check on them, like worried parents.

They were okay.

This brings me one step closer to my dream of being the undisputed king of the hipsters by walking up and down the Plateau wearing an ironic beard of bees.

Ten Thousand Babies

We now have ten thousand babies

This is a lot of bees. They came on 5 wooden “frames” with a sort of waxy backing that’s reinforced with wire, on which the bees make their honeycomb and fill it with wax or pollen or eggs. We do have a friend who knows what he’s doing, and with his help, we transferred the 5 frames into a much bigger 8-frame box.

To do so, we had to pick up each frame by hand and slowly transfer it over. You’d think that the bees would start getting stabby at that point, but they really didn’t care. At all. Honeybees are super-chill. I had probably a couple dozen bees going over my bare hands while moving them over to their new home, and none of us got stung. Alain moved the frame that had the queen on it.


Our neighbour was in the back yard that morning. She almost certainly saw the giant swarm of bees while we were moving them from one box to another. No comment yet. There, of course, is no longer a giant swarm. The picture attached above makes it seem like there’s thousands of bees flying around everywhere, but they were just excited in that picture because they were still moving over from their old box to the new one.

Unless you’re actively looking for bees in our back yard, you’d never know that there’s an apiary with 10-80 k bees in it right there. There’s always a few at the entrance, but you can stand even a metre away and bee totally unaware that our little ladies are there, doing their thing.

We're so cute!
Two Queens Honey!

Among the problems I never thought I’d have, “too much honey” used to bee one of them

By the way, get used to the “bee” puns. They’re going to bee awful. I will also point them all out and explain the jokes as we go along. Puns are the highest form of humour after all, and jokes are better when you explain them.

We might have as much as 23 kg of honey from this season, and so we’ve decided to sell it along with the wax. We’re calling our product “Two Queens Honey.” (That one is a gay joke, and also a bee pun!) The flavour of honey is determined by the plants that the bees feed on, so I wanted to take up our lawn and put in nothing but hot peppers, just to see if we could get honey with a bit of a … sting.

My idea was vetoed. Alas.

Honestly, I’m amazed that this idea wasn’t shot down earlier. To quote Alain, “I never would have done this if we weren’t together,” or as my ex-wife put it, “This is another reason why you and Alain are a much better match.”

We are planning on taking a beer-brewing course later in the summer, so there may be Two Queens Honeybeers coming. More on this story as it develops.

Break-in data from the SPVM

Montreal Break-ins Week-by-Week

Just today, the SPVM released some crime statistics for the island of Montreal.

CBC already did an interactive map, so I took the data set and made a histogram of break-ins by date!

It’s a PDF and you can download it and look at it RIGHT NOW. There’s also a version where it’s lumped by month, which is also instructive.

And then, I made a week-by-week animated GIF of the locations of the break-ins! (Click the image attached to this post to see.)
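For anyone who wants to try the week-by-week grouping on their own copy of the data, here is a minimal Python sketch of the counting step. It is not the code behind the charts in this post. It assumes you have already pulled the break-in dates out of the data set as `datetime.date` objects, and the function name `count_by_week` is hypothetical.

```python
from collections import Counter
from datetime import date


def count_by_week(dates):
    """Count events per ISO (year, week), returned in chronological order."""
    counts = Counter()
    for d in dates:
        iso_year, iso_week, _weekday = d.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return dict(sorted(counts.items()))
```

The resulting dictionary maps each `(year, week)` pair to a count, which is the table you would plot as a histogram.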

It’s Valentine’s Day! Time to review Bayes Theorem.

Figure 1

It’s Valentine’s Day! Time to review your knowledge of Bayes Theorem. Here’s a fun exercise to do: Calculate the probability that a gay man is HIV-negative, given that he tells you he’s HIV-negative.


First, let’s define our terms.

h: Does not have HIV
~h: Does have HIV
e: Says he does not have HIV
~e: Says he does have HIV


So let’s imagine that you’re a gay man, and you’re going to hook up with a guy for Valentine’s Day. You might be interested in calculating the following: P(h|e)

This expression, P(h|e), represents the probability that a gay man does not have HIV given that he says he does not have HIV.


The base rate of HIV infection among gay men who have sex with men is 19%.1

Hence: P(~h) = 0.19; or P(h) = 0.81

See Figure 1 for a graphical representation. The entire square represents all gay men who have sex with men. The blue rectangle takes up 81% of the square, which is proportional to the CDC’s best estimate for the number of gay men who are actually HIV-negative.

From the same source, we can also take the probability that a person says he does not have HIV, given that he does have HIV, to be 44%.1

Hence: P(e|~h) = 0.44

In Figure 1, this is represented by the green rectangle. Given that a person is HIV-positive, there’s a 44% chance that they don’t know, and so they would likely say that they are “negative.”

The remainder, the yellow rectangle, is the proportion of gay men who are HIV-positive and who know that they are HIV-positive.


I am considering only the population of gay men who have sex with men.

Built into this is the assumption that men who have HIV and don’t know it would report themselves as HIV-negative, or that there wouldn’t be anyone who just says “I don’t know.”

I am also assuming here that 100% of gay men who don’t have HIV will say that they don’t have HIV. Put another way, there is a 0% chance that someone will say he has HIV if, in fact, he does not have HIV. This is a simplification. It’s possible that someone is confused about his status, but very unlikely. Hence:

P(e|h) = 1; or P(~e|h) = 0

Bayes Theorem

To calculate our desired value, P(h|e), we should use Bayes Theorem.

P(h|e) = P(h) / ( P(h) + P(e|~h) * P(~h) / P(e|h) )

P(h|e) = 0.81 / ( 0.81 + 0.44 * 0.19 / 1 )

P(h|e) = 0.81 / ( 0.81 + 0.44 * 0.19 )

P(h|e) = 0.91

To illustrate this graphically, in Figure 1, this would represent the chance of your prospective hook-up being in the blue area, given that the only thing you know about him is that he’s either in the blue area or the green area.
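To check the arithmetic, here is a small Python sketch of the same calculation. It uses the more familiar form of the theorem, P(h|e) = P(e|h) P(h) / [ P(e|h) P(h) + P(e|~h) P(~h) ], which is the version above with the numerator and denominator both multiplied by P(e|h). The function and parameter names are my own.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(h|e) from the prior P(h) and the two likelihoods."""
    # Total probability of the evidence, P(e).
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e


# Values from this post: P(h) = 0.81, P(e|h) = 1, P(e|~h) = 0.44
p_h_given_e = posterior(prior_h=0.81, p_e_given_h=1.0, p_e_given_not_h=0.44)
print(round(p_h_given_e, 2))  # 0.91
```

Changing `p_e_given_h` to something below 1 is an easy way to see how much the simplifying assumption matters.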


Your risk of HIV exposure can be informed by your prospective sexual partner’s response to whether or not he is HIV-negative.

If a person tells you that he’s HIV-positive, he knows his status. No one goes around claiming to be HIV-positive unless they’ve been tested and got a positive result. The best evidence we have indicates that HIV-positive people with an undetectable viral load do not transmit HIV.2 So with a sexual partner who’s HIV-positive, you’re not getting any surprises.

If you don’t even ask about your prospective sex partner’s HIV status, you can be 81% certain that he’s HIV-negative, just because of the base rate of HIV prevalence. If you do ask and he tells you that he’s negative, that is a useful piece of information—it allows you to update your estimation of the probability that your prospective sexual partner is HIV-negative to 91%, but there’s still about a 1 in 10 chance that he’s HIV-positive, has no idea, and is not being treated for it.

Happy Valentine’s Day everyone!


  2. Attia S et al. Sexual transmission of HIV according to viral load and antiretroviral therapy: systematic review and meta-analysis. AIDS. 23(11): 1397–1404, 2009.

It’s Movember! Review your knowledge of Bayes’ theorem before getting your PSA test.

Background info

There are 3 million people in the U.S. currently living with prostate cancer. There are approximately 320 million people in the U.S. today, roughly half of whom will have prostates. Hence, let us take the prevalence of prostate cancer among those who have prostates to be approximately 3 in 160, or just under 2%.

The false positive (type I error) rate is reported at 33% for PSA velocity screening, or as high as 75%. The false negative (type II error) rate is reported as between 10% and 20%. For the purpose of this analysis, let’s give the PSA test the benefit of the doubt, and attribute to it the lowest type I and type II error rates, namely 33% and 10%.

Skill testing question

If some random person with a prostate from the United States, where the prevalence of prostate cancer is 2%, receives a positive PSA test result, where that test has a false positive rate of 33% and a false negative rate of 10%, what is the chance that this person actually has prostate cancer?

Bayes’ theorem

Recall Bayes’ theorem from your undergraduate Philosophy of Science class. Let us define the hypothesis we’re interested in testing and the evidence we are considering as follows:

P(h): The prior probability that this person has cancer
P(e|¬h): The false positive (type I error) rate
P(¬e|h): The false negative (type II error) rate

P(h) = 3/160
P(e|¬h) = 0.33
P(¬e|h) = 0.10

Given these definitions, the quantity we are interested in calculating is P(h|e), the probability that the person has prostate cancer, given that he returns a positive PSA test result. We can calculate this value using the following formulation of Bayes’ theorem:

P(h|e) = P(h) / [ P(h) + ( P(e|¬h) P(¬h) ) / ( P(e|h) ) ]

From the above probabilities and the laws of probability, we can derive the following missing quantities.

P(¬h) = 1 – 3/160
P(e|h) = 0.90

These can be inserted into the formula above. The answer to the skill-testing question is that there is a 4.95% chance that the randomly selected person in question will have prostate cancer, given a positive PSA test result.
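For anyone who wants to see the substitution spelled out, here is a short Python sketch that plugs the numbers into the formulation of Bayes’ theorem given above. It uses exact fractions so that no rounding creeps in before the final step. The variable names are my own.

```python
from fractions import Fraction

# Quantities stated above
prior = Fraction(3, 160)            # P(h)
false_positive = Fraction(33, 100)  # P(e|¬h)
false_negative = Fraction(10, 100)  # P(¬e|h)

# Derived quantities
p_not_h = 1 - prior                 # P(¬h) = 1 - 3/160
p_e_given_h = 1 - false_negative    # P(e|h) = 0.90

# P(h|e) = P(h) / [ P(h) + ( P(e|¬h) P(¬h) ) / P(e|h) ]
p_h_given_e = prior / (prior + (false_positive * p_not_h) / p_e_given_h)

print(p_h_given_e)                  # 90/1817
print(f"{float(p_h_given_e):.2%}")  # 4.95%
```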

What if we know more about the person in question?

Let’s imagine that the person is not selected at random. Say that this person is a man with a prostate and he is over 60 years old.

According to Zlotta et al., the prevalence of prostate cancer rises to over 40% in men over age 60. If we redo the above calculation with this base rate, P(h) = 0.40, we find that P(h|e) rises to 64.5%.
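To get a feel for how strongly the answer depends on the base rate, here is a minimal Python sketch that holds the error rates fixed at 33% and 10% and varies only the prior. The function name `psa_posterior` is hypothetical. The only priors taken from this post are 3/160 and 0.40, and the 0.10 and 0.20 rows are there purely to fill in the curve.

```python
def psa_posterior(prior, false_positive=0.33, false_negative=0.10):
    """P(h|e) for a positive test, using the formulation of Bayes' theorem above."""
    p_e_given_h = 1 - false_negative
    p_not_h = 1 - prior
    return prior / (prior + (false_positive * p_not_h) / p_e_given_h)


for prior in (3 / 160, 0.10, 0.20, 0.40):
    print(f"P(h) = {prior:.3f}  ->  P(h|e) = {psa_posterior(prior):.3f}")
```

The first and last rows reproduce the 4.95% and 64.5% figures, and the rows in between show how quickly a positive result becomes informative as the prior rises.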

Take-home messages

  1. Humans are very bad at intuiting probabilities. See Wikipedia for recommended reading on the Base Rate Fallacy.
  2. Having a prostate is neither a necessary nor a sufficient condition for being a man. Just FYI.
  3. Don’t get tested for prostate cancer unless you’re in a higher-risk group, because the base rate of prostate cancer is so low in the general population that if you get a positive result, it’s likely to be a false positive.

The answer to the question

On October 9, inspired by the STREAM research group’s Forecasting Project, I posed a question to the Internet: “Do you know how the election is going to turn out?” I tweeted it at news anchors, MPs, celebrities, academics, friends and family alike.

I’m very happy with the response! I got 87 predictions, and only 11 of them were what I would consider “spam.” I took those responses and analysed them to see if there were any variables that predicted better success in forecasting the result of the election.

The take-home message is: No. Nobody saw it coming. The polls had the general proportion of the vote pretty much correct, but since polls do not reflect the distribution of voters in individual ridings, the final seat count was very surprising. This may even suggest that the Liberals got the impetus for a majority result from the fact that everyone expected they would only narrowly eke out a victory over the incumbent Tories.

You can view the final report in web format or download it as a PDF.

Aristotle’s Square of Opposition … in Lojban

Here it is: one obscure type of logic expressed in an even more obscure type of logic! You’re welcome!

        | xusra                    | natfe
kampu   | ro da poi broda cu brode | no da poi broda cu brode
steci   | da poi broda cu brode    | da poi broda cu na brode

I wasn’t sure whether to render the O categorical sentence (bottom-right) as {naku ro da poi broda cu brode} or {da poi broda cu na brode}. It has been left as an exercise for the reader to determine whether they are logically equivalent.

Can you predict the outcome of Canada’s 42nd federal election?

The STREAM (Studies of Translation, Ethics and Medicine) research group at McGill University, of which I’m a part, has been working on a project for the last year or so in which we elicit forecasts of clinical trial results from experts in their field. We want to see how well-calibrated clinical trialists are, and to see which members of a team are better or worse at predicting trial outcomes like patient accrual, safety events and efficacy measures.

Inspired by this, I borrowed some of the code we have been using to get forecasts from clinical trial investigators and applied it to the case of Canada’s 42nd federal election. Now I’m asking you to do your best to predict how many seats each party will get, and who will win in your riding.

Let’s see how well we, as a group, can predict the outcome, and see if there are regional or demographic predictors for who is better or worse at predicting election results. The more people who make predictions, the better the data set I’ll have at the end, so please submit a forecast, and ask your friends!

The link for the forecasting tool is here:

Just to make it interesting: I will personally buy a beer for the forecaster who gives me the best prediction out of them all.* :)

* If you are younger than 18 years of age, you get a fancy coffee, not a beer. No purchase necessary, only one forecast per person. Forecaster must provide email with the prediction in order for me to contact him/her. In the case of a tie, one lucky beer-receiver will be chosen randomly. Having the beer together with me is conditional on the convenience of both parties (e.g. if you live in Vancouver or something, I’ll just figure out a way to buy you a beer remotely, since I’m in Montreal). You may consult any materials, sources, polls or whatever. This is a test of your prediction ability, not memory, after all. Prediction must be submitted by midnight on October 18, 2015.

Lojban logical connectives illustrated with Venn diagrams

Truth functions for Lojban logical connectives

For my own reference, I have illustrated the 14 possible truth functions that can be expressed using the A, E, O and U connectives in Lojban, and annotated them with the most appropriate forethought connective to do the job.

In many cases, there are multiple ways to express a single truth function. For example, {gonai broda gi brode} is logically equivalent to {segonai broda gi brode} and {go broda ginai brode} and {sego broda ginai brode}, but despite the fact that they are well-formed Lojban there is literally no reason to ever use those constructions, and ones like {go … ginai …} kind of defeat the purpose of using a forethought connective in the first place. Mutatis mutandis with {ga … gi …} vs {sega … gi …}, etc.

Of course, since there’s 4 regions of the Venn diagram that could be shaded or not, that makes 2^4 = 16 possible truth functions. The topmost Venn diagram is the forethought-connected question, not an attempt at the truth function where all the regions are unshaded. So what happened to the remaining two functions? It is not possible, using the regular Lojban connectives, to make a Venn diagram that would be all white or all red. Fortunately, as the CLL says, these are “pretty useless anyway.”
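The counting can be double-checked with a few lines of Python. This sketch knows nothing about Lojban. It just enumerates every way of shading the 4 regions and separates out the two constant functions (all shaded or all unshaded) from the rest. The variable names are my own.

```python
from itertools import product

# The 4 regions of the Venn diagram: (A, B) = TT, TF, FT, FF
regions = list(product([True, False], repeat=2))

# A truth function assigns shaded/unshaded to each of the 4 regions.
truth_functions = list(product([True, False], repeat=len(regions)))

# The two "all white" and "all red" diagrams are the constant functions.
constant = [f for f in truth_functions if len(set(f)) == 1]
non_constant = [f for f in truth_functions if len(set(f)) > 1]

print(len(truth_functions), len(constant), len(non_constant))  # 16 2 14
```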

Come to think of it, how would I render those into English? “A or B or both or not A or B?” and “Not A and Not B and not not A or B?” Gross.

For a more legible version, see the attached PDF: Truth functions