It’s Valentine’s Day! Time to review Bayes’ theorem.

Figure 1

It’s Valentine’s Day! Time to review your knowledge of Bayes’ theorem. Here’s a fun exercise: calculate the probability that a gay man is HIV-negative, given that he tells you he’s HIV-negative.


First, let’s define our terms.

h: Does not have HIV
~h: Does have HIV
e: Says he does not have HIV
~e: Says he does have HIV


So let’s imagine that you’re a gay man, and you’re going to hook up with a guy for Valentine’s Day. You might be interested in calculating the following: P(h|e)

This expression, P(h|e), represents the probability that a gay man does not have HIV given that he says he does not have HIV.


The base rate of HIV infection among gay men who have sex with men is 19%.1

Hence: P(~h) = 0.19; or P(h) = 0.81

See Figure 1 for a graphical representation. The entire square represents all gay men who have sex with men. The blue rectangle takes up 81% of the square, which is proportional to the CDC’s best estimate for the number of gay men who are actually HIV-negative.

From the same source, we can also determine that the probability that a person says he does not have HIV, given that he does have HIV, is 44%.1

Hence: P(e|~h) = 0.44

In Figure 1, this is represented by the green rectangle. Given that a person is HIV-positive, there’s a 44% chance that they don’t know, and so they would likely say that they are “negative.”

The remainder, the yellow rectangle, is the proportion of gay men who are HIV-positive and who know that they are HIV-positive.


I am considering only the population of gay men who have sex with men.

Built into this is the assumption that men who have HIV and don’t know it would report themselves as HIV-negative, and that no one would just say “I don’t know.”

I am also assuming here that 100% of gay men who don’t have HIV will say that they don’t have HIV. Put another way, there is a 0% chance that someone will say he has HIV if, in fact, he does not have HIV. This is a simplification: it’s possible that someone is confused about his status, but very unlikely. Hence:

P(e|h) = 1; or P(~e|h) = 0

Bayes’ theorem

To calculate our desired value, P(h|e), we should use Bayes’ theorem.

P(h|e) = P(h) / ( P(h) + P(e|~h) * P(~h) / P(e|h) )

P(h|e) = 0.81 / ( 0.81 + 0.44 * 0.19 / 1 )

P(h|e) = 0.81 / ( 0.81 + 0.44 * 0.19 )

P(h|e) = 0.91

To illustrate this graphically, in Figure 1, this would represent the chance of your prospective hook-up being in the blue area, given that the only thing you know about him is that he’s either in the blue area or the green area.
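As a rough check of the arithmetic, here is a minimal Python sketch that computes the three areas described for Figure 1 and then takes the blue area over the blue and green areas combined. The function and variable names are my own labels for illustration; only the three probabilities come from the figures quoted above.

```python
# Illustrative sketch: the names below are invented for this example.
# The three input probabilities are the ones quoted in the post.
P_H = 0.81              # P(h): does not have HIV
P_E_GIVEN_H = 1.0       # P(e|h): says "negative" when negative (assumed 100%)
P_E_GIVEN_NOT_H = 0.44  # P(e|~h): says "negative" when actually positive


def figure_areas(p_h, p_e_given_h, p_e_given_not_h):
    """Return the (blue, green, yellow) areas of the unit square."""
    blue = p_h * p_e_given_h                     # negative, says negative
    green = (1 - p_h) * p_e_given_not_h          # positive, says negative
    yellow = (1 - p_h) * (1 - p_e_given_not_h)   # positive, says positive
    return blue, green, yellow


def posterior_negative(p_h, p_e_given_h, p_e_given_not_h):
    """P(h|e): chance of being in blue, given blue or green."""
    blue, green, _ = figure_areas(p_h, p_e_given_h, p_e_given_not_h)
    return blue / (blue + green)


blue, green, yellow = figure_areas(P_H, P_E_GIVEN_H, P_E_GIVEN_NOT_H)
print(f"blue={blue:.4f} green={green:.4f} yellow={yellow:.4f}")
print(f"P(h|e) = {posterior_negative(P_H, P_E_GIVEN_H, P_E_GIVEN_NOT_H):.2f}")
```

Because P(e|h) is taken to be 1, the three areas add up to the whole square, and the result rounds to the 0.91 given above.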


Your risk of HIV exposure can be informed by your prospective sexual partner’s response to whether or not he is HIV-negative.

If a person tells you that he’s HIV-positive, he knows his status. No one goes around claiming to be HIV-positive unless they’ve been tested and got a positive result. The best evidence we have indicates that HIV-positive people with an undetectable viral load do not transmit HIV.2 So with a sexual partner who’s HIV-positive, you’re not getting any surprises.

If you don’t even ask about your prospective sex partner’s HIV status, you can be 81% certain that he’s HIV-negative, just because of the base rate of HIV prevalence. If you do ask and he tells you that he’s negative, that is a useful piece of information—it allows you to update your estimation of the probability that your prospective sexual partner is HIV-negative to 91%, but there’s still about a 1 in 10 chance that he’s HIV-positive, has no idea, and is not being treated for it.

Happy Valentine’s Day everyone!


  2. Attia S et al. Sexual transmission of HIV according to viral load and antiretroviral therapy: systematic review and meta-analysis. AIDS. 23(11): 1397–1404, 2009.

It’s Movember! Review your knowledge of Bayes’ theorem before getting your PSA test.

Background info

There are 3 million people in the U.S. currently living with prostate cancer. There are approximately 320 million people in the U.S. today, roughly half of whom have prostates. Hence, let us take the prevalence of prostate cancer among those who have prostates to be approximately 3 in 160, or just under 2%.

The false positive (type I error) rate is reported at 33% for PSA velocity screening, or as high as 75%. The false negative (type II error) rate is reported as between 10-20%. For the purpose of this analysis, let’s give the PSA test the benefit of the doubt, and attribute to it the lowest type I and type II error rates, namely 33% and 10%.

Skill testing question

If some random person with a prostate from the United States, where the prevalence of prostate cancer is 2%, receives a positive PSA test result, where that test has a false positive rate of 33% and a false negative rate of 10%, what is the chance that this person actually has prostate cancer?

Bayes’ theorem

Recall Bayes’ theorem from your undergraduate Philosophy of Science class. Let us define the hypothesis we’re interested in testing and the evidence we are considering as follows:

P(h): The prior probability that this person has cancer
P(e|¬h): The false positive (type I error) rate
P(¬e|h): The false negative (type II error) rate

P(h) = 3/160
P(e|¬h) = 0.33
P(¬e|h) = 0.10

Given these definitions, the quantity we are interested in calculating is P(h|e), the probability that the person has prostate cancer, given that he returns a positive PSA test result. We can calculate this value using the following formulation of Bayes’ theorem:

P(h|e) = P(h) / [ P(h) + ( P(e|¬h) P(¬h) ) / ( P(e|h) ) ]
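This is the more familiar statement of Bayes’ theorem with the numerator and denominator both divided by P(e|h). A short derivation, using only the symbols defined above and assuming P(e|h) > 0:

```latex
\begin{align*}
P(h \mid e)
  &= \frac{P(e \mid h)\,P(h)}{P(e)} \\
  &= \frac{P(e \mid h)\,P(h)}
          {P(e \mid h)\,P(h) + P(e \mid \neg h)\,P(\neg h)} \\
  &= \frac{P(h)}
          {P(h) + \dfrac{P(e \mid \neg h)\,P(\neg h)}{P(e \mid h)}}
\end{align*}
```

The second line expands P(e) with the law of total probability; the third divides top and bottom by P(e|h).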

From the above probabilities and the laws of probability, we can derive the following missing quantities.

P(¬h) = 1 – 3/160
P(e|h) = 0.90

These can be inserted into the formula above. The answer to the skill-testing question is that there is a 4.95% chance that the randomly selected person in question will have prostate cancer, given a positive PSA test result.
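One way to see where the 4.95% comes from is to count people rather than multiply probabilities. The sketch below does this in Python with exact fractions. The cohort size of 160,000 is a hypothetical number picked only so that the 3-in-160 prevalence gives whole people; the variable names are likewise my own.

```python
from fractions import Fraction

# Hypothetical cohort size, chosen so that 3/160 yields whole numbers.
COHORT = 160_000

prior = Fraction(3, 160)                 # P(h)
false_positive_rate = Fraction(33, 100)  # P(e|not h)
false_negative_rate = Fraction(10, 100)  # P(not e|h)

with_cancer = COHORT * prior                               # 3,000 people
without_cancer = COHORT - with_cancer                      # 157,000 people
true_positives = with_cancer * (1 - false_negative_rate)   # 2,700 people
false_positives = without_cancer * false_positive_rate     # 51,810 people

# Of everyone who tests positive, what fraction actually has cancer?
posterior = true_positives / (true_positives + false_positives)
print(f"P(h|e) = {float(posterior):.2%}")
```

Under these assumptions, about 2,700 of the 54,510 positive results belong to people who actually have prostate cancer.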

What if we know more about the person in question?

Let’s imagine that the person is not selected at random. Say that this person is a man with a prostate and he is over 60 years old.

According to Zlotta et al, the prevalence of prostate cancer rises to over 40% in men over age 60. If we redo the above calculation with this base rate, P(h) = 0.40, we find that P(h|e) rises to 64.5%.
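To make the effect of the base rate easier to play with, here is a small Python sketch that wraps the calculation in a function and runs it for both priors. The function name and the labels are invented for illustration; the error rates default to the 33% and 10% chosen earlier.

```python
def posterior(prior, false_positive_rate=0.33, false_negative_rate=0.10):
    """P(h|e) for a positive test, given the prior P(h) and the error rates."""
    sensitivity = 1 - false_negative_rate          # P(e|h)
    true_pos = sensitivity * prior                 # P(e|h) * P(h)
    false_pos = false_positive_rate * (1 - prior)  # P(e|not h) * P(not h)
    return true_pos / (true_pos + false_pos)


# The two base rates discussed in the post.
for label, prior in [("random person with a prostate", 3 / 160),
                     ("man over 60", 0.40)]:
    print(f"{label}: P(h|e) = {posterior(prior):.1%}")
```

The test is identical in both runs; only the prior changes, and that alone moves the answer from roughly 5% to roughly 64.5%.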

Take-home messages

  1. Humans are very bad at intuiting probabilities. See Wikipedia for recommended reading on the Base Rate Fallacy.
  2. Having a prostate is neither a necessary nor a sufficient condition for being a man. Just FYI.
  3. Don’t get tested for prostate cancer unless you’re in a higher-risk group, because the base rate of prostate cancer is so low in the general population that if you get a positive result, it’s likely to be a false positive.

Aristotle’s Square of Opposition … in Lojban

Here it is: one obscure type of logic expressed in an even more obscure type of logic! You’re welcome!

|  | xusra | natfe |
| --- | --- | --- |
| kampu | ro da poi broda cu brode | no da poi broda cu brode |
| steci | da poi broda cu brode | da poi broda cu na brode |

I wasn’t sure whether to render the O categorical sentence (bottom-right) as {naku ro da poi broda cu brode} or {da poi broda cu na brode}. It has been left as an exercise for the reader to determine whether they are logically equivalent.

Can you predict the outcome of Canada’s 42nd federal election?

The STREAM (Studies of Translation, Ethics and Medicine) research group at McGill University, of which I’m a part, has been working on a project for the last year or so in which we elicit forecasts of clinical trial results from experts in their field. We want to see how well-calibrated clinical trialists are, and to see which members of a team are better or worse at predicting trial outcomes like patient accrual, safety events and efficacy measures.

Inspired by this, I borrowed some of the code we have been using to get forecasts from clinical trial investigators, and have applied it to the case of Canada’s 42nd federal election, and now I’m asking for you to do your best to predict how many seats each party will get, and who will win in your riding.

Let’s see how well we, as a group, can predict the outcome, and see if there are regional or demographic predictors for who is better or worse at predicting election results. The more people who make predictions, the better the data set I’ll have at the end, so please submit a forecast, and ask your friends!

The link for the forecasting tool is here:

Just to make it interesting: I will personally buy a beer for the forecaster who gives me the best prediction out of them all.* :)

* If you are younger than 18 years of age, you get a fancy coffee, not a beer. No purchase necessary, only one forecast per person. Forecaster must provide email with the prediction in order for me to contact him/her. In the case of a tie, one lucky beer-receiver will be chosen randomly. Having the beer together with me is conditional on the convenience of both parties (e.g. if you live in Vancouver or something, I’ll just figure out a way to buy you a beer remotely, since I’m in Montreal). You may consult any materials, sources, polls or whatever. This is a test of your prediction ability, not memory, after all. Prediction must be submitted by midnight on October 18, 2015.

Yes, it’s racist

Judge Eliana Marengo recently told another human being that she had to be stripped of her identity and publicly humiliated in order to have her case heard in a court in Québec. That is to say, the judge refused to hear the case while she was wearing a hijab.

For clarity, Article 13 of the regulations of the Court of Quebec makes no reference to headscarves. This was just one judge’s decision to make life harder for another human being. And it was racist.

Wait, how was it racist?

This is a point that people keep refusing to understand. I have written previously about how you can be substantially racist, sexist, homophobic, transphobic, etc. without ever actually making reference to a person’s race, sex, orientation, gender, etc. This is exactly the same thing.

A policy that makes life harder for one group of people is discriminatory against that group, regardless of how obliquely that group is singled out in the wording of the policy itself. And it’s still discriminatory even if that policy contains an ostensibly non-racist/non-sexist/etc. counter-example to ward off suspicions of racism, sexism, etc. (Cf. the Charter of Values and conspicuously large crucifixes).

It is laughable that Marengo invoked equality to justify her racist abuse of power. She deigned to instruct us in righteousness by telling us, “The same rules need to be applied to everyone.” To get an idea of how the rules are applied to everyone in Québec, I have compiled Table 1, below.

White people do religious stuff in the public sphere in Québec all the time. Nobody minds. Nobody gets upset. Certainly nobody refuses to give them the basic justice that all humans are due. But when one private person of colour wears a hijab to court, suddenly a) it’s fair game to publicly humiliate them and strip their identity, and b) it’s hitting below the belt to call it “racist” when it happens.

Table 1: A convenience sample of conspicuous religious accommodations in the province of Québec, indexed by race

| Religious thing | Private or public? | Who did it? (Race) | Is it okay in Québec? |
| --- | --- | --- | --- |
| Prominent crucifix in legislature | Public | White | Okay! |
| Giant cross overlooking biggest city in province | Public | White | Okay! |
| Big white cross dominating the provincial flag | Public | White | Okay! |
| Nearly every street and city named after a Christian saint | Public | White | Okay! |
| Private person wearing hijab in court | Private | POC | “This is unacceptable! Religious people are always demanding more and more accommodations. This is not about race at all!” |

An unexpected link between computer science and the ethics of consent in the acutely comatose

Yesterday, Dr Weijer from Western U came to the STREAM research group at McGill to give a talk on the ethics of fMRI studies on acutely comatose patients in the intensive care unit. One of the topics he briefly covered (not the main topic of his talk) was that of patients who may be “awake,” but generally unaware of their surroundings, while in an acutely comatose state of some kind. Using fMRI, researchers can ask questions of some of these subjects by telling them to imagine playing tennis for “yes,” and to imagine navigating their home for “no.” Since the areas of the brain involved in these two tasks are very different, the two responses can be distinguished with some accuracy. In some rare cases, patients in this condition are able to consistently answer biographical questions, indicating that they are, in some sense, conscious.

One of the questions that arises is: Could we use this method to involve a comatose patient in decision-making regarding her own care, in cases where we were able to establish this sort of communication?

Informed consent in medical ethics is usually conceived in terms of disclosure, capacity and voluntariness. The most obvious question to arise in the types of cases we’re considering is whether you could ever know with certainty that a comatose person has the capacity to make such decisions in such a state. (Indeed, a comatose patient is often the example given of someone who does not have the capacity to consent.) Dr Weijer was generally sceptical on that front.

Partway through his discussion, I had the impression that the problem was strangely familiar. If we abstract away some of the details of the situation in question, we are left with an experimenter who is sending natural language queries into a black box system, which replies with a digital (0/1) output, and then the experimenter has to make the best evaluation she can as to whether the black box contains a person, or if it is just an “automatic” response of some kind.

Those of you with some background in computer science will recognise this as the Turing Test. Over the 65 years since it was first suggested, for one reason or another, most people have abandoned the Turing Test as a way to address the question of artificial intelligence, although it still holds a certain popular sway, as claims of chatbots that can beat the Turing Test still make the news. While many would deny that it is even an important question whether a chatbot can make you believe it is a person, at least in the fMRI/coma patient version, no one can dispute that there is something important at stake.

Four causes in Aristotle and in Lojban

Cause of parents’ death? Got in my way.

Reading through Lojban for Beginners on a Friday night (like everyone else who knows how to party—am I right?), I got to the chapter where causation and implication are discussed. In it, the authors explain that Lojban has 4 ways to say “because.”

The inventors of Lojban were not the first to preach a Doctrine of Four Causes. Aristotle also believed that when you asked Why?, you could be reasonably taken to be asking one of four different questions, which I have summed up below in Table 1. In Table 2, I summarise the four ways to say “because” in Lojban.

I made an attempt to see how well these things would match up by giving examples and rough equivalents. Unfortunately for Aristotle, they only seem to correspond decently well in two cases. But then I guess if I were trying to write a timeless philosophy, and someone told me that after ~ 2300 years, people would still be interested in half of the questions that I identified, I’d probably think that I did pretty well. Especially considering that Aristotle spoke Ancient Greek, which is pretty much the anti-Lojban if any language is.

For the other two of Aristotle’s causes, there are words in Lojban for the ideas expressed, but they aren’t really “causation” type words in the same way.

Table 1. Aristotle’s four causes

| Cause | Example | Rough Lojban equivalent |
| --- | --- | --- |
| Material cause | The house is here because there were bricks, mortar and wood here previously. | te zbasu (?) |
| Efficient cause | The house is here because Bob built it. | rinka |
| Formal cause | The house is here because the materials have been arranged in a certain way. | tarmi (?) |
| Final cause | The house is here because Bob wanted to have dance-parties inside it. | mukti |

Table 2. Causes in Lojban

| Cause | “Because” word | Root gismu | Example | Rough Aristotelean equivalent |
| --- | --- | --- | --- | --- |
| Physically caused | ri’a | rinka | la bab. morsi ri’a lonu mi darxi by (Bob is dead because I hit him.) | Efficient cause |
| Motivated | mu’i | mukti | la bab. morsi mu’i lonu mi na’e nelci by (Bob is dead because I didn’t like him.) | Final cause |
| Justification | ki’u | krinu | la bab. morsi ki’u lonu by djuno lo dukse (Bob is dead because he knew too much.) |  |
| Logically entailed | ni’i | nibli | la bab. morsi ni’i lonu ro lo remna mrodimna kei .e lonu by remna (Bob is dead because all humans are mortal and Bob is a human.) |  |

In defense of #selfies

Someone felt good enough about their appearance that they took a picture. Let us all ridicule them for that. HA HA.

It is fashionable these days to tease people who take selfies, or to look down one’s nose at those who do take selfies, or to dismiss them as juvenile, feminine, vain, or generally bad for reasons unspecified.

You’ve seen it before. Maybe you’ve done it yourself. You see someone pull out a phone to take a selfie, and you make a joke about it, or someone complains about how “everyone is always taking selfies.”

There’s a sort of snobbish “I’m better than that” attitude that comes along with all these condemnations. The commentator looks around after making the comment, grinning in a most self-satisfied way, as if he has said something most original and daring. There’s a smug, superior, aren’t-I-clever-for-going-against-the-grain vibe that I get from people who say things like that, and I just can’t deal with it anymore.

First off, when you condemn selfies and those who take them, you are not saying anything clever or original. It’s not funny. It’s not illuminating. You haven’t picked out some interesting and unremarked-upon feature of human experience that no-one else has noticed. (Not that I’m claiming that any of the following ideas are original to myself either—plenty of other people have had reasoned pro-selfie positions. Consider this more of a rant than a claim to an original philosophy.)

Further, you are not some brave individualistic rebel among a flock of narcissistic sheeple. If anything, this makes you more like a corporate shill, helping to ensure that a new generation of young people is intimidated into believing that they have good reason to be insecure (and thus prepared to spend money to make that feeling go away). There are, after all, entire industries whose business model depends on encouraging our insecurities and preying on them. So if you’re feeling smug about being the lone wolf who’s bucking a terrifying trend of vanity, you should consider that every single person you’re criticising has been told “you’re not good enough and you should feel bad about it” in a million subtle (and also a million not-so-subtle-and-corporately-funded) ways for their entire life.

When you say things like, “No one wants to see your selfies,” you are not actually commenting on the value of the photographs that you’re disparaging, even if you think that’s what you’re doing. You’re coming closer to making a commentary on your own value as a friend, though. With a statement like that, you’re saying, “I don’t care about you, how you look, or what you’re doing. I don’t care that you felt good about yourself today.” And when you say things like that, you’re telling everyone in earshot that they shouldn’t expect positive feedback or encouragement from you.

It’s the same sort of attitude that you get from people who say things like, “Don’t tweet about what you had for breakfast,” or “You don’t need to make a Facebook post every time you go for a run.” You know what? If you care that little, no one’s forcing you to use social media. You can leave the party if you’re not enjoying it.

And this is why the whole thing is hypocrisy: When you say, “How egotistical—my friend posted a selfie,” what you are really saying is “I don’t care about my friend—if they’re feeling good about their appearance, or what they’re doing, or if they just want some positive attention from their friends, then that is unimportant or offensive to me somehow.” And that attitude—trying to make someone feel bad, just so you can have the satisfaction of looking down your nose at them—is so much more self-absorbed than posting a selfie.

As for me, I do care about my friends, and when I see a friend’s selfie go by on my Twitter feed, I want my first thought to be “Aww, isn’t that cute,” and not “How can I make that person feel bad?” That’s the kind of person I want to be.

So I started learning Lojban .ui

This Friday past, I started learning Lojban. For the non-initiate, Lojban is a constructed language based on predicate logic that is syntactically unambiguous. I’d known about it for years, probably hearing about it first on CBC, maybe 10 years ago. It’s the sort of thing that shows up in Dinosaur Comics or in XKCD periodically. Up until this weekend, the existence of Lojban had mostly been one of those “cocktail party facts,” but then I finally took the plunge. After 1 weekend of working on it, I’m about 35% of the way through Lojban for Beginners, having downloaded it to my Kobo for reference during the car ride to Stratford.

It’s often billed as being an ideal language for fields like law, science or philosophy, due to its unambiguous and culturally neutral nature. So I set out to find out certain specialised terms from my field, bioethics, and it turns out that they mostly don’t exist yet. This, of course, offers some exciting opportunities for a grad student. :)

I’ve convinced a few people in Montréal to learn Lojban with me, and even found a Montrealer who speaks Lojban on an IRC channel. (Yes, IRC still exists!) We may “ckafi pinxe kansa,” as they say in Lojban, apparently.

If you too want to get in on the ground floor of Lojban Montréal, let me know!

Proof of prespecified endpoints in medical research with the bitcoin blockchain

NOTICE (2022-05-24)

This blog post was written in 2014, when I still naively hoped that the myriad problems with cryptocurrency might still be solved. I am now somewhat embarrassed to have written this in the first place, but will leave the post up for historical reasons. (Quite a number of medical journal articles link here now, for better or for worse.)

While the following methods are valid as far as they go, I absolutely DO NOT recommend actually using them to timestamp research protocols. In fact, I recommend that you never use a blockchain for anything, ever.


The gerrymandering of endpoints or analytic strategies in medical research is a serious ethical issue. “Fishing expeditions” for statistically significant relationships among trial data or meta-analytic samples can confound proper inference by statistical multiplicity. This may undermine the validity of research findings, and even threaten a favourable balance of patient risk and benefit in certain clinical trials. “Changing the goalposts” for a clinical trial or a meta-analysis when a desired endpoint is not reached is another troubling example of a potential scientific fraud that is possible when endpoints are not specified in advance.

Pre-specifying endpoints

Choosing endpoints to be measured and analyses to be performed in advance of conducting a study is a hallmark of good research practice. However, if a protocol is published on an author’s own web site, it is trivial for an author to retroactively alter her own “pre-specified” goals to align with the objectives pursued in the final publication. Even a researcher who is acting in good faith may find it less than compelling to tell her readers that endpoints were pre-specified, with only her word as a guarantee.

Advising a researcher to publish her protocol in an independent venue such as a journal or a clinical trial registry in advance of conducting research does not solve this problem, and even creates some new ones. Publishing a methods paper is a lengthy and costly process with no guarantee of success—it may not be possible to find a journal interested in publishing your protocol.

Pre-specifying endpoints in a clinical trial registry may be feasible for clinical trials, but these registries are not open to meta-analytic projects. Further, clinical trial registry entries may be changed, and it is much more difficult (although still possible) to download previous versions of trial registries than it is to retrieve the current one. For example, there is still no way to automate downloading of XML-formatted historical trial data from in the same way that the current version of trial data can be automatically downloaded and processed. Burying clinical trial data in the “history” of a registry is not a difficult task.

Publishing analyses to be performed prior to executing the research itself potentially sets up a researcher to have her project “scooped” by a faster or better-funded rival research group who finds her question interesting.

Using the bitcoin blockchain to prove a document’s existence at a certain time

Bitcoin uses a distributed, permanent, timestamped, public ledger of all transactions (called a “blockchain”) to establish which addresses have been credited with how many bitcoins. The blockchain indirectly provides a method for establishing the existence of a document at a particular time that can be independently verified by any interested party, without relying on a medical researcher’s moral character or the authority (or longevity) of a central registry. Even in the case that the NIH’s servers were destroyed by a natural disaster, if there were any full bitcoin nodes left running in the world, the method described below could be used to confirm that a paper’s analytic method was established at the time the authors claim.


  1. Prepare a document containing the protocol, including explicitly pre-specified endpoints and all prospectively planned analyses. I recommend using a non-proprietary document format (e.g. an unformatted text file or a LaTeX source file).
  2. Calculate the document’s SHA256 digest and convert it to a bitcoin private key.
  3. Import this private key into a bitcoin wallet, and send an arbitrary amount of bitcoin to its corresponding public address. After the transaction is complete, I recommend emptying the bitcoin from that address to another address that only you control, as anyone given the document prepared in (1) will have the ability to generate the private key and spend the funds you just sent to it.
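Step 2 can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not the tooling used for the original post: the function names and the sample protocol text are hypothetical, and the encoding shown is the standard uncompressed mainnet Wallet Import Format (version byte 0x80 plus a Base58Check checksum), which is one common way to hand a raw private key to a wallet. Deriving the public address from the key needs elliptic-curve arithmetic and is left to the wallet in step 3.

```python
import hashlib

# Order of the secp256k1 group; a valid private key k satisfies 1 <= k < N.
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def protocol_digest(document: bytes) -> bytes:
    """SHA256 digest of the protocol file's exact bytes."""
    return hashlib.sha256(document).digest()


def base58check(payload: bytes) -> str:
    """Append a 4-byte double-SHA256 checksum and encode in Base58."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is written as a leading "1".
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def digest_to_wif(digest: bytes) -> str:
    """Treat a 32-byte digest as a private key and encode it as WIF."""
    key = int.from_bytes(digest, "big")
    if len(digest) != 32 or not 1 <= key < SECP256K1_N:
        raise ValueError("digest is not a valid secp256k1 private key")
    return base58check(b"\x80" + digest)


# Hypothetical protocol text, for illustration only.
protocol = b"Primary endpoint: overall survival at 12 months.\n"
digest = protocol_digest(protocol)
print("SHA256:", digest.hex())
print("WIF:   ", digest_to_wif(digest))
```

Hashing the file’s raw bytes, rather than decoded text, avoids line-ending or encoding differences silently changing the digest.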


The incorporation into the blockchain of the first transaction using the address generated from the SHA256 digest of the document provides an undeniably timestamped record that the research protocol prepared in (1) is at least as old as the transaction in question. Care must be taken not to accidentally modify the protocol after this point, since only an exact copy of the original protocol will generate an identical SHA256 digest. Even the alteration of a single character will make the document fail an authentication test.

To prove a document’s existence at a certain point in time, a researcher need only provide the document in question. Any computer would be able to calculate its SHA256 digest and convert to a private key with its corresponding public address. Anyone can search for transactions on the blockchain that involve this address, and check the date when the transaction happened, proving that the document must have existed at least as early as that date.
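The local half of that check, recomputing the digest from the document, can be sketched as follows. The function name and sample text are hypothetical, and comparing against a stored digest is a stand-in for the real lookup, where the verifier would derive the address from the digest and search the blockchain for it (not shown here).

```python
import hashlib
import hmac


def matches_recorded_digest(document: bytes, recorded_digest_hex: str) -> bool:
    """True if the document's SHA256 digest equals the recorded one."""
    actual = hashlib.sha256(document).hexdigest()
    return hmac.compare_digest(actual, recorded_digest_hex.lower())


# Hypothetical protocol text, for illustration only.
original = b"Primary endpoint: overall survival at 12 months.\n"
recorded = hashlib.sha256(original).hexdigest()

# Changing a single character yields a different digest.
altered = original.replace(b"12 months", b"18 months")

print(matches_recorded_digest(original, recorded))  # True
print(matches_recorded_digest(altered, recorded))   # False
```

A different digest means a different private key and address, so an altered protocol would point to an address with no matching transaction.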


This strategy would prevent a researcher from retroactively changing an endpoint or adding / excluding analyses after seeing the results of her study. It is simple, economical, trustless, non-proprietary, independently verifiable, and provides no opportunity for other researchers to steal the methods or goals of a project before its completion.

Unfortunately, this method would not prevent a malicious team of researchers from preparing multiple such documents in advance, in anticipation of a need to defraud the medical research establishment. To be clear, under a system as described above, retroactively changing endpoints would no longer be a question of simply deleting a paragraph in a Word document or in a trial registry. This level of dishonesty would require planning in advance (in some cases months or years), detailed anticipation of multiple contingencies, and in many cases, the cooperation of multiple members of a research team. At that point, it would probably be easier to just fake the numbers than it would be to have a folder full of blockchain-timestamped protocols with different endpoints, ready in case the endpoints need to be changed.

Further, keeping a folder of blockchain-timestamped protocols would be a very risky pursuit—all it would take is a single honest researcher in the lab to find those protocols, and she would have a permanent, undeniable and independently verifiable proof of the scientific fraud.


Fraud in scientific methods erodes confidence in the medical research establishment, and that confidence is essential to it performing its function: generating new scientific knowledge. Cases where pre-specified endpoints are retroactively changed cast doubt on the rest of medical research. A method by which anyone can verify the existence of a particular detailed protocol prior to research would lend support to the credibility of medical research, and be one less thing about which researchers have to say, “trust me.”