A plausible, scalable and slightly wrong black box: why large language models are a fascist technology that cannot be redeemed

Riker: So they could have done this review a dozen times already?

La Forge: A dozen? A hundred? It's impossible to tell.

When large language models (LLMs) get something factually wrong or make something ridiculous up, everyone makes fun of it online. Gemini told everyone to put glue on their pizza! (Hilarious!) A corporate chatbot invented a company policy that doesn’t exist! (Uh oh!) There’s gotta be about a million examples of LLMs spouting out nonsense that makes them look silly. Detractors use these as a “gotcha” for why LLMs aren’t ready for real-world use, and boosters defend them by saying that LLMs will only get better.

LLM “hallucinations,” more appropriately known by the technical philosophical term “bullshit” (text intended to persuade without regard for truth), are a well-known problem. LLMs bullshit or “hallucinate” because they do not actually produce an internal model of the problem being solved, and they cannot reason toward a solution. LLMs are just statistical models that predict the next set of words to follow a prompt. They have been (fairly accurately) described as “spicy autocomplete” or “a fuzzy jpeg of the internet.” They work the same way that your phone, for years, has been able to know that if you type “How is it,” the next word might be “going?”—just more so.

Because of this basic underlying architecture, LLMs are optimized for plausibility, same as the autocorrect on your phone. They are not optimized for truth, or to connect to anything in reality. They are only trained to produce the words most likely to follow from the previous ones, based on texts scraped from the internet. This is why we get bullshit/hallucinations, and there’s no way to ever stop them from doing that without completely scrapping the LLM project and rebuilding an artificial intelligence chatbot on a fundamentally different foundation.

This foundation for LLMs also makes them into a “black box”—the statistical model that produces the text is so complicated, containing so many variables, that there is no way to explain definitively how it generated any answer to any prompt. If you wrote a regular expression to pull out all the numbers from a text, you could look at the original regular expression and find out that it missed an instance of “three” because it was only looking for numerals (i.e. “3”) and not letters. If you asked an LLM to pull all the numbers out of a text and it missed one, there is no way to ever know why, and even the “explanations” that newer generations of the models give are not real explanations; they are just more plausible-but-wrong text generated by the model.
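To make the contrast concrete, here’s a minimal sketch in R (the example text is hypothetical, but the failure is fully auditable): the pattern below only matches numerals, and you can tell exactly why it misses “three” just by reading it.

> text <- "We enrolled 3 sites and three hundred patients over 12 months"
> regmatches(text, gregexpr("[0-9]+", text))[[1]]
[1] "3"  "12"

No such inspection is possible for an LLM’s screening decision; there is no pattern to read.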

LLM boosters promise, of course, that the bullshit will be fixed in the future somehow. To date, feeding more data into the model has sometimes made the problem better and sometimes made it worse; there has been no clear trend of subsequent iterations of LLMs improving on this front. Boosters will always tell you, “This is the worst that LLMs will ever be, the technology will only get better in the future.” What they don’t specify is “better by what metric, better at what and better for whom?” As Google Search usability over the last decade has taught us, technology sometimes gets worse because that is more profitable for those who control it.

In what follows, I will argue that being plausible but slightly wrong and un-auditable—at scale—is the killer feature of LLMs, not a bug that will ever be meaningfully addressed, and that this combination of properties makes them an essentially fascist technology. By “fascist” in this context, I mean that they are well suited to centralizing authority, eliminating checks on that authority and advancing an anti-science agenda. I will use the example case of medical systematic reviews to illustrate how LLMs will be used to advance a fascist agenda, and gesture toward a few other likely areas of fascist application. I will conclude by arguing that LLMs can’t be “used for good,” accepted or even regulated, but must be resisted and rejected wholesale.

What LLM boosters and detractors both mostly miss is that a black box that returns a slightly wrong but very plausible answer is, for certain use cases, a much better offering than perfect accuracy. This is because there’s only one way to be perfectly accurate (providing the correct answer) but there’s a million ways to be slightly off (providing an answer that misses the mark, but is still mostly defensible). To paraphrase Tolstoy, “Accurate data retrieval is all alike; every LLM response is inaccurate in its own way.” And because LLM prompts can be repeated at industrial scales, an unscrupulous user can cherry-pick the plausible-but-slightly-wrong answers they return to favour their own agenda.

It’s the scaling up of LLMs that makes their plausible, black-boxed incorrectness so useful. If an LLM returns different, slightly incorrect answers depending on how the prompt is fine-tuned, then you can decide beforehand what answer you want from the aggregate analysis of a large corpus of data, and have the LLM analyze it over and over until it gives you the answers you want. Because the model is a black box, no one can be expected to explain where the answer came from exactly, and because it can be applied at scale, there’s no possibility that it can be externally audited.

To illustrate this, I will use the example of a systematic review in the medical literature (my area of expertise), although there are many other areas where this strategy can be used. In the area of insurance reimbursement, for example, an insurance company could decide the exact dollar amount they want to pay out, and then reverse engineer prompts to generate responses to thousands of applications, and fine-tune their responses until the justifications produced by the LLM in the aggregate match the amount of money they wish to pay.

LLMs are the perfect technology for manipulating the medical literature to say nearly anything you want via systematic review methods

Systematic reviews are an important part of the medical evidence hierarchy, sitting even above randomized clinical trials in their level of authority. For many medical questions, there are multiple published clinical trials or other forms of evidence that can provide slightly different, or even conflicting, answers. This is not necessarily because the methods used were flawed, but because human biology is complicated, and the answers to questions like “does drug A work better than drug B in population Y for condition Z?” are probabilistic ones like “it works 50% better on this metric, 75% of the time,” not categorical answers like “yes” or “no.”

Systematic review methodology is meant to provide a broad overview of the medical literature on a specific subject, excluding low-quality evidence and statistically aggregating the more trustworthy evidence into an even more accurate and trustworthy estimate. They are “systematic” in the sense that they are meant to include all the evidence produced to date on the question at hand. This is typically done by first performing a literature search of several medical databases to identify potential evidence sources, followed by human screening based on inclusion criteria applied first to the title and abstract, then to the full text. This can be a work-intensive process, as selecting evidence has, prior to the advent of LLMs, required human judgement at this step.

LLMs can be deployed here to automate the screening of medical journal articles for inclusion in a systematic review, drastically reducing the human work required. This is a bad thing. Because LLM output is always slightly inaccurate, un-auditable and scalable, the screening process can easily be manipulated to return an answer of the reviewer’s choosing, and this intentionally introduced bias can be difficult or impossible to discern from the end result. Automation allows an unscrupulous reviewer to try an arbitrary number of LLM prompts for screening criteria, repeating the screening process until the set of articles to be included contains only the articles that the reviewer wants. This can be fine-tuned to the point where the bias is subtle, even when presented alongside the original LLM prompts.

Similarly, LLMs can be deployed to extract data from medical journal articles, and because LLMs produce plausible but slightly wrong answers (you could even measure and “validate” how well they perform against a “gold standard” of human data extractors), they can be gamed to produce nearly any outcome in the aggregate, in a manner that is very difficult or impossible to catch after the fact.

Couldn’t this be happening to the systematic review literature already, even without LLMs?

To be certain, an unscrupulous researcher can place their thumb on the scale at quite a number of points in the process of a systematic review, even without the use of LLMs. This happens, deliberately or accidentally, all the time, and as someone who has published several systematic reviews and is often asked to peer-review this type of research, I am very cognizant of the ways that researchers might be tempted to compromise their research integrity in order to get the answer they like.

That said, LLMs present a new challenge because they make it possible to perform many different fine-tuned iterations of a systematic review in a manner that can’t possibly be audited externally, both because the LLM is a black box and because it can be scaled to the point where double-checking is impractical. And this can be done by a single person, without any scrutiny from other researchers. Without an LLM, if a researcher wanted to redo data extraction while making fine adjustments to the inclusion criteria or the data extraction protocol, and the set of evidence being considered was large enough, it would take a team of researchers a considerable amount of time to accomplish the task even once. Being asked to repeat it over and over with minor variations to the codebook would raise suspicions, and likely some push-back, from a team of humans asked to do so. The cooperation required to accomplish large data extraction tasks without an LLM implied some level of accountability. It meant that even if a researcher was willing to commit this kind of research fraud and had the resources to do so, someone else involved was likely to put on the brakes somehow.

This brings us to why this technology isn’t just potential research fraud waiting to happen (although it is that too, and who are we kidding, it has definitely been used for research fraud already), but also an essentially fascist tool: from the example of systematic review manipulation, it’s easy to see how it centralizes control over medical evidence synthesis by eliminating a large proportion of the people involved, and thus their ability to check the agenda of an unscrupulous central authority.

This technology lends itself especially well to anti-science projects like the anti-vaccine movement, which could use it to inaccurately synthesize evidence from the medical literature to legitimize the movement. I will not be surprised when it is used to legitimize scientific racism and anti-queer hate. While I have focused on the dangers to medical information synthesis, this technique can be applied in several other industries. An insurance company, as noted above, could decide what level of payouts it wishes to have, and then adjust its justifications for decisions regarding claims, at scale, until it reaches that level, regardless of the underlying validity of the claims themselves.

Let the police or the army use this technology, and you can use your imagination on where they would go with it.

What about “responsible” LLM use?

“Using AI responsibly” certainly has the aesthetics of being a “reasonable middle ground,” away from “extreme” positions like banning, boycotting or abstaining from use. However, where fascism is concerned, being moderate toward it is not a virtue.

I’m not going to say that every person who has used an LLM for any reason is a fascist, of course. There are many ways that a reviewer can try to safeguard their own LLM use against the kinds of abuses I have described above. A researcher might attempt to thoroughly test the accuracy of an LLM at a data extraction task before employing it (good luck though, the black-box nature of AIs tends to make this a somewhat fraught enterprise). A researcher attempting to use LLMs in good faith might also pre-register their study so that they can’t alter their prompts later and cherry-pick the result. Good for them!

Unfortunately, even if you as a researcher do everything you can to use AI “responsibly,” there is no way for anyone else to distinguish your work from the irresponsible uses of AI. If you pre-registered a very detailed protocol for your systematic review before you did the work, there is no way for anyone else to know whether you already did your study before the pre-registration, other than your good word as a researcher. That’s the thing about fascist technologies—they are designed to remove accountability and centralize authority.

This vitiates the whole point of doing the study in the first place. If it all comes down to “I didn’t cheat, trust me,” and there’s literally no way for anyone else to double-check, then I don’t know what this is, but it sure isn’t science anymore.

What won’t help

1. First off, if you wish to do science in good faith, you absolutely cannot embrace LLMs for use in your own research.

“But LLMs are here to stay, we better get used to them!” says the person who’s not on OpenAI’s payroll but inexplicably wants to do their PR work for them.

Technologies are discarded, rejected or superseded all the time, even after they are touted as being “inevitable” or “here to stay so you better get used to it.” (Remember how cloning was “inevitable”? Remember how we all had to “just get used to NFTs because they’re not going anywhere?” Remember how the Metaverse was “here to stay?”)

If you do embrace LLMs, congrats, your work is now indistinguishable from all the grifters and fascists.

2. Expecting bad-faith, mass-produced and then cherry-picked systematic reviews to be debunked after they are published is a losing proposition. The naive response, that the answer to bad speech is good speech, doesn’t fly here, because we’re not just answering some instances of bad speech, we’re answering a machine that produces new bad speech on an industrial scale. Not just that, but we have to take into account Brandolini’s Law, the “bullshit asymmetry principle”: the amount of energy needed to refute bullshit is an order of magnitude greater than the energy needed to produce it. Further, as we learned from Wakefield et al. (1998), even if an incorrect medical idea is completely discredited, the paper is retracted, and the author is struck off the medical register for misconduct, the damage may already be permanently done.

3. A requirement from academic journals for pre-registration of research done by LLMs would be an ineffectual half-measure, neither adhered to by researchers nor enforced by journals, if the trends from clinical trial pre-registration continue. It’s just so easy to “cheat,” and journal editors have a tendency to bend rules like these if there’s any wiggle room for it at all, especially if the journal article has an exciting story to tell.

4. There is absolutely no way that we can expect peer review to catch this sort of fraud. I have peer-reviewed so many systematic reviews, and it is like pulling teeth to get anyone to pay attention to matters of basic research integrity. Ask a journal editor to insist that the data and analysis code for a study be made available, and watch the paper get accepted without them anyway.

What will help

1. Stop using LLMs in your own research completely. It makes your work fundamentally untrustworthy, for the reasons I have outlined above.

2. Whenever you hear a colleague tout some brand-new study of the type I have described above, accomplished using an LLM, ask them about the kind of research fraud that’s possible and in fact very easy, as I have outlined here. Ask if they can provide any reason why anyone should believe that they didn’t commit exactly that kind of fraud. If this seems too adversarial, keep in mind that this is the point of your job as an academic, and that actual fraudsters, racists and anti-queer activists can and do hijack science for their own ends when no one asks the tough questions.

3. Recommend rejection for research accomplished with an LLM if you are asked to peer-review it, or if this is too work-intensive, decline to review any research accomplished with an LLM for ethical reasons.

4. Under no circumstances should you include money for LLM use in your grant budgets.

5. If you are in a position of authority, such as being a journal editor, you need to use all the authority you have to draw a hard line on LLM use.

There is no moderate or responsible way to use LLMs. They need to be rejected wholesale.

I still think LLMs are cool and want to use them in my research

If you are unconvinced by the above argument, there are many other reasons why you might still want to reject LLM use entirely. I won’t go into these in detail in this article, but:

1. LLM use makes you complicit in de facto racialized torture of the Kenyan workers who prepare the texts that are used as training data.

2. From hardware manufacturing to hyperscale data centre construction to the training process for LLMs, there is a massive negative environmental impact to LLM use.

3. LLMs and other forms of generative AI depend on training data that has, in many cases, been taken without consent or compensation from artists or other workers in an act that has been described as enclosure of the digital commons.

4. LLM use deskills you as an academic.

5. You will be left holding the bag when the LLM economic bubble bursts. The costs of producing and maintaining these models are not sustainable, and eventually the speculative funding will run out. When the bubble bursts, you will have built your career on methods that no longer exist, and you will have put results into the literature that are completely non-reproducible.

Putting the signature above the reply in mu4e

I’m assuming you already know what mu4e is and how to edit your ~/.emacs file, otherwise you wouldn’t be interested.

I put the following in my ~/.emacs, and it works for me!

(setq mu4e-compose-signature-auto-include nil)

;; Signature above reply:
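;; Neutralize message-mode's built-in signature insertion so we
;; can place the signature ourselves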
(defun message-insert-signature () nil)

(defun my-mu4e-insert-signature-above-reply ()
  "Insert signature just above the quoted reply when composing"
  (save-excursion
    (goto-char (point-min))
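    ;; Find the header separator and put the signature on the
    ;; line below it, i.e. above the quoted reply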
    (when (re-search-forward "^--text follows this line--" nil t)
      (forward-line)
      (insert (concat mu4e-compose-signature "\n\n")))))

(add-hook 'mu4e-compose-mode-hook #'my-mu4e-insert-signature-above-reply)

;; this is where I tell mu4e what signature to use
(setq message-signature "\n\n--\nbgc")

And now when I reply, I get my signature above, and I don’t have to copy it from below every time!

Zippercowl

Welcome back to Sewing with Garak, where we’ll be sewing the future of fashion together (cyberpunk, queer, adversarial to straight aesthetics) on a sewing machine provided to me by the local socialist utopia (the public library).

I’ll skip to the end and show you the finished product, and then you can read on for the pattern and instructions if you’re interested.

Finished cowl neck sweater (black with electric blue contrast at sides and forearms; blue piping at the shoulders; asymmetric zipper down right side of front left open for wearing ease; sawtooth front hem; zipper pockets on sleeves)

It’s a cyberpunk-inspired cowl-neck shirt for cool weather, with an asymmetrical zipper for wearing ease, zippered sleeve pockets and a sawtooth front hem. I made it entirely on the socialist sewing machine at the library (BANQ)!

I made the pattern myself, which you can download as a PDF on two sheets of A0 paper.

I used black and electric blue spandex, but anything with a reasonable amount of stretch would probably work. I made the pattern just for myself, so there are no other sizes. You’d have to modify it yourself.

I made myself instructions to help when I’m at the sewing machine and wondering what to do next.

Lwaxana (my puppy) helped.

I’m pretty happy with the result! Here’s a close-up of the cowl-neck collar and the cuffs with their zippered sleeve pockets.

Putting the Trudeaupocalypse in some context

I’ve listened to a whole bunch of people giving their takes on why Trudeau fell from grace. Was it the carbon tax? Invoking the Emergencies Act? SNC-Lavalin? Trump? And a lot of them lack basically any context. So here’s my take on the rise and fall of Trudeau and why he’s resigning now in particular.

I want my readers to know that I say this as a leftist: Nobody liked Trudeau that much to begin with or really wanted him per se to have power. Trudeau was just at the right place at the right time to become the PM in the first place, and forces more powerful than him were the cause of both his rise and fall.

To see what I mean, you gotta cast your mind back to the summer of 2015.

Stephen Harper, racist dog-whistle enthusiast, oil industry advocate and enjoyer of the LEGO™ man standard haircut, was the prime minister of Canada, and the tides had turned squarely against him. Journalists referred to him as “man in blue suit” to avoid using his name when he went to self-serving photo ops. He was being rightly criticized for “muzzling” scientists when the science disagreed with his policies. His government lacked transparency so severely that it was even found to be in contempt of Parliament for outright refusing to provide our representatives with information on what it was doing. In the death-throes of his government, he proposed a transparently racist “barbaric cultural practices” snitch-line so that white people could tell on non-white people to the government and make their lives worse. There was a lot to hate, and he was flailing.

The “orange wave” of NDP support had just swept across Québec and large swathes of the country. The Syrian refugee crisis was in full swing, and, if you can believe it, Canadian political parties were competing against each other to promise that their policies would welcome more Syrian refugees than other parties’ policies. Thomas Mulcair was ascendant. At the beginning of August of 2015, smart money was on the next prime minister of Canada having a beard. (Mulcair had a big beard. We’re literally talking about facial hair. Canadian journalists got weird about that.)

The Liberal Party was pretty much dead and forgotten at this point in history, except that Trudeau II had taken the leadership, and he was a household name because his father was prime minister.

The bulk of Mulcair’s support was in Québec, and he was Harper’s main worry. So, Harper cried “niqab.” That is to say, he announced that his government would ban federal employees from wearing a niqab. Almost no one in Canada wears one to begin with, and they are absolutely not a problem here, but politicians use the fear of niqabs for political gain. These policies have also, unfortunately, been somewhat popular in Québec, and provincial politicians have used them with success to make life worse for certain groups and to give white people license to be racist.

In response to Harper’s announcement, Mulcair did the right thing: He said this was a racist ploy on the part of Harper and the Conservatives and came out against such a transparently xenophobic policy. As a result, Mulcair’s poll numbers dipped in Québec, at first ever so slightly.

You have to remember that at the time, for progressive voters in Canada, we were desperate for a change from a decade of Harper, and we were terrified that splitting the vote between the Liberal Party and the NDP would allow Harper to eke out another minority government. In fact, this exact fear was crystallized into the 2015 electoral reform campaign promise of the Liberal Party, which Trudeau II famously reneged on.

And so, when Mulcair’s poll numbers dipped in Québec after the racist Harper niqab policy proposal, it was like all the progressives in Canada looked at each other and in a panic we all said, “Okay so we’re voting for Trudeau then? It’s Trudeau? Okay here we go.” And then Trudeau II was elected to a majority government.

I can only speak for me and the people I know, but I’d bet that very few outside the Liberal Party faithful actually wanted to give Trudeau a majority government in the first place. A majority government in Canada is a very strong mandate, with few checks on its power. And we’d lived through the iron grip of Harper for long enough that, at least to the people I knew, a minority government with someone other than Harper as PM sounded pretty good. From the perspective of a progressive but non-party-affiliated voter in Canada, we weren’t picky about who it was that replaced Harper, as long as it was someone progressive.

And that’s why I hear people saying that everyone loved Trudeau so much in 2015 and wonder what election they were watching. We didn’t all fall in love with Trudeau. It felt more to me like we were just terrified of another Harper government and overshot the mark to give him a majority.

There was a brief moment of hope after Trudeau II took office, where he did a bunch of progressive-leaning things, like having equal numbers of women and men in his cabinet or legalizing recreational pot use. But even those things sort of lose their shine when you look too close. The SNC-Lavalin scandal and, more recently, the whole thing with Chrystia Freeland show that Trudeau will burn up the careers of the women around him if it comes to that. And the legalization of pot happened in such a haphazard way that it almost seemed like it was designed to maximize the number of people whose lives were ruined by it.

Trudeau was never well-loved by the left, because he kept appropriating the language and aesthetics of progressives, like anti-racism and environmentalism, but lacked much meaningful action in that direction. Trudeau never did figure out how many times he did blackface, for example, and his environmental credentials are somewhat marred by the billions he spent investing in oil pipelines.

As for the right, they hated him just because he was a progressive voice who even paid lip-service to feminism, anti-racism, environmentalism etc. in the first place. This all got super-charged by the global political lurch to the right that happened during and after the Covid pandemic. Best I can tell, what happened there is that there have always been conspiracy-theory types on the internet, and when public health measures started being put in place to address an actual, real global problem, it poured gasoline on all those tiny, already-existing fascist-adjacent fires and gave them the space to say, “Look, we told you. It’s happening, see? We were right all along!”

So that’s how you end up with the Freedom Convoy in Ottawa in 2022. It was a generic racist/xenophobic/anti-government/right-wing/proto-fascist inflammation of the already-existing problem of a certain kind of mostly white male entitlement that we never really fully addressed because it was in the fringe, and this gave it the space to go mainstream. (And I swear, if I hear one more journalist tell me that they were gullible enough to buy the “vaccine mandates” pretext, I will scream. Trust me, friend, this was not a gathering of public health policy enthusiasts.) The Freedom Convoy gained national and international attention, money and support from like-minded people, and Trudeau was antagonistic toward them.

This is the political moment where the ugly “Fuck Trudeau” flags came from, for example. And while I was never a loyal follower of Trudeau’s, I’m glad that those things have a much decreased political relevance, now that he’s resigned.

Enter Poilievre, the fast-talking Conservative politician who will do whatever it takes, cross any line, speak any falsehood he has to, to grab the reins of power. (There’s still part of me that wonders if Poilievre was “Pierre Poutine” in 2011. He was a junior politician in Harper’s Conservative Party at the time, trying to make a name for himself. He’s exactly the kind of too-clever-by-half politician who would pull a stunt like that and use his own real name to take credit for it too. Anyway, I’m not making accusations, just asking questions.) Poilievre has harnessed the anti-Trudeau animus that was present and widespread for all the above reasons and turned it into a political campaign for office. (And wealthy right-wing Americans love it. I am kind of terrified that Elon Musk’s support of him will translate into election interference that warps our democracy here.)

The infamous Liberal carbon tax that Poilievre constantly complains about (mostly a pretty decent policy, actually) was originally a Conservative idea. The left wanted stringent industry regulation of carbon emissions, so the counter-offer from the Conservative Party was a carbon tax. The Liberals compromised and implemented the Conservative carbon tax. Poilievre has complained about it ever since. Just goes to show that you should never compromise with conservatives: they’ll only move the goalposts.

So these are the headwinds that Trudeau was pushing against recently when his party started losing by-elections, and calls started to come for him to resign.

The nail in the coffin was the Mark Carney incident. Basically, the Liberals wanted to bring in a big new name to breathe life into a party that had gotten stale after 10 years in power, and that was gonna be Mark Carney. Trudeau was gonna push Freeland out of her spot as Finance Minister and install Carney. Freeland had been Trudeau’s biggest supporter, the one who stood by him through all his scandals, the “adult in the room” in the government. And he decided to throw her under the bus, so she quit on the day she was to announce her budget. But Trudeau hadn’t gotten Carney to officially commit to the Finance Minister spot, and Carney didn’t want to look like the guy who used Freeland as a rung on his ladder to success, so he turned it down, leaving Trudeau looking bad, with precious few allies and no credible path forward.

So, he asked the Governor General to prorogue Parliament until the Liberals could choose a new leader, and he announced his resignation.

And there you have it, the rise and fall of Trudeau II as seen by a Canadian progressive voter who is not affiliated with any party, but who actually remembers the context for what happened over the last 10 years.

How to start writing an R package

This is not a complete exhaustive resource for writing R packages. Think of this more as a “quick start” guide.

I’m writing this using R version 4.4.1 through ESS in Emacs on Manjaro Linux, but most of this is pretty platform-agnostic, and could probably work anywhere else. I’m assuming you keep your git repos in a folder called Projects in your home folder (~/Projects).

I will use “>” to indicate a new line in the R console and “$” to indicate a new line in a command line terminal.

1. Install R packages

Install the following R packages:

  • devtools
  • usethis
  • available
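If you don’t have them already, you can install all three at once:

> install.packages(c("devtools", "usethis", "available"))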

2. Choose a name for your package

> library(available)

## You can use the suggest function to generate possible names
## or you can come up with one yourself
> available::suggest("A tool to categorize clinical trials by measures of feasibility from their clinicaltrials.gov record")

categorizeR

## Check that the name is available on Github and CRAN
> available::available("categorizeR", browse=FALSE)

Name valid: ✔
Available on CRAN: ✔ 
Available on Bioconductor: ✔
Available on GitHub:  ✔ 
Abbreviations: http://www.abbreviations.com/categorize
Wikipedia: https://en.wikipedia.org/wiki/categorize
Wiktionary: https://en.wiktionary.org/wiki/categorize
Sentiment:???
Abbreviations: http://www.abbreviations.com/categorizeR
Wikipedia: https://en.wikipedia.org/wiki/categorizeR
Wiktionary: https://en.wiktionary.org/wiki/categorizeR
Sentiment:???

## You don't have to go through this, but it's a decent way to
## choose a name

For the following, we’re gonna assume you went with testpack for the name of your project.

3. Make a git repo

Go to Github and click the “New” button.

Enter the name for your package that you chose above (for the following we will use testpack) and click “Create repository.”

Now copy the ssh link, it will look something like this: git@github.com:bgcarlisle/testpack.git

In a terminal window do the following:

$ cd ~/Projects

$ git clone git@github.com:bgcarlisle/testpack.git

Cloning into 'testpack'...
warning: You appear to have cloned an empty repository

$ cd testpack

$ git config --local user.email "my-email@users.noreply.github.com"

Now you have your repo (testpack) and it’s cloned to your local machine (it’s in ~/Projects/testpack) so you can work on it.

4. Fill the empty repo folder with the skeleton of an R package

Do the following in R:

## Go to the folder that is one level up from the git repo
## folder
> setwd("~/Projects")

## Load the devtools library
> library(devtools)

## Create the basic R package structure in the git repo
## folder
> create("testpack")

✔ Setting active project to "/home/researchfairy/Projects/testpack".
✔ Creating R/.
✔ Writing DESCRIPTION.
Package: testpack
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R (parsed):
    * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to
    pick a license
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
✔ Writing NAMESPACE.
✔ Setting active project to "<no active project>".

> setwd("testpack/")

Now you have two files and a folder inside your R package folder: DESCRIPTION, NAMESPACE and R/.

Edit DESCRIPTION with metadata that matches your project (title, version, your author details, description).

To choose the GPL license: use_gpl3_license()

(You can choose another license, but you’ll have to look up the function call yourself.)

Commit changes to your git repo and push to the server, so you have a baseline to work from.

5. Add a function to your package

Create a file with a .R extension in the R/ folder that is inside your package root folder and give it a descriptive name. If I were adding a function called testmath() to my package, I’d call the file R/testmath.R.

Use the following format for your function:

#' @title Function title
#'
#' @description A full paragraph describing the function
#'
#' @param varname A description of the argument 'varname' that will be
#'     passed to the function
#'
#' @return A description of what the function will return on
#'     completion
#'
#' @export
#'
#' @importFrom magrittr %>%
#'
#' @examples
#'
#' testmath(4)

testmath <- function(varname) {
    return(0)
}

Most of this is self-explanatory, except for a couple notes:

The @export line tells R that the function in question should be made available to the end-user of the package. If you leave it out, the function will be usable by other functions in the package, but not by the end user.

The @importFrom magrittr %>% line tells R to import the pipe operator as implemented by the magrittr package, which I use pretty much everywhere. If you don’t use it, you can leave this out.

Now that we have another package as a dependency, I’ll show you how to declare other packages that your package depends on:

> use_package("magrittr")
✔ Adding magrittr to Imports field in DESCRIPTION.
☐ Refer to functions with magrittr::fun()

This will add magrittr to the DESCRIPTION file using the correct syntax. As the message says, be sure to always use the double-colon notation to refer to functions implemented by other packages.
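For example, if one of your functions needed something from stringr (a hypothetical dependency here, which you’d declare first with use_package("stringr")), you’d call it like this rather than loading it with library():

## Hypothetical sketch: refer to the dependency's function with
## double-colon notation instead of loading the whole package
extract_numerals <- function(x) {
    stringr::str_extract_all(x, "[0-9]+")
}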

You’ll also want to write some unit tests for any functions you write. To start a unit test for the testmath() function that you just wrote up, type: use_test("testmath")

This will create a file, tests/testthat/test-testmath.R, along with all the structure that R needs to run all the tests you will do on your package at once.

Inside the unit test file, write something like this:

test_that("testmath works", {
    expect_equal(
        testmath(10),
        0
    )
})

6. Add some data to your package

Imagine you have some data frame in R, call it testdata. You want anyone who loads your package to have access to these data. This is how I would do it:

I would make a file called data-raw/generate-testdata.R in the project folder that contains a well-commented R script I used to generate the testdata data frame.

Then at the end of this file, I’d put:

## Write data set to a CSV in the inst/extdata/ folder
if (! dir.exists("inst/")) {
    dir.create("inst/")
}
if (! dir.exists("inst/extdata/")) {
    dir.create("inst/extdata/")
}
testdata %>%
    readr::write_csv("inst/extdata/testdata.csv")

## Write data set to an .rda file in the data/ folder
usethis::use_data(testdata, overwrite = TRUE)

Now all the data is saved as both an .rda file and a .csv in the package.

To document your data, create R/testdata.R with the following inside it:

#' Test data
#' 
#' @format Info about the formatting
#'
#' @usage data(testdata)

"testdata"

Details on best practices for documenting data in R packages here: https://r-pkgs.org/data.html
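Once your package is installed, anyone who loads it can get at the data like this:

> library(testpack)
> data(testdata)
> head(testdata)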

7. Test and document your package

Generate a file for package documentation using the following command:

> usethis::use_package_doc()

Now edit the file that was just created at R/testpack-package.R:

#' @details This package provides 1 function for doing miscellaneous
#'     math stuff
#'
#' @details testmath() always returns the number 0 regardless of what
#'     you give it
#'
#' @references Carlisle, BG. The grey literature, 2024.
#' 
#' @keywords internal
"_PACKAGE"

## usethis namespace: start
## usethis namespace: end
NULL

To run all your package’s unit tests, use the following command:

> devtools::test()

ℹ Testing testpack
✔ | F W  S  OK | Context
✔ |          1 | testmath

══ Results ══
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ]

To document your package, use the following command:

> devtools::document()

You can also use devtools::check() to check that all the formatting has been done correctly.

If that runs properly, install your package with devtools::install() to test it locally. When you’ve made changes, you can remove the package with remove.packages("testpack") and reinstall. (Pro-tip, quit R after every time you remove the package, or it gets confused sometimes.)
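The whole edit-test cycle looks something like this:

> devtools::check()
> devtools::install()
## ... make some changes, then:
> remove.packages("testpack")
## quit and restart R here, then reinstall:
> devtools::install()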

Commit your changes in git and push to Github. The package can now be installed by anyone using devtools::install_github("bgcarlisle/testpack").

This is probably enough to get you started, next time I’ll try to cover rhub and submitting to CRAN.

How to set up mu4e to work with Protonmail Bridge

This works on my Manjaro setup as of 2024-03-07. I can’t guarantee it will work with anything else. I took part of these instructions from another blog post (thanks!), but they have been adapted specifically to work with Protonmail Bridge.

First install openssl and isync, if they’re not already installed.

Then install mu from AUR: https://aur.archlinux.org/packages/mu

Open Protonmail Bridge and copy your password into the file ~/.emacs.d/.mbsyncpass; then encrypt it and delete the file with the unencrypted password:

$ cd ~/.emacs.d
$ gpg2 --output .mbsyncpass.gpg --symmetric .mbsyncpass
$ shred -u .mbsyncpass

Put the following into the file ~/.authinfo, replacing “you@proton.me” with your Protonmail username and “really#!!1good__pass0” with your Protonmail password from the Bridge app. Make sure that it matches the details for your SMTP credentials.

machine 127.0.0.1 login you@proton.me port 1025 password really#!!1good__pass0

Then encrypt that file and delete the unencrypted version:

$ cd ~
$ gpg2 --output ~/.authinfo.gpg --symmetric ~/.authinfo
$ shred -u .authinfo

Check whether cert.pem exists in the folder ~/.config/protonmail/bridge/ already. If not, export your certificate from the Bridge app, by going to Settings > Export TLS certificates, and save them in ~/.config/protonmail/bridge/; you may need to create this folder if it doesn’t exist.

Make a config file for isync, ~/.emacs.d/.mbsyncrc with the following contents. Replace “you@proton.me” with your Protonmail email address.

IMAPAccount protonmail
Host 127.0.0.1
User you@proton.me
PassCmd "gpg2 -q --for-your-eyes-only --no-tty -d ~/.emacs.d/.mbsyncpass.gpg"
Port 1143
SSLType STARTTLS
AuthMechs *
CertificateFile ~/.config/protonmail/bridge/cert.pem

IMAPStore protonmail-remote
Account protonmail

MaildirStore protonmail-local
Path ~/.protonmail/mbsyncmail/
Inbox ~/.protonmail/mbsyncmail/INBOX
SubFolders Verbatim

Channel protonmail
Far :protonmail-remote:
Near :protonmail-local:
Patterns *
Create Near
Sync All
Expunge None
SyncState *

Finally, configure Emacs to use mu4e. I put the following in my ~/.emacs file. Some of it is personal preferences (like the bookmarks) but some of it you’ll need in order to get the thing to work at all (like the “changefilenames when moving” part). Make sure to replace “you@proton.me” with your Protonmail address.

;; This loads mu4e
(add-to-list 'load-path "/usr/share/emacs/site-lisp/mu4e")
(require 'mu4e)

;; This tells mu4e what your email address is
(setq user-mail-address  "you@proton.me")

;; SMTP settings:
(setq send-mail-function 'smtpmail-send-it)    ; should not be modified
(setq smtpmail-smtp-server "127.0.0.1") ; host running SMTP server
(setq smtpmail-smtp-service 1025)               ; SMTP service port number
(setq smtpmail-stream-type 'starttls)          ; type of SMTP connections to use

;; Mail folders:
(setq mu4e-drafts-folder "/Drafts")
(setq mu4e-sent-folder   "/Sent")
(setq mu4e-trash-folder  "/Trash")

;; The command used to get your emails (adapt this line, see section 2.3):
(setq mu4e-get-mail-command "mbsync --config ~/.emacs.d/.mbsyncrc protonmail")
;; Further customization:
(setq mu4e-html2text-command "w3m -T text/html" ; how to handle html-formatted emails
      mu4e-update-interval 300                  ; seconds between each mail retrieval
      mu4e-headers-auto-update t                ; avoid having to type `g' to update
      mu4e-view-show-images t                   ; show images in the view buffer
      mu4e-compose-signature-auto-include nil   ; I don't want a message signature
      mu4e-use-fancy-chars t)                   ; allow fancy icons for mail threads

;; Do not reply to yourself:
(setq mu4e-compose-reply-ignore-address '("no-?reply" "you@proton.me"))

;; maildirs
(setq mu4e-maildir-shortcuts
  '( (:maildir "/Inbox"     :key  ?i)
     (:maildir "/All mail"  :key  ?a)
     (:maildir "/Folders/Work"    :key  ?w)))

;; signature
(setq message-signature "bgc")

(setq mu4e-bookmarks
  '((:name  "Unread messages"
     :query "flag:unread and maildir:/Inbox"
     :key   ?u)
    (:name  "Today's messages"
     :query "date:today..now"
     :key ?t)
    (:name  "Last 7 days"
     :query "date:7d..now"
     :key ?7)
    (:name  "Messages with Word docs"
     :query "mime:application/msword OR mime:application/vnd.openxmlformats-officedocument.wordprocessingml.document"
     :key ?w)
    (:name  "Messages with PDF"
     :query "mime:application/pdf"
     :key ?p)
    (:name  "Messages with calendar event"
     :query "mime:text/calendar"
     :key ?e)
    ))

;; This fixes a frustrating bug, thanks @gnomon@mastodon.social
(setq mu4e-change-filenames-when-moving t)

Last thing to do is create the folders where mu will store your messages and then start it indexing!

$ cd ~
$ mkdir .protonmail
$ mkdir .protonmail/mbsyncmail
$ mu init --maildir=~/.protonmail/mbsyncmail/ --myaddress=you@proton.me
$ mbsync --config ~/.emacs.d/.mbsyncrc protonmail
$ mu index

This will fetch all your email and save it in that folder. It might take a while. When this all finishes, you can open up Emacs and M-x mu4e will open up mu4e for you!

Lessons that we refused to learn from Theranos: Neuralink’s unregistered and unpublishable research

On January 29, 2024, Elon Musk posted a claim on X.com [1] that in a clinical trial run by Neuralink, one of its devices was successfully implanted into a human participant [2]. Aside from a two-page study brochure published on the Neuralink Patient Registry website [3], this is the only source of information that we have on the clinical trial, and the only indication that the study has started recruiting participants.

The trial has not been registered on ClinicalTrials.gov or any other clinical trial registry, the study protocol is not available, and there is no published statistical analysis plan. Registration of a clinical trial is a legal requirement under FDAAA [4]; however, phase 1 and device feasibility studies are exempt from this requirement, and presumably this study falls under that category.

While prospective registration may not be legally required, it is still an ethical requirement of the Declaration of Helsinki [5] that every clinical trial be registered prospectively. The rationale for this is to prevent certain kinds of scientific bias, such as the non-publication of non-positive results, as well as outright scientific fraud, such as changing a trial’s primary outcome after the results are known. The Declaration of Helsinki also requires that all clinical trial results be made publicly available, regardless of the outcome. Prospective registration is also a condition for publication according to the policy of the International Committee of Medical Journal Editors (ICMJE) [6], which makes the Neuralink trial unpublishable in any journal that holds to this standard. (Whether an ICMJE journal will apply this standard rigorously is another question.)

While Elon Musk may be content to conduct a programme of secret clinical research outside the scrutiny of peer review, we have already seen what happens when a charismatic leader with a cult following does so. Years before the downfall of the blood testing company Theranos, warnings were raised about the clandestine nature of its “stealth research” programme [7]. These warnings went largely unheeded, and in the end, the blood testing methods the company touted were exposed as fake and its founder was convicted of fraud and sent to prison [8]. Theranos provided inaccurate results for an estimated one out of ten tests, placing at risk the proper care of thousands of patients [9].

The ethical standards of prospective registration and publication of results that are enshrined in the Declaration of Helsinki are not meaningless red tape intended to slow down the march of progress. They are meant to reduce biases, prevent fraud and help ensure that the risks and burdens that patient participants take on are redeemed by as much socially valuable knowledge as possible. Despite the “Silicon Valley” thinking that difficult and long-standing problems in biomedicine can be solved by the sheer cleverness and work ethic of those who have had success writing an app or shipping a piece of computer hardware [10], the biology of human disease is fundamentally different, more difficult to understand, and requires risk on the part of human subjects to progress, which comes with certain moral obligations. While it is not literally illegal for Elon Musk’s Neuralink to conduct an unpublishable device feasibility trial without prospective registration, this is a poor justification for doing so.

References

1. Musk E: X.com. 2024. Available from: https://twitter.com/elonmusk/status/1752098683024220632

2. Drew L: Elon Musk’s Neuralink brain chip: what scientists think of first human trial. Nature. 2024. DOI: 10.1038/d41586-024-00304-4

3. Neuralink Corp.: Neuralink PRIME Study Brochure. 2023. Available from: https://neuralink.com/pdfs/PRIME-Study-Brochure.pdf

4. United States Congress: Food and Drug Administration Amendments Act of 2007. Public Law. 2007;110-85:121. Available from: https://www.congress.gov/110/plaws/publ85/PLAW-110publ85.pdf

5. World Medical Association: Declaration of Helsinki – Ethical Principles for Medical Research Involving Human Subjects. 2013. Available from: https://www.wma.net/policies-post/wma-declaration-of-helsinki-ethical-principles-for-medical-research-involving-human-subjects/

6. International Committee of Medical Journal Editors: ICMJE Clinical Trial Registration Statement. 2019. Available from: http://www.icmje.org/recommendations/browse/publishing-and-editorial-issues/clinical-trial-registration.html

7. Ioannidis JPA: Stealth Research: Is Biomedical Innovation Happening Outside the Peer-Reviewed Literature?. JAMA. 2015;313:663. DOI: 10.1001/jama.2014.17662

8. Lowe D: Thoughts on the Elizabeth Holmes Verdict. 2022. Available from: https://www.science.org/content/blog-post/thoughts-elizabeth-holmes-verdict

9. Das RK and Drolet BC: Lessons from Theranos – Restructuring Biomedical Innovation. Journal of Medical Systems. 2022;46. DOI: 10.1007/s10916-022-01813-3

10. Lowe D: Silicon Valley Sunglasses. 2022. Available from: https://www.science.org/content/blog-post/silicon-valley-sunglasses

So you’re going to watch Who Framed Roger Rabbit (1988), but you’re young (no spoilers)

The following is a little bit of context for young people that will take a few sections of this (deservedly) well-beloved film from “kinda weird” to “very funny, actually.”

I won’t explain why you need to know these things. Just trust me that having this context will improve your enjoyment of this film.

Supporting features

Double-features at movie theatres used to be fairly commonplace. You’d go to the cinema and there’d be two films. Usually the big full-length blockbuster feature film would be the second one.

This tradition continued for a while after the advent of VHS, when it was fairly common for the main feature on a video cassette to be prefaced by a “supporting feature.” These would often be short films, sometimes animated or light, either contrasting with or complementing the main feature.

Supporting features are parodied in Monty Python’s The Meaning of Life (1983), in which the supporting feature famously attacks the main feature. The only remaining modern vestige of this that I can think of is the Pixar shorts that often come along with their feature films.

The “shave and a haircut” jingle

Probably the most famous jingle of all time is the “shave and a haircut” jingle. You’ve heard it before even if you think you haven’t. Look it up on Wikipedia and remind yourself what it is. It starts with the sing-song “shave and a haircut” with the response “two bits.”

It’s an old barbershop jingle from the time when “two bits” still meant “twenty-five cents” or at least “very cheap.” Note that it’s difficult to stop yourself from doing the “two bits” reply when prompted with “shave and a haircut.”

The Merry-Go-Round Broke Down song

From 1930 to 1969, Warner Brothers produced Looney Tunes, a very famous cartoon series. The theme song for this series was called The Merry-Go-Round Broke Down.

Harvey (1950)

Jimmy Stewart starred in a very famous and generally well-liked film about a six-foot-tall invisible rabbit named Harvey.

Public transit used to be real

Between 1938 and 1950, General Motors, through the use of several subsidiary companies, bought public transit systems in about 25 cities in the United States in order to dismantle them to eliminate competition for automobiles. This is known as the General Motors streetcar conspiracy.

And according to the Wikipedia article:

Most of the companies involved were convicted in 1949 of conspiracy to monopolize interstate commerce in the sale of buses, fuel, and supplies to NCL subsidiaries, but were acquitted of conspiring to monopolize the transit industry.

Wikipedia, General Motors Streetcar conspiracy

Now you, a young, have the context to understand some jokes that were “very funny, actually” in 1988.

The no-true-cishet fallacy: can we all stop saying “I bet the attacker was gay”?

It has been a long time since the last high-profile case of violence against queer people, and while I hope there will never be another attack on queer people ever again, realistically, it’s only a matter of time before it happens. So before it does, and in hopes that nobody feels directly targeted by this post, I would like to suggest a change in the way that many people typically respond to high-profile cases of violence committed against queer people:

Stop saying “I bet the attacker was gay.”

Please, can we all just—don’t. If you have no reason to think that the attacker is gay other than the fact that it’s a case of hate-motivated violence committed against a queer person, maybe we can all agree not to make this particular assumption.

You hear this all the time from straight people after anti-queer violence, and I understand where it’s coming from. You want to distance yourself from the attacker, communicate that you consider anti-queer violence to be unthinkable, even confusing to the point of not even being able to understand why any straight person would ever want to do this.

And while I understand that impulse, if the knee-jerk response to all anti-queer violence is to assume that the only possible motivation for it could be internalized homophobia, it implicitly sends a couple messages that aren’t great and that maybe we could be a little more careful about.

The first reason I’m asking you to stop saying “I bet he’s gay” when there’s violence against queer people is that it blames queer people for violence committed against us.

There’s a very long history of straights blaming queer people for violence they commit against us. The “gay panic” legal defence, for example, is not that far back in our collective rear-view mirror, so to speak. (If you don’t know about it, look it up. It’s horrifying.) People still do the whole “what was he wearing/doing to provoke it?” thing when there’s violence against queer people, as if that was relevant in any way. Suffice it to say, we haven’t “made it” yet.

And when your first reaction to every gay person being hurt is to say “the attacker is probably a closet case,” you’re suggesting that violence against queer people is all a matter of queer in-fighting. “It’s just the gays doing that to each other again, not our problem.”

And yes, internalized homophobia is real, but it’s not like we have already ascended to some Star Trek future beyond the point where straights commit violence against queer people. We live at a time where the most powerful country in the world is exactly one election cycle away from complete surrender to actual fascism, and a right-wing reactionary trucker convoy occupied Ottawa for weeks. Some irresponsible straight people have been stoking that particular Nazi-adjacent fire for a good long time and when that happens, queer people get burned.

The second reason I’m asking you to stop saying “I bet he’s gay” when there’s violence against queer people is that it absolves straight people of violence that they commit against queer people.

Yes, some straight people hate gay people. I can already hear objectors asking, “but why would a straight person be that hateful if he isn’t gay himself?” I’ll give you a few possible reasons just off the top of my head: 1. Politically motivated fascist hatemongering, using queer people as an “other” to dehumanize. 2. Centuries of discrimination that has been in some cases institutionalized. 3. The insecurity and violence with which men in the West are socialized to punish any deviation from traditional masculinity. 4. Spillover from misogyny from straight dudes who hate women so much that they are also willing to hurt queer people. 5. Resentment from straight dudes who scream as if mortally wounded at the thought of any progress at all in the advance of the rights of queer people and take it as an attack against their own privileges and feel entitled to violent retaliation.

Take your pick. It’s not a big mystery and feigning ignorance of all these dynamics does not make you A Good Straight Ally. It just makes you frustrating to talk to.

Hate and violence against queer people is mostly a straight people mess, and pretending it’s not doesn’t help to clean it up. I really shouldn’t have to explain this to you, but yes, straight people can be anti-queer and violent too, believe it or not! Nobody needs uninformed speculation about the attacker’s sexuality, and shifting the blame to queer people for violence committed against us doesn’t help.

Stop saying “I bet the attacker was gay.”

(2024-04-01 EDIT: I wrote this like 2 years ago, but due to some (gestures wildly) stuff that just happened it came to mind and I edited the title because I came up with the “no true cishet” thing just now)

How to make R recognize command line software installed on Mac OS with Homebrew

Imagine you installed an R package that depends on some command line software, for example pdftotext. You’ve successfully installed it using Homebrew, but when you run the R package, you get an error that looks like this:

sh: pdftotext: command not found

So you double-check that pdftotext is installed on your system in the Terminal.

$ which pdftotext
/opt/homebrew/bin/pdftotext

So far so good. Then you double-check that pdftotext is available to R.

> Sys.which("pdftotext")
pdftotext 
       "" 

Uh oh. The path to pdftotext should be inside those empty quotes.

What this means is that your shell’s PATH differs from R’s. So the place where your Terminal looks up what programs are available to it is different from the place where R looks up what programs are available to it.

You can tell what paths are available to the Terminal and which ones are available to R by typing the following in the Terminal:

$ printenv PATH

And the following in your R console:

> Sys.getenv("PATH")

At this point you should see what the differences are, and which ones are missing. Probably what’s missing is the Homebrew directory, /opt/homebrew/bin.

So how do you fix this? We need to tell R on startup to look for programs installed by Homebrew.

If it doesn’t already exist, make an empty text file in your home directory called .Rprofile. Edit this file using your text editor of choice (e.g. RStudio) so that it includes the following:

old_path <- Sys.getenv("PATH")
Sys.setenv(PATH = paste(old_path, "/opt/homebrew/bin", sep = ":"))

When you restart R, your Homebrew-installed R package should now function!
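To confirm the fix, re-run the check from earlier; assuming the default Homebrew prefix, you should now see the full path:

> Sys.which("pdftotext")
                    pdftotext 
"/opt/homebrew/bin/pdftotext" 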