A Short Program in J

Inspired by James Hague and waiting for a new release of Cuis I thought I might sit down and noodle with J. See if I get any further than I did last time.

So, I recently wrote a trivial Markov chain algorithm for Smalltalk. The code is pretty short, so I’ve included it at the bottom of the post. Rub your eyes on it after we talk about the J code.

I want to implement this algorithm in J, but before we can do that let’s learn how to do something a bit simpler. Let’s just count up letter occurrences. I’m inclined to see that as being related, but let’s face it, even that intuition is probably wrong with J.

First, let me show you my solution.

letter_frequency =: =((~. @: ;/) ,. (;/ @ (+/ @ |: @ =))) &: sort

Yeah… alright…

NB.
   letter_frequency 'Mississippi'
┌─┬─┐
│M│1│
├─┼─┤
│i│4│
├─┼─┤
│p│2│
├─┼─┤
│s│4│
└─┴─┘

So it seems Perl has a competitor!

Verbs

I’ll try to spare you the J tutorial because there’s a few really truly weird things about this piece of code, but let me give you this brief interactive tutorial to show you how I built this up.

NB.
   w =: 'Mississippi'
   sort w
Miiiippssss
   ;/ sort w
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│M│i│i│i│i│p│p│s│s│s│s│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
   ~. ;/ sort w
┌─┬─┬─┬─┐
│M│i│p│s│
└─┴─┴─┴─┘

So, sort does what you’d expect. The syntax here is reminiscent of Polish notation: we’re adding additional stuff to do to the front of the list and they’re taking effect last. So on line one I sort the letters, on line two I box the letters, and on line three I am ‘nub’ing (removing duplicates) from the boxed letters.

The functions are called verbs here, and they’re all fundamental operators or functions except for /, which is called an “adverb” and takes a second function as an argument and performs what a Haskell user would call foldr1. So the ;/ is saying, apply ; between each element, and ; boxes its arguments.

NB.
   sort w
Miiiippssss
   = sort w
1 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 1 1 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1
   |: = sort w
1 0 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 1 0
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
   +/ |: = sort w
1 4 2 4

It would be much easier to appeal to what is visually going on here than to explain what’s really happening, so I think I’ll rest my case on that. The part here I had the hardest time figuring out was the = verb. It is defined as “self-classify” and a bunch of examples in the documentation were enough to convince me it was close to what I need. The |: verb pivots the table, and we again use the / adverb to produce a fold, in this case summation. It is visually apparent that what’s happening there is we are summing down the columns to count occurrences. I am quite sure there is a less verbose way to do this without the pivot, but I don’t know what it is yet.

In any case, it should now be clear that we have both of the constituents one would need to show letter frequencies: an array of letters and another array of their occurrences:

NB. 
┌─┬─┬─┬─┐
│M│i│p│s│
└─┴─┴─┴─┘
 1 4 2 4

One can combine these a variety of ways, but the colorfully-named “stitch” is the easiest I’ve found so far.

NB.
   (;/ 2 3 4) ,. (;/ 8 9 0)
┌─┬─┐
│2│8│
├─┼─┤
│3│9│
├─┼─┤
│4│0│
└─┴─┘

At this point, things get a bit weird. Defining functions in J is done either implicitly or explicitly. Implicitly is basically the same as point-free form in Haskell, by building up and composing functions. Explicit function definition relies on a piece of rather grody syntax, and didn’t work for me for one reason or another:

NB.
   letter_frequencies =: 3 : 0
letters =:  ~. ;/ sort y
occurs =: ;/ +/ |: = sort y
letters ,. occurs
)

I’m sure a better J user than me (say, someone who has used it for more than 24 hours) would be able to point out what I did wrong, but I must admit a dislike of the “magic numbers” that seem to appear in J. File input/output is similar; for example:

s =. 1!:1 <'system\packages\misc\jforc.ijs'

Yes, 1!:1 is a single “verb” which means, give me the content of the file named on the right. Not quite as “readable” as +/, eh? Anyway, we can instead define this function implicitly, as I did at the beginning of this article. There’s just one question: what’s this @: and why is it in all of my functions? For example, the following is fairly confusing:

NB.
  |: = sort w
1 0 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 1 0
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
   (|: = sort) w
1 1 0 0 1 0 0 0 0 0 0
   |: (= sort) w
1 1 0 0 1 0 0 0 0 0 0
   (|: =) sort w
|domain error
|       (|:=)sort w

Odd. What about @?

NB. Correct way
   (|: @ = @ sort) w
1 0 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 1 0
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1

@ is basically unary function composition. So J isn’t like Forth, no problem with that. The really surprising thing isn’t that omitting the @ didn’t accomplish something, it’s that it did accomplish something: something weird and unexpected!

Trains, Forks and Hooks

The surprise is that “juxtaposition” of verbs is much more complex in J than in other languages. It leads to something called a verb train, but the Cliff’s Notes are that:

(f g) y
means y f (g y)
(f g h) y
means (f y) g (h y)

So above, when I wrote (|: = sort) w, J interpreted this as: (|: w) = (sort w), and when I had |: (= sort) w, J interpreted it as |: (w = (sort w)), neither of which is close to being what we want! In fact, what is being calculated is an item-by-item equality comparison between the string w and the sorted version of w, so you can see why we get the result we get:

NB. Ignore the code
   3 11 $ (;/ w) , (;/ sort w) , (;/ (= sort) w)
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│M│i│s│s│i│s│s│i│p│p│i│
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
│M│i│i│i│i│p│p│s│s│s│s│
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
│1│1│0│0│1│0│0│0│0│0│0│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘

One of the tutorials has a good example of a place where this kind of thing is actually handy that I like:

NB. Sales tax in NM is 6.875%.
   sales_tax =: *&0.06875
   sales_tax 100
6.875
   (+ sales_tax) 100
106.875

The & adverb binds a parameter to a verb, making a binary operator (or a dyad, in J jargon) into a unary operator (or a monad in J jargon). So now it should be clear why the solution has all those @’s in it.

Ultimately, this took me about an hour to write, with heavy documentation consultation. For comparison, the following Smalltalk version took me about two minutes:

letterOccurrences: aString
  | dict occurrences |
  dict ← Dictionary new.
  aString do: [ :char |
    occurrences ← dict at: char ifAbsentPut: [ 0 ].
    dict at: char put: occurrences + 1 ].
  ↑ dict

Unsurprising and unsexy, but workable.

Epilogue

In reading APL - A Glimpse of Heaven it’s not hard to see the affection the author is showing for APL, an ancestor of J. In fact I was able to translate most of his examples to J while reading it.

It would be hard to find two languages that are more different than APL/J and Smalltalk. Surprisingly, similarities are not hard to find. Both languages are highly interactive and a high premium is put on playing with the language on your own with private experiments. On the other hand, there seems to be a lot more in-depth documentation on the APL/J side, in part (I’m guessing) because it is so much weirder and in part because once one reaches a certain level of proficiency and comfort with Smalltalk, one learns from the image rather than a book.

I do have to wonder though. All of the APL and J fanatics talk about the language as a “notation of thought.” It strikes me as a terrible idea to write software in a language so intrinsically unreadable, but maybe it’s true that after time it becomes quite clear. It looks like it has that odd benefit that SQL has, in that code once written probably lives a long life; I didn’t touch on this much, but all of the built-in verbs in J go to great lengths to be meaningful in both monadic and dyadic contexts and operating on different kinds of data, perhaps it works in practice like SQL in that, once the statement is written it rarely needs to be revisited. It certainly could have the benefit of inexpensive functional purity.

My friend Allan Poindexter has tried to explain to me on several occasions how Common Lisp through its macros is more powerful than Smalltalk. There’s a smidgeon of truth to that, though I believe modern Smalltalks have sufficiently expressive metaclass systems that you could achieve anything except raw syntactic extension. But at the same time, my solution to the above problem does not require three pages of English prose to explain it, and it seems obvious that in J this drawback would only get worse.

So it flies in the face one one’s expectations to hear allegations like this one from the Heaven article:

If APL were a specialist, complex language, it would only attract the “Boy Wonders” of IT, those with A-Grades in everything, whose horizons are limited by bits and bytes.

So it is paradoxical that the great majority of IT people have never really understood APL. Those who have used it successfully have very often not been computer-literate, or have only a slight knowledge … and they have frequently learned APL in isolation. That is to say, the language serves anyone prepared to explore it without prejudice.

It’s hard to evaluate this claim on the surface, but it is worth noting that industries in which APL seems to have penetrated are highly mathematical (actuarial work, finance) but not known for having strong, proud traditions of software engineering.

I said fairly recently that when I program Lisp I feel like I’m tricking the machine into doing what I want. J amplifies this feeling. Smalltalk does not. In fact, when I use Smalltalk I frequently feel like I’m stating the obvious. What I don’t like about Smalltalk, really, is the lack of books. I feel frustrated at the size of the library, and unsurprised that I spend more time reading the library than writing code. What’s funny is, I used to make the same complaint about Haskell. J, as terse, ugly, and frustrating as it is, it is also very fun, and seeing a two page reference to the 70ish fundamental verbs of the language definitely gives one the feeling you are dealing with some kind of alchemy of programming rather than a standard library.

So I will echo James Hague’s endorsement. Try it. You may find you like it quite a bit.

Postscript: Smalltalk Markov Chain Code

Note: I suck at Smalltalk. If you have suggestions, please email me.

Object subclass: #MarkovData
  instanceVariableNames: 'chain'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Markov'.

MarkovData>>addFrom: aStream
  "Add all the words in some data source to this Markov chain"
  | current previous |
  previous := #start.
  [ aStream atEnd ]
    whileFalse: [ 
      current := aStream next.
      self addWord: previous precedes: current.
      previous := current ].
  self addWord: previous precedes: #end

MarkovData>>addWord: previousWord precedes: anotherWord
  "Add all the words in this stream to this Markov chain."
  | words |
  words := chain at: previousWord ifAbsentPut: [ Bag new ].
  words add: anotherWord

MarkovData>>addWordsIn: aString
  "Add all the words in the passed string."
  self addFrom: (WordStream on: aString)!

MarkovData>>generateSentence
  "Generates a String sentence from the data we’ve collected so far."
  | wordStream |
  wordStream := String new writeStream.
  self generateSentenceOn: wordStream.
  ^ wordStream contents

MarkovData>>generateSentenceOn: aStream
  "Generates a random utterance on aStream."
  | word |
  word := self generateNextWordFor: #start.
  [ word = #end ]
    whileFalse: [ 
      aStream nextPutAll: word; space.
      word := self generateNextWordFor: word ].
  aStream skip: -1.  "back up a character, to not end on a space"

MarkovData>>generateNextWordFor: aWord
  "Generate the next word based on the given word."
  ^ (chain at: aWord) atRandom.

MarkovData>>at: aWord
  "Return the set of words that follow from this word."
  ^ chain at: aWord ifAbsent: [ Bag new ]

MarkovData>>initialize
  super initialize.
  chain := Dictionary new! !

MarkovData class>>buildFrom: aStringOrStream
  ^ self new addFrom: (WordStream on: aStringOrStream)