A Brief Foray into Random Name Generation

I got a bee in my bonnet about name generation this morning, so here’s a simple Javascript module for randomly generating names:

/*
# NameGenerator

Usage:

  const starNameGenerator = new NameGenerator(['Ceti Alpha', 'Scorpio', 'Draconis'...]);
  starNameGenerator.generate(); // -> random name

Works better with a decent sized (hundreds) of thematically similar examples to work from.
*/
/* global module */

function pick(array) {
  return array[Math.floor(Math.random() * array.length)];
}

class NameGenerator {
// data is a map from character-pairs to observed successors,
// consider the examples "how", "now", "brown", "cow"
// the pair "ow" would have the following successors
// [undefined, undefined, "n", undefined] (undefined -> end of word)

  constructor(examples) {
    const data = {'': []};
    examples.
    map(s => s.toLowerCase()).
    forEach(example => {
      let pair = '';
      data[pair].push(example[0]);

      for(let i = 0; i < example.length; i++) {
        pair = pair.substr(-1) + example[i];
        if (! data[pair]) data[pair] = [];
        data[pair].push(example[i + 1]);
      }
    });

    console.log(data);
    this.data = data;
  }

  generate() {
    let s = pick(this.data['']);
    let next = pick(this.data[s]);
    while(next){
      s += next;
      next = pick(this.data[s.substr(-2)]);
    }
    return s;
  }
}

module.exports = NameGenerator;

I wrote a much more convoluted (but simple-minded) star name generator for my galaxy generator several years ago.  The approach I took was to take a collection of star names and break them up into syllables (starting, middle, and ending) and then given a range of syllable lengths, assemble a name out of random pieces.

Today it occurred to me that I’ve never explicitly implemented anything using Markov chains before, so how about I build something that way and see how it compares? If you follow the link, you’ll see examples of star names generated randomly the old way (names with “bad words” in them are rejected).

I took a list of proper star names from Wikipedia and cleaned it up (e.g. the Greek letter prefix of a star name simply indicates its brightness relative to other stars in the same constellation, while a Roman Numeral is simply an indicator of a star being part of a binary or trinary system). This gave me a bit over 600 star names with which to seed the generator, and the results are pretty nice. (Again, no need for bad world filtering.)

The major disadvantage of the new generator is that it can generate really long names pretty easily because of the way it terminates. If I implemented a more sophisticated generator that looked at things like overall length and length of current word in weighing the probabilities it would probably help here, but that might be overthinking it (it’s pretty easy just to reject overly long names).

In general, I think the new generator produces more pronounceable names than my earlier attempt, and some of the new names it generates seem like they should be real names, which is exactly what I’m hoping for.

Algorithm

The algorithm is very simple. The constructor looks at which characters follow a given pair of characters in the input data, so if you start with “how”, “now”, “brown”, “cow” you get the following data for the pair “ow”: [undefined, undefined, ‘n’, undefined]. So, using this data to generate names, 75% of names will end immediately after an ‘ow’ and 25% will have an ‘n’.

In this system, the first character of a name is following the empty string, while the second character of a name is following the first letter. It follows that using [“how”, “now”, “brown”, “cow”] all names will begin with ‘h’, ‘n’, ‘b’, or ‘c’. and most will end in ‘w’ and the rest will end in ‘n’. Not super interesting.

Let’s try slightly more interesting seed data: [‘Frodo’, ‘Peregrin’, ‘Meriadoc’, ‘Bilbo’, ‘Adalgrim’, ‘Bandobras’, ‘Celandine’, ‘Doderic’, ‘Erin’, ‘Griffo’]. This doesn’t seem like it will be enormously fruitful at first glance, but it immediately generates lots of new names that, to me, sound authentic: Adalgrin, Adobris, Grine, Bandine, Froderim.

And here’s a link to a jsfiddle to see it in action (with a bigger set of names from Middle Earth as the seed). One of the really nice things about it is that you don’t need to filter out bad words, because they pretty much don’t get created if they’re not in the source data.

It occurs to me that a lot of the random content generation stuff I’ve done in the past was, in effect, recreating Markov chains naively, and understanding what I’ve done in those terms is powerful and clarifying.

And with that, I leave you with a random Jabberwockish word generator. Don’t bewortle! And have a borpallith day.