Little useless-useful R functions – Markov babbler

Posted on September 10, 2025 by 24-7

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers.]

This one is named, yes, you guessed it, after Markov chains. 🙂

The "babbler" part is there to connote the simplicity of this useless R function.

It is a simple calculation of how likely words are to follow one another. Drawing the next word from sequences that appear chained together multiple times is reminiscent of a Markov chain (although this is not quite one!).

The gist is tokenizing the text, counting how often each token follows each preceding sequence of tokens, and turning those counts into probabilities.
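Before the full function, a toy illustration of the counting step may help. This is a minimal base R sketch, not the author's code: the sentence is made up, and it uses a single preceding word (order 1) instead of the function's default of two.

```r
# Toy illustration (not the author's code): order-1 transitions in base R.
text <- "the wolf ate the grandmother and the wolf slept"
tokens <- strsplit(text, "\\s+")[[1]]

# Pair every token with the token that follows it.
from <- head(tokens, -1)
to   <- tail(tokens, -1)

# Count how often each (from, to) pair appears ...
counts <- table(from, to)

# ... and divide each row by its total to get P(to | from).
probs <- prop.table(counts, margin = 1)

# "the" is followed by "wolf" twice and by "grandmother" once,
# so the probabilities are 2/3 and 1/3.
round(probs["the", c("wolf", "grandmother")], 2)
```

Each row of `probs` sums to 1, which is what lets the babbler draw the next token with `sample(..., prob = ...)`.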

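library(stringr)  # str_split()
library(dplyr)    # %>%, group_by(), summarise(), mutate(), filter()

markov_babbler <- function(text, order = 2, n = 50, by_word = TRUE) {
  # Split into words (on whitespace) or into single characters
  tokens <- if (by_word) str_split(text, "\\s+")[[1]] else unlist(str_split(text, ""))
  tokens <- tokens[tokens != ""]
  stopifnot(length(tokens) > order)  # need at least one transition

  #add the removal of full stops,....
  token <- c('I', 'I am', 'to', 'all', 'Oh')  # placeholder, not used below

  # Each row pairs a state of `order` consecutive tokens with the token that follows it
  df <- data.frame(
    from = sapply(seq_len(length(tokens) - order), function(i) paste(tokens[i:(i + order - 1)], collapse = " ")),
    to = tokens[(order + 1):length(tokens)],
    stringsAsFactors = FALSE
  )

  # Count each (from, to) pair and turn the counts into P(to | from)
  probs <- df %>%
    group_by(from, to) %>%
    summarise(freq = n(), .groups = "drop") %>%
    group_by(from) %>%
    mutate(prob = freq / sum(freq))

  # Start from a randomly chosen state
  current <- sample(unique(probs$from), 1)
  output <- unlist(str_split(current, " "))

  for (i in seq_len(n)) {
    next_word <- probs %>% filter(from == current)
    if (nrow(next_word) == 0) break  # dead end: this state has no recorded successor
    next_token <- sample(next_word$to, 1, prob = next_word$prob)
    output <- c(output, next_token)
    current <- paste(tail(output, order), collapse = " ")
  }

  # Assumed ending: the excerpt stops at the loop, so return the generated text
  paste(output, collapse = if (by_word) " " else "")
}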

With this in mind, I took Little Red Riding Hood (Brothers Grimm) and plugged the story into the function, in both English and Slovenian.
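The comment in the function lists the removal of full stops as a to-do. One way to approach it is to clean the text before passing it in. This is only a sketch under assumptions: `clean_text` is a hypothetical helper, not part of the original function, and it strips all punctuation rather than full stops alone.

```r
# Hypothetical helper, not part of the original markov_babbler().
clean_text <- function(text) {
  text <- tolower(text)                    # fold case so "The" and "the" match
  text <- gsub("[[:punct:]]+", " ", text)  # replace punctuation with spaces
  text <- gsub("\\s+", " ", text)          # collapse runs of whitespace
  trimws(text)                             # drop leading and trailing spaces
}

clean_text("Oh, Grandmother! What big ears you have.")
# returns "oh grandmother what big ears you have"
```

Note that this also splits contractions at the apostrophe ("don't" becomes "don t"), so a real clean-up step would probably want a narrower pattern.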

…

Playing around with useless statistics is fun. Useless fun 🙂

And no function is complete without a little ggplot for drawing the network of words.

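  # Needs library(igraph), library(ggraph) and library(ggplot2);
  # `probs` is the transition table built inside markov_babbler()
  # Keep only transitions seen more than once to reduce clutter
  g <- graph_from_data_frame(probs %>% filter(freq > 1), directed = TRUE)
  # Edge opacity and width both scale with the transition probability
  plot <- ggraph(g, layout = "fr") +
    geom_edge_link(aes(edge_alpha = prob, edge_width = prob), color = "firebrick") +
    geom_node_label(aes(label = name), size = 4, repel = TRUE) +
    theme_void() +
    labs(title = "Markov Chain: Token Transitions")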

As always, the complete code is available on GitHub in the Useless_R_function repository, in the file Markov_babbler.R. Check the repository for future updates.

Happy R-coding and stay healthy!
