Introduction
Naming a baby can be a difficult task. It can take soon-to-be parents quite some time to agree
on a name they both like. Afterwards, it can also be challenging to keep the name a secret until
birth.
If you are lucky, there might be some close friends or colleagues who are expecting a child at the
same time. It can be a unique occasion to share experiences. However, wouldn’t it be a shame
if they had a similar name in mind as you? You would likely want your children to have a more or
less unique name, no? In data science we could use a Universally Unique Identifier (UUID) to solve
this problem. Or one might choose very exotic names, like Æ A-Xii. Although these options basically
guarantee uniqueness, they might not be appropriate solutions for naming a child…
So, given that you have chosen a more ‘conventional’ name, there is a risk it is similar to the one
your friends chose. Before you start to design your birth cards, you might want to align with
each other to avoid similarities. You could check that your names don’t start with the same letter.
You could also compare the number of letters. Unfortunately, these kinds of hints partly
give away what your name could be. Moreover, they might overlook slightly similar names and
different ways of writing the same name. So how can you properly compare baby names without
revealing anything?
This seems like an impossible task, which is exactly the type of task we like to solve.
Let us use hashes to check for clashes!
What is hashing?
Powerful algorithms like md5 and
sha256 allow you to convert your baby name into a fixed-length, practically unique
text that looks like complete nonsense. As an example, we can do the following in R:
digest::digest("Open Analytics", algo = "md5")
#> "620b15afa8838824d3f396b1cff4a68c"
This conversion is called hashing. Think of it as taking a person’s fingerprint. You can easily
take a fingerprint from a person, uniquely identifying that person. However, unless you have access
to a database of known criminals, a fingerprint alone is completely meaningless. Based on the
fingerprint alone, there is no way to reconstruct which person it came from.
So, if we only share the ‘fingerprints’ of our name(s) with our friends, we can easily check whether
they clash, without spoiling any other information about the names.
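The comparison described above could be sketched as follows. This is a minimal illustration, not the code behind the app: the helper `hash_name()`, the example names, and the choice to lower-case and trim names before hashing are all assumptions made here. It also passes `serialize = FALSE` so that the raw string is hashed; the default `serialize = TRUE` hashes the serialized R object instead and produces a different value, so both couples would need to agree on the same settings.

```r
library(digest)

# Hypothetical helper: normalise a name, then hash it with SHA-256.
# serialize = FALSE hashes the raw string rather than the serialised R object.
hash_name <- function(name) {
  normalised <- tolower(trimws(name))
  digest(normalised, algo = "sha256", serialize = FALSE)
}

# Each couple hashes its own shortlist privately (made-up names)
our_hashes   <- vapply(c("Emma", "Noah", "Lucas"), hash_name,
                       character(1), USE.NAMES = FALSE)
their_hashes <- vapply(c("Olivia", "noah "), hash_name,
                       character(1), USE.NAMES = FALSE)

# Only the hashes are exchanged; a shared hash means a shared name
clashes <- intersect(our_hashes, their_hashes)
length(clashes)
```

Here the normalisation step only catches differences in capitalisation and surrounding whitespace. Different spellings of the same name, such as "Noah" and "Noa", would still produce unrelated hashes.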
The one loophole is of course that your friends can start trying out every possible name until they
find a match. They would waste a whole lot of time and karma points doing this though, especially if
they have no prior clue as to what your name might be. This is like brute-force hacking an account,
trying each and every password until you get access.
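To make this loophole concrete, here is a small sketch of what such guessing could look like. The helper `hash_name()`, the secret name, and the candidate list are all hypothetical; a real attempt would need a far larger list of names and knowledge of exactly how the names were hashed.

```r
library(digest)

# Hypothetical helper, assuming names are lower-cased and trimmed before hashing
hash_name <- function(name) {
  digest(tolower(trimws(name)), algo = "sha256", serialize = FALSE)
}

# The hash a friend shared with us (made-up secret name)
shared_hash <- hash_name("Noah")

# Hash every candidate name and look for a match with the shared hash
candidates <- c("Emma", "Liam", "Noah", "Mila")
candidate_hashes <- vapply(candidates, hash_name,
                           character(1), USE.NAMES = FALSE)

guessed <- candidates[candidate_hashes == shared_hash]
guessed
```

Since the pool of plausible baby names is much smaller than the pool of possible passwords, this kind of guessing is mostly a matter of trust between friends rather than a technical barrier.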
Because hashing is so easy and secure, it is no wonder it is used everywhere in IT. The most common
example is password storage. Instead of storing your actual password, good providers will only keep
a hashed version. This is sufficient to check whether you provided the correct password, without
them ever having to store the password itself! In case of a data leak, usually only the hashed
version gets leaked, which means data leaks are not always directly harmful.
Does this mean you are always safe? No! Hackers have big ‘fingerprint databases’ matching common
passwords with their hashed forms. If you use an easy-to-guess password, chances are they’ll
find a match in their database. So make sure to use non-predictable passwords! But let’s stop the
technical rant here; this was just a small post about babies, no?
Try it out!
You can start comparing baby names here.
The app is written in R Shiny, an easy framework to create data science
apps. We then built it using shinylive, which
allows it to run entirely within a browser. Thanks to GitHub Pages,
it can also be hosted for free. Excellent guidelines on how to do this can be found
here.
This approach might be one of the easiest free ways to deploy an application. However, keep in mind
that your app will be open to the entire web and might not be as performant as it could be. If you
need a more advanced open source solution for deploying apps, we recommend taking a look at
ShinyProxy.
Enjoy hashing and comparing your baby names!