Normalising & Comparing Diacritical Characters

I'm currently working on a project that requires string comparison. Nothing special so far; string comparison is a fairly common thing to do. In my project, though, I want to compare strings while ignoring the diacritics that are common in many languages other than English, such as French.

It is worth stressing that these characters matter, and that they should not just be discarded because "they are the same". They are different, and they serve an important purpose in each language.

However, in my use case I do want to disregard those differences and treat accented characters the same as their unaccented English counterparts.

An example of a comparison I want to do is être and etre - note the circumflex on the ê in the first word. The correct spelling of this word is être, but I also want to accept etre.

Enter JavaScript's String.prototype.normalize. It converts a string to a chosen Unicode normalisation form, which makes the comparison much simpler.
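
To make that a bit more concrete, here is a quick sketch of what the NFKD form does to ê. The variable names are just for illustration, and the ê is written as an escape sequence so there is no doubt about which form the string starts in.

```javascript
// "\u00EA" is the single precomposed character ê.
const composed = "\u00EAtre";

// NFKD splits ê into a plain "e" followed by U+0302 (combining circumflex).
const decomposed = composed.normalize("NFKD");

console.log(composed.length);   // 4
console.log(decomposed.length); // 5
console.log(decomposed.codePointAt(1).toString(16)); // "302"
```

Once the accent is a separate code point, it can be removed without touching the base letter.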

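In the example below, normalize("NFKD") decomposes each accented character into its base letter followed by a combining mark. The replace call then removes every non-word character, which includes those combining marks (as well as spaces and punctuation), and finally toLocaleLowerCase makes the comparison case-insensitive.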

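// Example inputs, using the words from above.
const input1 = "être";
const input2 = "etre";

const normalised1 = input1
  .trim()
  .normalize("NFKD") // ê becomes e + combining circumflex
  .replace(/[^\w]/g, "") // keep only A-Z, a-z, 0-9 and _
  .toLocaleLowerCase();

const normalised2 = input2
  .trim()
  .normalize("NFKD")
  .replace(/[^\w]/g, "")
  .toLocaleLowerCase();

console.log(normalised1 === normalised2); // true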
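
One thing to be aware of: the replace above also removes spaces and punctuation, so deja vu and dejavu would compare as equal. If that matters, a possible variation is to strip only the combining marks. This is a minimal sketch, assuming a runtime that supports Unicode property escapes (\p{M} with the u flag, ES2018 onwards). stripDiacritics and looselyEqual are names I made up for illustration.

```javascript
// Hypothetical helper: removes accents but keeps spaces and punctuation.
function stripDiacritics(text) {
  return text
    .trim()
    .normalize("NFKD")        // split accented letters into base + mark
    .replace(/\p{M}/gu, "")   // remove only the combining marks
    .toLocaleLowerCase();
}

// Hypothetical helper: compares two strings, ignoring accents and case.
function looselyEqual(a, b) {
  return stripDiacritics(a) === stripDiacritics(b);
}

console.log(looselyEqual("\u00CAtre", "etre"));            // true
console.log(looselyEqual("d\u00E9j\u00E0 vu", "deja vu")); // true
console.log(looselyEqual("deja vu", "dejavu"));            // false
```

Which version is right depends on whether whitespace and punctuation should count as differences in your comparison.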
