
Name and Address

As a South Indian whose surname, unconventionally, comes before my given name, I have had trouble filling in forms ever since my childhood. My name would appear in ten different ways in ten different places. And quite frankly, identity is a matter of great importance to any sapient being. Another such issue is one's address. Here too I have had a lot of explaining to do every time I changed the SIM card for my mobile phone or had official documents processed.

So I thought that with the onset of new-age information technology, and everything getting an eVersion, forms (especially official ones) would not be far behind and might just resolve some of these issues. But, as luck (or poor effort) would have it, they have not. Every now and again I still hear people complaining about mismatched or wrong addresses, largely because there is great diversity in how human beings identify themselves.

Let me clarify. The instruments of identity are virtually the same across the globe: a person's name and the place where they live. The diversity lies in the manner in which these two are conveyed. This is where I feel that, as responsible software developers, we must give the customer the freedom and flexibility to express and represent their name and address the way they see fit, not the way we do. And do not tell me that it is not possible.

The idea is to identify the atomic elements of both, to find the basic building blocks of a name and an address. Here are some that I have identified:

Name: Initials, First Name, Middle Name, Last Name/Surname, Maiden Name, Family Name, Village Name.

Address: C/o, Line1, Line2, Street, Village, Town, District, City, Province, State, PIN/Postal/ZIP Code, Country, Geo Codes.

Hopefully these elements cover most of the names and places in the world.
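To make the building blocks concrete, here is a minimal sketch of the two lists as Python classes. The class names, field names, and the helpers `filled` and `format_name` are my own illustrative choices, not part of any existing library, and the geo code is assumed to be a (latitude, longitude) pair.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass
class Name:
    # Every element is optional: people fill in only what applies to them.
    initials: Optional[str] = None
    first_name: Optional[str] = None
    middle_name: Optional[str] = None
    surname: Optional[str] = None
    maiden_name: Optional[str] = None
    family_name: Optional[str] = None
    village_name: Optional[str] = None


@dataclass
class Address:
    care_of: Optional[str] = None
    line1: Optional[str] = None
    line2: Optional[str] = None
    street: Optional[str] = None
    village: Optional[str] = None
    town: Optional[str] = None
    district: Optional[str] = None
    city: Optional[str] = None
    province: Optional[str] = None
    state: Optional[str] = None
    postal_code: Optional[str] = None
    country: Optional[str] = None
    # Assumption: a geo code is stored as (latitude, longitude).
    geo_code: Optional[Tuple[float, float]] = None


def filled(record) -> dict:
    """Return only the elements that were actually provided."""
    values = {f.name: getattr(record, f.name) for f in fields(record)}
    return {key: value for key, value in values.items() if value is not None}


def format_name(name: Name, order: Tuple[str, ...]) -> str:
    """Join the provided elements in the order the person prefers."""
    parts = (getattr(name, key) for key in order)
    return " ".join(part for part in parts if part)


# Hypothetical person who writes the surname first.
person = Name(first_name="Asha", surname="Rao")
print(format_name(person, ("surname", "first_name")))
```

Keeping the display order outside the record is one way to let the same stored elements be shown surname-first for one person and given-name-first for another.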
Now, it might appear to you that I have stretched things too far by putting a Village property in both classes. But keep in mind that these classes are meant to cover the whole set of names and addresses one can think of, in any context. The point is that when designing a system it is good practice to keep such generic templates and classes of information and to use their properties selectively. There are a few properties, such as First Name and Surname (in Name) and Line1 and Postal Code (in Address), that you need in almost every situation. Barring those, most other fields can be used selectively as the situation demands. For instance, if you know that your clients are in urban areas, you can drop the Town and Village fields from your address. Similarly, if you know that your client is in, say, Italy, you can drop the State field. So that is the point: keep your design such that it can fit any situation and place.

Parsers: This is something people are breaking their heads over. Most are trying to parse names and addresses into the fields mentioned above, which, mind you, is a very difficult task. I will not talk about the address parser, as it happens to be my firm's fundamental hiring question. But I am willing to talk about the semantics of the name parser, a similar problem that is more complex because of the greater diversity of sources and of the ways a name can be written.

The first thing to realize is that, while most people will write their names in the format given above, not everyone will, and you need to accept that. What you can do is accept the input as a single string, parse it, and show the result to the user. If anything is wrong, the user will correct it and you can continue anyway. Google does this in Gmail contacts. The point is that you should always give users the choice of representing things the way they want to.
It is a good idea to create a feedback system here, in which the users' corrections teach the system what kinds of surnames people have. That brings up another issue: there is no point in storing all the names in the world. But because surnames are fewer in number than first or middle names, it is a good idea to store those. The same can be done for Initials, Family Name and Village Name. The maiden name is in most cases a previous surname and can be recognized with the same DB.

Another good idea is to look for full stops in your string. If a full stop comes after a short burst of characters at the beginning of the name or right after a space, that burst is an initial. But beware, there is a pitfall here. People at times write their surnames in shorthand as well, e.g. Kumar can be written as Kr. The examples are endless.

One needs to identify such challenges for problems like this. And in my opinion it is not all that difficult once you have done it once. I am currently working on one such parser, and once the code is ready and presentable, maybe I will publish it here. Until then, this is a good place to use your algorithmic skills.
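As a starting point for that exercise, the heuristics described above might be sketched roughly as follows. This is not the parser I am working on, only an illustration under stated assumptions: the names `ParsedName`, `parse_name`, `learn_surname` and `SURNAME_SHORTHAND` are hypothetical, a "short burst" is assumed to mean one to three letters, and the surname store is a plain in-memory set standing in for a real DB.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Assumption: one to three letters followed by a full stop is an initial.
INITIAL_RE = re.compile(r"^[A-Za-z]{1,3}\.$")

# Hypothetical shorthand table; "Kr." for Kumar is the example from the text.
SURNAME_SHORTHAND = {"kr.": "Kumar"}


@dataclass
class ParsedName:
    initials: List[str] = field(default_factory=list)
    first_name: Optional[str] = None
    middle_names: List[str] = field(default_factory=list)
    surname: Optional[str] = None


def parse_name(raw: str, known_surnames: Set[str]) -> ParsedName:
    """Best-effort guess; the result is meant to be shown to the user."""
    result = ParsedName()
    rest: List[str] = []
    for token in raw.split():
        key = token.lower()
        if key in SURNAME_SHORTHAND:
            # Check shorthand first, or "Kr." would be taken for an initial.
            result.surname = SURNAME_SHORTHAND[key]
        elif INITIAL_RE.match(token):
            result.initials.append(token)
        elif key in known_surnames and result.surname is None:
            # A stored surname is recognized wherever it appears in the string.
            result.surname = token
        else:
            rest.append(token)
    if rest:
        result.first_name = rest[0]
        result.middle_names = rest[1:]
    return result


def learn_surname(corrected: ParsedName, known_surnames: Set[str]) -> None:
    """Feedback step: remember the surname the user confirmed or corrected."""
    if corrected.surname:
        known_surnames.add(corrected.surname.lower())


known = {"rao"}                           # stand-in for the surname DB
guess = parse_name("Asha Menon", known)   # "Menon" is not known yet
learn_surname(ParsedName(first_name="Asha", surname="Menon"), known)
print(parse_name("Menon Asha", known))    # now recognized, even surname-first
```

Because the surname is matched by lookup and not by position, the same sketch handles names written surname-first, which is exactly the case that forms with a fixed "first name, last name" order get wrong.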