Data Cleansing


Data Cleansing can be simply defined as making raw data usable. This is akin to extracting oil and natural gas from deposits, or diamonds from ore. Extraction separates the required material, which must then be refined (in the case of oil) or cut and polished (in the case of diamonds). Only then can it be used in different applications through further downstream processes.

 In today’s world, information is extracted from Data Lakes in much the same way. Just as with oil or diamonds, we need to extract it, refine it and use it for multiple applications.

Data Cleansing is the term used for this process. Alternate names include Data Scrubbing, Data Purification and Data Cleaning.

Ixsight Technologies has built strong IP to parse, standardize, validate, correct and populate data values from structured or unstructured raw data, including Data Lakes. The key technologies used for this are Scrubbix™ and Deduplix™.

 Examples of Structured Data are:

  • Customer Data (Demographics, Contact Data, Identity Information, Lifestyle information)
  • Product Data (Product Classification, Manufacturer details)
  • Location Data (Location hierarchy)

 Examples of Unstructured Data are:

  • Social Media Data (text and images)
  • E-mail messages
  • Internet of Things Data (IoT Data)
  • Log files, etc.

 Some business applications that Ixsight’s Data Cleansing solutions address are:

  • Improving Data Quality for contacting customers or cross-selling products by providing valid, standardized and enriched information. Examples include validating contact data, correcting addresses (city, zip codes), scoring addresses and verifying email addresses.
  • Profiling and Scoring information based on Data Quality
  • Enabling Data Migration from one system to another: mapping data from multiple systems and providing unique, standardized, enriched data in destination formats.
  • Enabling Address Parsing and geocoding to support Location Analytics
  • Extracting and cleansing information from reviews and emails to understand customer behavior
  • Extracting and cleansing information from social media for fraud scoring and profiling
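
To make the first two applications above more concrete, here is a minimal Python sketch of standardizing a contact record and scoring its quality. It is an illustration only and does not represent how Scrubbix™ or Deduplix™ work. The field names, the regular expressions and the helpers `cleanse_contact` and `quality_score` are all hypothetical, and the email check is a rough syntactic test rather than real verification.

```python
import re

# Hypothetical, deliberately simple rules chosen for illustration.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # rough syntax check only
ZIP_RE = re.compile(r"^\d{5,6}$")                      # assumes 5 or 6 digit codes


def cleanse_contact(raw):
    """Return a standardized copy of a raw contact record (dict of strings)."""
    # Collapse repeated whitespace and normalize capitalization.
    name = " ".join(raw.get("name", "").split()).title()
    city = " ".join(raw.get("city", "").split()).title()
    # Email addresses are compared case-insensitively in this sketch.
    email = raw.get("email", "").strip().lower()
    # Keep only the digits of the postal code.
    zip_code = re.sub(r"\D", "", raw.get("zip", ""))
    return {"name": name, "email": email, "city": city, "zip": zip_code}


def quality_score(record):
    """Fraction of fields passing basic checks, from 0.0 to 1.0."""
    checks = [
        bool(record["name"]),
        bool(EMAIL_RE.match(record["email"])),
        bool(record["city"]),
        bool(ZIP_RE.match(record["zip"])),
    ]
    return sum(checks) / len(checks)


raw = {
    "name": "  jane   DOE ",
    "email": " Jane.Doe@Example.COM ",
    "city": "new  york",
    "zip": "10001 ",
}
clean = cleanse_contact(raw)
score = quality_score(clean)
print(clean, score)
```

A production cleansing pipeline would typically add reference data (for example, city and postal code master lists) and fuzzy matching, which this sketch leaves out.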

 Data Cleansing technologies must have the capacity to handle large volumes of data that multiply by the minute and come in varying formats and structures. Such data is called Big Data. Big Data inevitably includes bad data, which needs to be cleansed to yield better analytics and potentially save a lot of money. Ixsight’s technologies are keeping pace with Big Data Cleansing requirements, making data ingestion, preparation and discovery easier.