bishmaybarik/ngram-code
District Name Matching using N-grams and Jaccard Similarity

This repository contains a Python notebook that automates matching district names between two datasets: people_of_india_clean_2014.csv and minority_conc_census_2011.csv. Each district name is broken into n-grams, pairs of names are scored with Jaccard similarity (the size of the intersection of their n-gram sets divided by the size of the union), and each district is paired with its most similar counterpart in the other dataset.
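
The following is a minimal sketch of that idea, not the notebook's actual code. It assumes character bigrams (n = 2) and a simple lowercase, alphanumeric-only normalization. The function names `ngrams`, `jaccard`, and `best_matches` are hypothetical, and the names are passed in as plain lists because the column names in the two CSV files are not stated here.

```python
def ngrams(text, n=2):
    """Return the set of character n-grams of a normalized name.

    Assumption: names are lowercased and stripped of non-alphanumeric
    characters before n-grams are taken.
    """
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum())
    if len(cleaned) < n:
        # Names shorter than n contribute themselves as a single token.
        return {cleaned} if cleaned else set()
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_matches(left_names, right_names, n=2):
    """For each name in left_names, find the most similar name in right_names.

    Returns a list of (left_name, best_right_name, score) tuples.
    best_right_name is None when no candidate shares any n-gram.
    """
    right_grams = {name: ngrams(name, n) for name in right_names}
    results = []
    for left in left_names:
        left_grams = ngrams(left, n)
        best_name, best_score = None, 0.0
        for right, grams in right_grams.items():
            score = jaccard(left_grams, grams)
            if score > best_score:
                best_name, best_score = right, score
        results.append((left, best_name, best_score))
    return results


if __name__ == "__main__":
    # Illustrative spellings only; not taken from the datasets.
    for row in best_matches(["Ahmadabad"], ["Ahmedabad", "Allahabad"]):
        print(row)
```

In this sketch, "Ahmadabad" is paired with "Ahmedabad" rather than "Allahabad" because the two spellings share more bigrams. In practice, a minimum score threshold and a manual review of low-scoring pairs are likely needed, since distinct districts with similar spellings can also score highly.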
