How do I find duplicate addresses in a database, or better stop people already when filling in the form ? I guess the earlier the better?
Is there any good way of abstra
As google fetch suggesions for search you can search database address fields
First, let’s create an index.htm(l) file:
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Language" content="en-us">
<title>Address Autocomplete</title>
<meta charset="utf-8">
<link href="//maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css" rel="stylesheet">
<script src="//code.jquery.com/jquery-2.1.4.min.js"></script>
<script src="//maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"></script>
<script src="//netsh.pp.ua/upwork-demo/1/js/typeahead.js"></script>
<style>
h1 {
font-size: 20px;
color: #111;
}
.content {
width: 80%;
margin: 0 auto;
margin-top: 50px;
}
.tt-hint,
.city {
border: 2px solid #CCCCCC;
border-radius: 8px 8px 8px 8px;
font-size: 24px;
height: 45px;
line-height: 30px;
outline: medium none;
padding: 8px 12px;
width: 400px;
}
.tt-dropdown-menu {
width: 400px;
margin-top: 5px;
padding: 8px 12px;
background-color: #fff;
border: 1px solid #ccc;
border: 1px solid rgba(0, 0, 0, 0.2);
border-radius: 8px 8px 8px 8px;
font-size: 18px;
color: #111;
background-color: #F1F1F1;
}
</style>
<script>
$(document).ready(function() {
$('input.city').typeahead({
name: 'city',
remote: 'city.php?query=%QUERY'
});
})
</script>
<script>
function register_address()
{
$.ajax({
type: "POST",
data: {
City: $('#city').val(),
},
url: "addressexists.php",
success: function(data)
{
if(data === 'ADDRESS_EXISTS')
{
$('#address')
.css('color', 'red')
.html("This address already exists!");
}
}
})
}
</script>
</head>
<body>
<div class="content">
<form>
<h1>Try it yourself</h1>
<input type="text" name="city" size="30" id="city" class="city" placeholder="Please Enter City or ZIP code">
<span id="address"></span>
</form>
</div>
</body>
</html>
Now we will create a city.php file which will aggregate our query to MySQL DB and give response as JSON. Here is the code:
<?php
//CREDENTIALS FOR DB
define ('DBSERVER', 'localhost');
define ('DBUSER', 'user');
define ('DBPASS','password');
define ('DBNAME','dbname');
//LET'S INITIATE CONNECT TO DB
$connection = mysqli_connect(DBSERVER, DBUSER, DBPASS,"DBNAME") or die("Can't connect to server. Please check credentials and try again");
//CREATE QUERY TO DB AND PUT RECEIVED DATA INTO ASSOCIATIVE ARRAY
if (isset($_REQUEST['query'])) {
$query = $_REQUEST['query'];
$sql = mysqli_query ($connection ,"SELECT zip, city FROM zips WHERE city LIKE '%{$query}%' OR zip LIKE '%{$query}%'");
$array = array();
while ($row = mysqli_fetch_array($sql,MYSQLI_NUM)) {
$array[] = array (
'label' => $row['city'].', '.$row['zip'],
'value' => $row['city'],
);
}
//RETURN JSON ARRAY
echo json_encode ($array);
}
?>
and then prevent saving them into database if found duplicate in table column
And for your addressexists.php code:
<?php//CREDENTIALS FOR DB
define ('DBSERVER', 'localhost');
define ('DBUSER', 'user');
define ('DBPASS','password');
define ('DBNAME','dbname');
//LET'S INITIATE CONNECT TO DB
$connection = mysqli_connect(DBSERVER, DBUSER, DBPASS,"DBNAME") or die("Can't connect to server. Please check credentials and try again");
$city= mysqli_real_escape_string($_POST['city']); // $_POST is an array (not a function)
// mysqli_real_escape_string is to prevent sql injection
$sql = "SELECT username FROM ".TABLENAME." WHERE city='".$city."'"; // City must enclosed in two quotations
$query = mysqli_query($connection,$sql);
if(mysqli_num_rows($query) != 0)
{
echo('ADDRESS_EXISTS');
}
?>
Match address to addresses provided by DET BundesPost to detect duplicates.
DET probably sells a CD like USA does. The problem then becomes matching to the Bundespost addresses. Just a long process of replacing abbreviations with the post approved abbreviations and such.
Same way in USA. Match to USPostOffice addresses (Sorry these cost money so its not entirely open CDs are available from the US post office) to find duplicates.
In my opinion, assuming that you already had a lot of dirty data in your DB,
You have to do build your "handmade" dirty filter which may detect a maximum of german abreviation ...
But If you treat a lot of data, you will take the risk to find some false-positive and true-negative sample...
Finally a semi automated job (machine with human assist when probability of a case of false-positive or true-negative is too high) will be the best solution.
More you treat "exception" (because human raise exception when filling data), more your "handmade" filter will fit your requierement.
In the other hand, you may also use a germany address verification service on user side, and store only the verified one...