By Enrique González

Getting images for a database using Google Search Engine

Sometimes we find an excellent database of products for an online store, a restaurant directory, etc., and we want to use it to build a website or a mobile app, but we don't have pictures of each restaurant or product, or the pictures we have carry a watermark, or we can't use them for some other reason. Let's get them from Google Images.

We log in to our Google account and then create an account at Google Custom Search.
Once it is created, we click on "new search engine", give it a name, select a language and, where it says "Sites to search", enter any pattern. It doesn't matter which, because it will not be used; for example "*.website.com".
Once the search engine is created, click on "Show on the Web - public URL". The browser will show a URL like:
https://www.google.com/cse/publicurl?cx=0123456789:xug12345sgzj9 
Write down the value of the "cx" parameter for later. Now select "Modify your search engine - Control Panel". On the options page, go to "Image Search" and select "Yes" so that it searches Google Images. Under "Sites to search", select "Search entire web but emphasize included sites". Once that is done, create an account in Google Developers Console if you don't have one.
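If you would rather not copy the "cx" value by hand, it can be read from the public URL with PHP's own URL functions. This is only a minimal sketch; the helper name extractCx is hypothetical and not part of any Google library.

```php
<?php
// Hypothetical helper: pull the "cx" value out of a public search-engine URL.
function extractCx(string $publicUrl): ?string
{
    $query = parse_url($publicUrl, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // the URL has no query string
    }
    parse_str($query, $params);
    return isset($params['cx']) && is_string($params['cx']) ? $params['cx'] : null;
}

// The example URL is the one shown above.
echo extractCx("https://www.google.com/cse/publicurl?cx=0123456789:xug12345sgzj9"), "\n";
```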

Create a new project with "Create project" and call it, for example, "Finding images". Go to "APIs & Auth", then "APIs", turn off all the default APIs (Big Data, etc.) and switch "Custom Search API" to ON. Now go to "Credentials" and select "Create new key". In the dialog that appears, choose "Server Key" and click on "Create" (without writing anything in the edit box). This creates an API key; write it down for later.

We now have the key and the id needed for the image search. The images are requested via GET, with a number of parameters that act as filters. The Google Custom Search site lists all the available parameters. The most important are:
  • q: The most important parameter: the words to search for. It must be URL-encoded
  • key: The API key from Google Developers Console
  • cx: The search engine id we found in the public URL of our search engine
  • searchType: Set it to "image" so that the search returns images
  • lr: The language, e.g. "lang_en"
  • cr: The country, e.g. "countryUK"
  • imgType: The type of image; "photo" searches for photographs
  • imgSize: The size of the images; to avoid icons and small images, use "medium", "large", etc.
  • googlehost: Restricts the search to a localized Google, for example google.co.uk
  • num: The number of images to return
So the search URL will look like this:

https://www.googleapis.com/customsearch/v1?googlehost=google.co.uk&lr=lang_en&imgType=photo&imgSize=medium&num=1&cr=countryUK&searchType=image&key=AIzaabcdefghijklmnopqrstuvwxyz1234&cx=123456789:w5tfght45rt6y7u&q=keyword+to+search
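Instead of concatenating the URL by hand, the parameters can be kept in an array and encoded with http_build_query, which also takes care of URL-encoding the keywords. The following is a sketch under that assumption; buildImageSearchUrl is a hypothetical helper name, and the key and id are placeholders.

```php
<?php
// Hypothetical helper: assemble the image-search URL from its parameters.
function buildImageSearchUrl(string $apiKey, string $cx, string $keywords, array $filters = []): string
{
    $params = array_merge(
        ['searchType' => 'image', 'num' => 1], // defaults, can be overridden by $filters
        $filters,
        [
            'key' => $apiKey,
            'cx'  => $cx,
            'q'   => $keywords, // http_build_query URL-encodes this value
        ]
    );
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query($params);
}

$url = buildImageSearchUrl('MY_API_KEY', 'MY_ID', 'keyword to search', [
    'googlehost' => 'google.co.uk',
    'lr'         => 'lang_en',
    'cr'         => 'countryUK',
    'imgType'    => 'photo',
    'imgSize'    => 'medium',
]);
echo $url, "\n";
```

Keeping the filters in an array makes it easy to change the language, country or image size per query without editing a long string.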
A request to the search URL returns the result as JSON, which we can parse with PHP. Let's write a script that queries the database, takes from each row the name (or whichever field we want to search on), and saves the file name of the image returned by Google in that row's "image" field. This way every row ends up with its "image" field set to the name of an actual image.
The script could be improved, for example by storing several images per row in a related table, or by renaming the images instead of keeping the original file names obtained from Google. In addition to updating the database, the script prints the URLs of the returned images, a list that we can easily feed to any download manager, such as JDownloader, to download them all.
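Before the full script, here is a small sketch of two of the steps in isolation: reading the first image link from the JSON, and one possible way to rename the image after the row id. The sample response is made up and trimmed down to the only field the script reads (items[0]->link), and both helper names are hypothetical.

```php
<?php
// Made-up, heavily trimmed response: only the field the script reads is kept.
$sample = '{"items":[{"link":"http://www.example.com/photos/bar-reynolds.jpg"}]}';

// Hypothetical helper: return the first image URL in a response, or null if there is none.
function firstImageLink(string $responseBody): ?string
{
    $json = json_decode($responseBody);
    if (!isset($json->items[0]->link) || !is_string($json->items[0]->link)) {
        return null; // invalid JSON or no results
    }
    return $json->items[0]->link;
}

// Hypothetical helper: name the local file after the row id, keeping the extension.
function localImageName(int $rowId, string $imageUrl): string
{
    $path = (string) parse_url($imageUrl, PHP_URL_PATH);
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $ext === '' ? (string) $rowId : $rowId . '.' . $ext;
}

$link = firstImageLink($sample);
echo $link, "\n";
echo localImageName(42, $link), "\n";
```

Naming files after the row id avoids collisions when two results happen to share the same original file name.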

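<?php
// The old mysql_* functions were removed in PHP 7, so this version uses mysqli.
$link = mysqli_connect("HOST", "USER", "PASSWORD", "DATABASE");
if (!$link) {
    die("Connection failed: " . mysqli_connect_error());
}
// "table" is a placeholder (and a reserved word in MySQL), hence the backticks.
$res = mysqli_query($link, "SELECT id, name FROM `table` LIMIT 100");
// Prepared statement: the file name comes from the web, so never paste it into the SQL.
$stmt = mysqli_prepare($link, "UPDATE `table` SET image = ? WHERE id = ?");
while ($row = mysqli_fetch_assoc($res)) {
    if ($row['name']) {
        $term = urlencode(charnochars($row['name']) . ", London");
        $ch = curl_init();
        $timeout = 10; // seconds to wait for the connection; 0 would wait indefinitely
        curl_setopt($ch, CURLOPT_URL, "https://www.googleapis.com/customsearch/v1?googlehost=google.co.uk&lr=lang_en&imgType=photo&imgSize=large&num=1&cr=countryUK&searchType=image&key=MY_API_KEY&cx=MY_ID&q=" . $term);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $content = curl_exec($ch);
        curl_close($ch);
        $json = ($content !== false) ? json_decode($content) : null;
        if (!empty($json->items[0]->link)) {
            // Keep only the last path segment of the image URL as the file name.
            $bits = explode('/', $json->items[0]->link);
            $fileName = end($bits);
            $id = (int) $row['id'];
            mysqli_stmt_bind_param($stmt, "si", $fileName, $id);
            mysqli_stmt_execute($stmt);
            echo $json->items[0]->link . "<br />";
        }
        sleep(1); // we pause the script 1 second, you can delete this
    }
}
mysqli_free_result($res);
mysqli_close($link);
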
// Replace accented and other special characters with plain ASCII equivalents.
function charnochars($search) {
    $chars = array("À" => "A","Â" => "A","Ä" => "A","Æ" => "AE","È" => "E","Ê" => "E","Ì" => "I","Î" => "I","Ð" => "D","Ò" => "O","Ô" => "O","Ö" => "O","Ø" => "O",
    "Ú" => "U","Ü" => "U","à" => "a","â" => "a","ä" => "a","æ" => "ae","è" => "e","ê" => "e","ì" => "i","î" => "i","ð" => "d","ò" => "o","ô" => "o","ö" => "o",
    "ø" => "o","ú" => "u","ü" => "u","Á" => "A","Ã" => "A","Å" => "A","Ç" => "C","É" => "E","Ë" => "E","Í" => "I","Ï" => "I","Ñ" => "N","Ó" => "O","Õ" => "O",
    "Ù" => "U","Û" => "U","Ý" => "Y","ß" => "ss","á" => "a","ã" => "a","å" => "a","ç" => "c","é" => "e","ë" => "e","í" => "i","ï" => "i","ñ" => "n","ó" => "o",
    "õ" => "o","ù" => "u","û" => "u","ý" => "y","ÿ" => "y");
    return str_replace(array_keys($chars), $chars, $search);
}
Some words about this script:

- I have limited the query to 100 rows because the service has this limitation:
"For CSE users, the API provides 100 search queries per day for free. If you need more, you may sign up for billing in the Cloud Console. Additional requests cost $5 per 1000 queries, up to 10k queries per day."
- I use the "charnochars" function to remove special characters from the search keywords. Even though we use urlencode, Google does not like "ñ" and other special characters. It doesn't matter, because for Google "España" and "espana" are the same. Also beware of the script's encoding (UTF-8, ISO Latin, etc.), which can cause this function to work incorrectly.

- I append the city or province to the keywords. If our search is very localized, this gives us more accurate photos. For example, when looking for photos of places in London, searching for "Reynolds Bar" will match bars of that name in many UK cities, whereas "Bar Reynolds, London" is much more accurate.

- The final list will contain some pictures that have nothing to do with what we wanted; it's a risk we take. The photos will also vary widely in dimensions; a Photoshop action will help us with that.
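The encoding caveat in the "charnochars" note can be illustrated with a short sketch. It assumes that the script file is saved as UTF-8, that the mbstring extension is available, and that non-UTF-8 input is ISO-8859-1; the helper name toAsciiKeyword and the shortened map are hypothetical stand-ins for the full function above.

```php
<?php
// Shortened stand-in for the article's character map (this file must be saved as UTF-8).
$map = ["ñ" => "n", "é" => "e", "ü" => "u"];

// Hypothetical helper: make sure the text is UTF-8 before applying the map.
function toAsciiKeyword(string $text, array $map, string $fallbackEncoding = 'ISO-8859-1'): string
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        // Assumption: anything that is not valid UTF-8 is in the fallback encoding.
        $text = mb_convert_encoding($text, 'UTF-8', $fallbackEncoding);
    }
    return strtr($text, $map);
}

$latin1 = "Espa\xF1a"; // "España" encoded as ISO-8859-1

// Applied directly, the UTF-8 map finds nothing to replace in the ISO-8859-1 string.
echo strtr($latin1, $map) === $latin1 ? "unchanged\n" : "changed\n";

// After converting to UTF-8 first, the replacement works for both inputs.
echo toAsciiKeyword($latin1, $map), "\n";
echo toAsciiKeyword("España", $map), "\n";
```

If the database content is in a different encoding, pass it as the third argument instead of relying on the default.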
  • Date: 29-05-2013