Your first Curl scripts
From Practical PHP Programming
resource curl_init ( [string url])
bool curl_setopt ( resource curl_handle, int option, mixed value)
mixed curl_exec ( resource curl_handle)
void curl_close ( resource curl_handle)
The first Curl script we are going to look at is the simplest Curl script that is actually useful – it will load a web page, retrieve the contents, then print them out. So, keeping the four-step Curl process in mind, this equates to:
- Initialise Curl
- Set URL we want to load
- Retrieve and print the URL
- Close Curl
Here is how that looks in PHP code:
<?php
$curl = curl_init();
curl_setopt ($curl, CURLOPT_URL, "http://www.php.net");
curl_exec ($curl);
curl_close ($curl);
?>
Thanks to PHP being such a straightforward language, there is a one-to-one mapping of steps to lines of code: step one, “Initialise Curl”, is done by line one, “$curl = curl_init();”, and so on. There are four functions in that simple script: curl_init(), for initialising the Curl library; curl_setopt(), for setting Curl options; curl_exec(), for executing the Curl query; and curl_close(), for shutting down the Curl session. As mentioned already, of these four only the second is complicated – the rest stay as you see them. Curl’s functionality is, for the most part, controlled through repeated calls to curl_setopt(), and it is the options you set there that determine how Curl operates.
The curl_init() function returns a Curl instance for us to use in later functions, and you should always store it in a variable. It has just one optional parameter: if you pass a string into curl_init(), it will automatically use that string as the URL to work with. In the script above we use curl_setopt() to do that for clarity, but the result is the same.
curl_setopt() takes three parameters: the Curl instance to use, a constant identifying the setting you want to change, and the value you want to use for that setting. There are a huge number of constants you can use for settings, and many of these are listed shortly. In the example we use CURLOPT_URL, which sets the URL for Curl to work with, so the working URL is whatever we pass as the third parameter – elementary, really.
Calling curl_exec() means, “We’re finished setting our options, go ahead and do it”, and you need to pass precisely one parameter: the Curl resource to use. The return value of curl_exec() is true/false by default, although we will be changing that soon.
The final function, curl_close() , takes a Curl resource as its only parameter, closes the Curl session, then frees up the associated memory.
Now, to improve on the previous script, it would be good if we actually had some control over the output of our retrieved HTML page. As it is, calling curl_exec() retrieves and outputs the page, but it would be nice to have the retrieved content stored in a variable somewhere for use when we please. There are two ways of doing this. We already looked at how output buffering, and more specifically the ob_get_contents() function, allows you to catch output before it gets to your visitor and manipulate it as you want. While this might seem like a good way to solve the problem, the second way is even better: Curl has an option specifically for it.
Passing CURLOPT_RETURNTRANSFER to curl_setopt() as parameter two and 1 as parameter three will force Curl not to print out the results of its query. Instead, curl_exec() will return the results as a string rather than the usual true/false. Note that if there is an error, false will still be the return value from curl_exec(), so compare the result against false with === to tell a failed transfer apart from an empty page.
Capturing the return value from curl_exec() looks like this in code:
<?php
$curl = curl_init();
curl_setopt ($curl, CURLOPT_URL, "http://www.php.net");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec ($curl);
curl_close ($curl);
print $result;
?>
That script will output the same as the previous script, but having the web page stored in a variable before printing gives us more flexibility – we could have manipulated the data in any number of ways before printing.
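As one illustration of manipulating the data before printing, the sketch below pulls the page title out of the returned HTML. The helper name extract_title() is hypothetical, and the $result string is a hard-coded stand-in for what curl_exec() would return, so the example runs without a network connection.

```php
<?php
// Hypothetical helper, not part of Curl: returns the text inside <title>, or null if none is found.
function extract_title($html)
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

// Stand-in for the string curl_exec() returns when CURLOPT_RETURNTRANSFER is set.
$result = "<html><head><title>PHP: Hypertext Preprocessor</title></head><body></body></html>";

$title = extract_title($result);
if ($title !== null) {
    print "Page title: $title\n";
}
```

A regular expression is enough for a quick sketch like this; for anything beyond simple extraction, a real HTML parser would be the safer choice.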
Storing data in a variable is fine, but it would be much better to store it in a file, right? While we could achieve this using the file_put_contents() function, yet again Curl has an option to do the work for us. This time it is CURLOPT_FILE, which takes a file handle as its third parameter. We looked at file handles earlier, and it works the same here – we will use fopen() to open a file as writeable. Therefore, this time the script looks like this:
<?php
$curl = curl_init();
$fp = fopen("somefile.txt", "w");
curl_setopt ($curl, CURLOPT_URL, "http://www.php.net");
curl_setopt($curl, CURLOPT_FILE, $fp);
curl_exec ($curl);
curl_close ($curl);
// close the file handle so everything is flushed to disk
fclose($fp);
?>
Again, just a minor change on the previous format, and there should be nothing surprising in there at all. Notice that I have taken out the lines for CURLOPT_RETURNTRANSFER and capturing the return value from curl_exec() , because these are not applicable here – the output from Curl is sent straight to the file “somefile.txt”.
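If you save pages to disk often, the same steps can be wrapped in a small function that also checks whether the file could be opened. This is only a sketch: download_to_file() is a made-up name, not something Curl provides, and the error handling is deliberately minimal.

```php
<?php
// Hypothetical helper: download_to_file() is an illustrative name, not a Curl function.
function download_to_file($url, $path)
{
    $fp = fopen($path, "w");
    if ($fp === false) {
        return false;   // the destination could not be opened for writing
    }

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_FILE, $fp);
    $ok = curl_exec($curl);   // true or false: the content itself goes to $fp
    curl_close($curl);
    fclose($fp);              // flush and release the file handle

    return $ok;
}
```

Calling download_to_file("http://www.php.net", "somefile.txt") would then do the same job as the script above, returning true on success and false on failure.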
Our next basic script is going to switch from HTTP to FTP so you can see how little difference there is. This next script connects to the GNU FTP server and gets a listing of the root directory there:
<?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL,"ftp://ftp.gnu.org");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec ($curl);
curl_close ($curl);
print $result;
?>
If you are thinking that looks just like the second script we looked at, you would be right – the output is different, and the protocol has changed, but this is all transparent through Curl. I hope that you are starting to get an idea of the power of Curl!
We could have made that script a little more FTP-specific by providing some FTP options to make it more interesting. For example, the CURLOPT_FTPLISTONLY option makes Curl return much less information: if you tried the script without it you would have received permissions, modification times, and so on for each of the files and directories. CURLOPT_FTPLISTONLY changes this so that you only get the file and directory names. (Newer Curl releases prefer the name CURLOPT_DIRLISTONLY for the same option.)
The second FTP option of interest is CURLOPT_USERPWD, which makes Curl use the third parameter to curl_setopt() as the username and password for logging in. As the third parameter contains both the username and the password, you need to separate them with a colon, like this: username:password. When logging onto the GNU FTP server, we want to use the anonymous FTP account reserved for guests – in this situation, you generally provide your email address as the password.
With both of these changes implemented, the new script looks like this:
<?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL,"ftp://ftp.gnu.org");
curl_setopt($curl, CURLOPT_FTPLISTONLY, 1);
curl_setopt($curl, CURLOPT_USERPWD, "anonymous:your@email.com");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec ($curl);
curl_close ($curl);
print $result;
?>
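Because CURLOPT_FTPLISTONLY returns one name per line, the result is easy to turn into an array. The following is a minimal sketch, assuming the listing arrives as a single string with CRLF or LF line endings. The helper listing_to_array() is a hypothetical name, and the sample entries are made up so that the code runs without contacting a server.

```php
<?php
// Hypothetical helper: listing_to_array() is an illustrative name, not part of Curl.
function listing_to_array($listing)
{
    $listing = trim($listing);
    if ($listing === "") {
        return array();
    }
    // FTP servers usually end lines with CRLF; accept a bare LF as well.
    return preg_split('/\r?\n/', $listing);
}

// Stand-in for the names-only string returned by curl_exec(); these entries are made up.
$result = "first.txt\r\nsecond.txt\r\ndocs\r\n";

foreach (listing_to_array($result) as $name) {
    print "Entry: $name\n";
}
```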
Try changing the username and password to random values, as this will cause the login to fail. If you run the script again, you will see nothing is printed out – no errors, no warnings; nothing. This is because Curl fails silently, and you need to request Curl’s error message explicitly using curl_error(). As with the other basic functions, this one takes just a Curl session handle as its only parameter, and returns the error message from Curl. So, with this in mind, here is our final FTP script:
<?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL,"ftp://ftp.gnu.org");
curl_setopt($curl, CURLOPT_FTPLISTONLY, 1);
curl_setopt($curl, CURLOPT_USERPWD, "foo:barbaz");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec ($curl);
echo curl_error($curl);
curl_close ($curl);
print $result;
?>
Note the bad username and password and the extra call to curl_error() after curl_exec() . As long as the GNU team don’t change their FTP permissions before you read this, running that script should output “Access denied: This FTP server is anonymous only.” Perfect!
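The same idea can be packaged so that the calling code decides what to do with a failure. The sketch below collects the result together with curl_errno() and curl_error(). The function name fetch_with_error() and the shape of the returned array are assumptions made for illustration.

```php
<?php
// Hypothetical helper: fetch_with_error() is an illustrative name, not part of Curl.
function fetch_with_error($url)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $body = curl_exec($curl);

    // Read the error details before curl_close(), as the script above does.
    $info = array(
        "body"  => $body,                // string on success, false on failure
        "errno" => curl_errno($curl),    // 0 means no error
        "error" => curl_error($curl),    // empty string means no error
    );
    curl_close($curl);

    return $info;
}
```

A caller would check whether $info["body"] === false and, if so, print $info["error"] instead of the page.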
The last simple Curl script we are going to look at before we go over a list of the most popular options for curl_setopt() shows how to send data out to the web as opposed to just retrieving it. This requires a little more work, but only a little!
First, create the file posttest.php in your web server’s public directory. Type into the file this code:
<?php
var_dump($_POST);
?>
That simply takes the HTTP POST data that has come in, and spits it back out again. Now, create this new script:
<?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL,"http://localhost/posttest.php");
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, "Hello=World&Foo=Bar&Baz=Wombat");
curl_exec ($curl);
curl_close ($curl);
?>
If you are running your posttest.php file on a remote server, change “localhost” to the server URL. There are two new values for curl_setopt() in there, but otherwise the script should be clear.
The two new values, CURLOPT_POST and CURLOPT_POSTFIELDS, make our session prepare to send data over HTTP POST and assign the data to send respectively. CURLOPT_POST just takes a 1 to enable POST usage, but CURLOPT_POSTFIELDS needs a properly formatted data string to send. The string you use for the third parameter with CURLOPT_POSTFIELDS should be a list of the variables you want to send in the format Variable=Value, with each variable separated by an ampersand &. Thus, the above script sends three variables over: Hello, Foo, and Baz, with values World, Bar, and Wombat respectively.
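Writing the Variable=Value string by hand works for simple values, but anything containing spaces, ampersands, or equals signs needs to be URL-encoded first. One way to handle that, sketched here, is to keep the variables in an array and let PHP’s http_build_query() function build the string. The “Tom & Jerry” value is made up purely to show the escaping.

```php
<?php
// The same three variables as the script above, held in an array.
$fields = array(
    "Hello" => "World",
    "Foo"   => "Bar",
    "Baz"   => "Wombat",
);

// http_build_query() joins the pairs with & and URL-encodes names and values.
$postdata = http_build_query($fields, "", "&");
print $postdata . "\n";   // Hello=World&Foo=Bar&Baz=Wombat

// A value with special characters (made up for illustration) is escaped safely.
$escaped = http_build_query(array("Name" => "Tom & Jerry"), "", "&");
print $escaped . "\n";    // Name=Tom+%26+Jerry
```

The resulting string can then be passed as the third parameter to curl_setopt() with CURLOPT_POSTFIELDS.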
Once the values are sent, Curl captures the response from the server, and prints it out directly. Our posttest.php script dumps what it got through HTTP POST, so your output should be this:
array(3) {
  ["Hello"]=>
  string(5) "World"
  ["Foo"]=>
  string(3) "Bar"
  ["Baz"]=>
  string(6) "Wombat"
}
As you can see, our three variables were sent across perfectly, and the task really was not hard at all.
We have only looked at a few basic scripts, but it should be clear that Curl’s manner of making Internet protocols look easy is very helpful, and will let you add some advanced functionality to your scripts.