Tip: Redirect your cron jobs' output to /dev/null if you aren't doing anything with it, because some hosts e-mail you the results, and no one needs an extra piece of useless e-mail every day.
Just change http://www.google.com to the page of your choice. However, it's important to know that the "archive" you're taking will only be a snapshot of that page on a particular day.
What I mean by that is, if you're archiving a blog page every day, this archiver won't capture that day's posts as such; it'll just archive whatever is on the page at the moment it runs. So it's not useful for everything, but it's good if you have access to a page that changes regularly (say, once a day) and whose contents you'd like to store.
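Because each run only captures what's on the page at that moment, it helps to give every snapshot its own file so nothing gets overwritten. Here's a minimal PHP sketch of that idea, not the script from page 1: the function names and the archive/ directory are hypothetical, and it assumes allow_url_fopen is enabled so file_get_contents() can read an http:// URL.

```php
<?php
// Hypothetical sketch: one dated file per run, e.g. archive/YYYY-MM-DD.html,
// so a daily cron job builds up a history of snapshots.
function snapshot_filename($dir, $timestamp)
{
    return $dir . '/' . date('Y-m-d', $timestamp) . '.html';
}

// Assumes allow_url_fopen is on so file_get_contents() can fetch http:// URLs.
function archive_page($url, $dir)
{
    $html = file_get_contents($url);
    if ($html === false) {
        return false;                 // fetch failed; earlier snapshots are untouched
    }
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);      // create the archive directory on first run
    }
    $file = snapshot_filename($dir, time());
    return file_put_contents($file, $html) !== false ? $file : false;
}
```

Calling archive_page('http://www.google.com', 'archive') once a day leaves one file per day; running it twice on the same day simply replaces that day's file.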
Add the line above to your crontab file. These days nearly every host has a control panel, so there should be a place in there to add cron jobs. If you'd like the archiver to run at a time other than midnight, or to run weekly, monthly, or on some other schedule, try this tool I've made for you:
http://www.robertplank.com/cron
I've designed it the same way Task Scheduler is set up: you can enter a certain time, run only on weekdays, or run only on certain days of the week. Anything you want.
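The tool itself isn't reproduced here, but whatever options you pick end up as the five time fields at the start of a crontab line: minute, hour, day of month, month, and day of week. A hypothetical PHP helper that builds those fields from a time of day plus an optional list of weekdays might look like this (cron_schedule() is an illustrative name, not part of any library):

```php
<?php
// Hypothetical helper: build the five time fields of a crontab line,
// in cron's order: minute, hour, day of month, month, day of week.
// $days holds day-of-week numbers (0 = Sunday ... 6 = Saturday);
// an empty array means "every day".
function cron_schedule($hour, $minute, array $days = array())
{
    $dow = empty($days) ? '*' : implode(',', $days);
    return sprintf('%d %d * * %s', $minute, $hour, $dow);
}

echo cron_schedule(0, 0), "\n";                        // midnight, every day
echo cron_schedule(6, 30, array(1, 2, 3, 4, 5)), "\n"; // 6:30 a.m., weekdays only
```

Note that cron lists the minute before the hour, which is the opposite of how most people write a time of day.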
This tip doesn't take care of everything. For example, wget won't save the images on a page unless they're referenced by full URLs. In the next installment of this article series I'll be showing you how you can use PHP to make up for some of the things wget can't do (like grabbing images).
Here's my solution: http://www.jumpx.com/tutorials/commandline/get.zip
It's not the most perfect script in the world, but it should do what you want most of the time. If you'd like to delve into how it works, I've added comments inside so you can follow along: all the functions and a few of the important parts of the code are commented.
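The core problem with images is that a page usually refers to them by relative paths like "images/logo.gif", which mean nothing once the page is saved elsewhere. The following is a rough PHP sketch of one way to turn those into full URLs so they can be downloaded separately. It is not the code inside get.zip; absolute_url() and image_urls() are hypothetical names, the regular expression is a simplification that will miss unusual markup, and "../" segments are left as-is rather than collapsed.

```php
<?php
// Hypothetical sketch: make an <img src="..."> value absolute, relative to $base.
function absolute_url($base, $src)
{
    // Already absolute: starts with a scheme such as "http:".
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $src)) {
        return $src;
    }
    $parts = parse_url($base);
    $scheme = $parts['scheme'];
    $host = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (substr($src, 0, 2) === '//') {
        return $scheme . ':' . $src;          // protocol-relative
    }
    if (substr($src, 0, 1) === '/') {
        return "$scheme://$host$src";         // relative to the site root
    }
    // Relative to the directory the page itself lives in ("../" not collapsed).
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return "$scheme://$host$dir$src";
}

// Collect every image source in $html and make each one absolute.
function image_urls($base, $html)
{
    preg_match_all('/<img[^>]+src=["\']?([^"\'\s>]+)/i', $html, $m);
    $urls = array();
    foreach ($m[1] as $src) {
        $urls[] = absolute_url($base, $src);
    }
    return $urls;
}
```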
ARGUMENTS (NOT THE SHOUTING KIND)
But wait, you want to use it in a crontab, which runs from the command line. You can't just do something like:
php get.php?url=http://www.google.com
That's because PHP will look for a *file* with that entire name, question mark and all. So what if you have ten different URLs to grab from ten different crontab entries, but you only want one script?
How would you do all that? It's a long, brutal ordeal, so prepare yourself. Ready?
php get.php url=http://www.google.com
Yeah, that's all there is to it. PHP's pretty cool like that: it takes the arguments after the file name and stores them in the same $_GET array you'd check anyway.
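Filling $_GET from command-line arguments is a feature of PHP's CGI build, the one that prints HTTP headers as described next. The separate CLI build puts those extra words in $argv instead and leaves $_GET empty. If you're not sure which one your host runs, a script can check both. This is a sketch under that assumption, not the code in get.php; script_args() is a hypothetical name and the fallback URL is only a placeholder.

```php
<?php
// Hypothetical helper: return "name=value" arguments as an array, whether PHP
// is the CGI build (which fills $_GET) or the CLI build (which only fills argv).
function script_args()
{
    if (!empty($_GET)) {
        return $_GET;                          // CGI build: already parsed
    }
    $args = array();
    $raw = isset($_SERVER['argv']) ? $_SERVER['argv'] : array();
    foreach (array_slice($raw, 1) as $arg) {   // skip the script name itself
        parse_str($arg, $pair);                // "url=..." becomes array('url' => '...')
        $args = array_merge($args, $pair);
    }
    return $args;
}

$args = script_args();
$url = isset($args['url']) ? $args['url'] : 'http://www.google.com'; // placeholder default
echo "Fetching $url\n";
```

Either way the arguments are parsed like a query string, so a URL that carries its own query string (?a=1&b=2) gets cut at the "&" unless you write it as %26.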
One thing you might notice is that every time you run PHP from the command line, it gives you something like this:
Content-type: text/html
X-Powered-By: PHP/4.3.3
your output here...
Those first couple of lines are HTTP headers. But we're not using HTTP (we're not loading the script from a browser), so on the command line it's better to call php with the "-q" option, like this:
php -q get.php url=http://www.google.com
The "q" stands for quiet, and it suppresses the HTTP headers. If you're just sending the script's output to /dev/null (to nothing) in a crontab, it doesn't really make a difference, but you should make this a habit when running PHP from the command line.
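Put together, the ten-crontab scenario from earlier is just ten nearly identical lines that all call the same script with a different url= argument. Here's a small PHP sketch that prints such lines for a list of URLs. The schedule, the example URLs, and /path/to/get.php are placeholders, crontab_line() is a hypothetical name, and escapeshellarg() quotes each argument so that characters like "&" in a URL don't confuse the shell.

```php
<?php
// Hypothetical helper: one crontab line per URL, all running the same script.
// Cron treats an unescaped "%" specially, so URLs containing "%" aren't handled here.
function crontab_line($schedule, $script, $url)
{
    // -q drops the HTTP headers; "> /dev/null 2>&1" discards output and errors.
    return $schedule . ' php -q ' . $script . ' '
        . escapeshellarg('url=' . $url) . ' > /dev/null 2>&1';
}

$urls = array('http://www.google.com', 'http://www.example.com/news.html');
foreach ($urls as $url) {
    echo crontab_line('0 0 * * *', '/path/to/get.php', $url), "\n";
}
```

If you'd still like to be e-mailed when something goes wrong, drop the "2>&1" so only normal output is thrown away and error messages still reach cron's mail.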
That's enough to get you started. If you still feel like poking around with the things PHP can do on the command line, you can try prompting the user for keyboard input, like this:
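<?php
echo "Give me your name: ";
$data = fopen("php://stdin", "rb");
$input = "";
// Read one byte at a time until Enter (\n or \r), end of input, or an error.
while (!feof($data)) {
    $chunk = fread($data, 1);
    if ($chunk === false || $chunk === "" || $chunk === "\n" || $chunk === "\r") break;
    $input .= $chunk;
}
fclose($data);
echo "Hello $input!\n";
?>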
Remember, this only works when PHP is run from the shell.
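If a script like this might also end up being requested through a browser, it can refuse to run there. The check below is a heuristic, offered as an assumption rather than a guarantee: web servers set REQUEST_METHOD for every request while a shell run normally doesn't, and the CLI build identifies itself as 'cli' through PHP_SAPI. running_from_shell() is a hypothetical name.

```php
<?php
// Heuristic sketch: true for the CLI build, or for any run where no web server
// has set REQUEST_METHOD (e.g. the CGI build started from a shell or cron).
function running_from_shell()
{
    return PHP_SAPI === 'cli' || !isset($_SERVER['REQUEST_METHOD']);
}

if (!running_from_shell()) {
    echo "Please run this script from the command line.\n";
    exit(1);
}
```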
If you have PHP installed on a local Windows machine, you can also see what happens when you try to read from (and write to) filehandles like "COM1:" and "LPT1:" ... yep, you guessed it, the serial port and the printer port. If PHP isn't installed on the computer you're using now, don't bother. But it is possible to use PHP to print and interact with your peripherals as well.
You're welcome.
Robert Plank is the creator of Lightning Track, Redirect Pro, Rotatorblaze, and others.
An easy way to display the content saved by this article's script is explained in chapters 15 and 16 of his book, "Simple PHP": http://www.simplephp.com
You may reprint this article in full in your newsletter or web site.