Languages :: PHP :: php link checker |
|||
| By: noobie |
Date: 14/10/2004 00:00:00 |
Points: 500 | Status: Answered Quality : Excellent |
|
I need to run a link checker, and I have about 2000 links to check. How would I check so many links? I am trying to check whether the links are active (and not showing a 404 error or something like that). I think the fopen() function is used for that, but I'm not sure... Any help is appreciated. |
|||
| By: VGR | Date: 14/10/2004 07:35:00 | Type : Comment |
|
| Easy:
1) loop over your X links; say $links[$i] is the current one
2) access the URI $links[$i] and check the result for a 404 or any other error
3) remember $i if the URI is bad, otherwise do nothing
4) loop

Something like this:

<?php
// inits
$bad = array();
// loop through $links[] (filled in beforehand by you)
for ($i = 0; $i < count($links); $i++) {
    // try to access that link
    $isgood = CheckURI($links[$i]);
    // remember the result
    if (!$isgood) $bad[] = $i;
}
// display bad links
$badlinks = count($bad);
for ($i = 0; $i < $badlinks; $i++) echo "bad link '" . $links[$bad[$i]] . "' (index=$i)\n";
// done

function CheckURI($parurl) {
    // inits
    $result = TRUE;
    // try to get the URI
    $filename = "$parurl";
    $tobec = TRUE; // keep reading while this stays TRUE
    $fd = @fopen($filename, "r");
    if ($fd) { // page found
        while ((!feof($fd)) and ($tobec)) {
            $ligne = fgets($fd, 4096);
            if (!(strpos($ligne, '[404] Not Found') === false)) $tobec = FALSE; // stop as soon as this is encountered
            $contents[] = $ligne;
        } // while (blocking read)
        fclose($fd);
        if ($tobec) {
            // file read entirely OK (note that we could stop after the first X lines; the '404' message is not on the 345th line...)
            // nothing to do, result is TRUE already
            // this block is in case you want to log something like "last date the URI was found OK"
        } else {
            // we stopped before the end: 404 found
            $result = FALSE;
        }
    } else { // page not found
        $result = FALSE;
    } // if page found or not
    return $result;
} // CheckURI Boolean Function
?> |
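A minimal sketch of the same four steps using `foreach`, which works whatever the keys of `$links` are (0-based or 1-based). It assumes the checker is passed in as a callable, for example the `CheckURI()` function above; the name `findBadLinks` is hypothetical, not from this thread.

```php
<?php
// Hypothetical helper: steps 1-4 as one function.
// $isGood is any callable taking a URL and returning true/false (e.g. 'CheckURI');
// passing it in keeps the loop independent of how a link is actually checked.
function findBadLinks(array $links, callable $isGood): array
{
    $bad = [];
    foreach ($links as $index => $url) {      // step 1: loop over the links
        if (!$isGood($url)) {                 // step 2: access the URI and check it
            $bad[$index] = $url;              // step 3: remember the failing index
        }
    }
    return $bad;                              // index => URL of every bad link
}

// Usage (commented out because it makes real network requests):
// foreach (findBadLinks($links, 'CheckURI') as $i => $url) {
//     echo "bad link '$url' (index=$i)\n";
// }
```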
|||
| By: noobie | Date: 14/10/2004 07:59:00 | Type : Comment |
|
| So how would this script work? What do I have to do? Create a data file? |
|||
| By: Hatemben | Date: 14/10/2004 08:09:00 | Type : Comment |
|
| Are your links in a database or in a text file? |
|||
| By: noobie | Date: 14/10/2004 08:36:00 | Type : Comment |
|
| Well, the links are in this format:
filename.php?go=Download&id=1 ... filename.php?go=Download&id=9999
First: the IDs skip numbers.
Second: I want to generate the links (all of the IDs are in a database).
Third: I want to check whether they are active (i.e. whether they return 404 errors).
Thanks a lot. Anyone who helps me complete this gets 500 points. |
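A sketch of the second step, assuming the download page lives at some base URL and the IDs have already been read from the database. Because the URLs are built only from IDs that actually exist, the gaps in the numbering are not a problem. `buildDownloadLinks` and the example base URL are hypothetical.

```php
<?php
// Hypothetical helper: turn the download IDs from the database into full URLs
// of the form filename.php?go=Download&id=N.
// $baseUrl is an assumption; use the real address of filename.php.
function buildDownloadLinks(string $baseUrl, array $ids): array
{
    $links = [];
    foreach ($ids as $id) {
        // http_build_query() encodes the parameters; '&' is passed explicitly
        // so the result does not depend on the arg_separator.output setting
        $links[] = $baseUrl . '?' . http_build_query(['go' => 'Download', 'id' => (int) $id], '', '&');
    }
    return $links;
}
```

The IDs could come from a query such as `SELECT id FROM downloads` run through PDO (`$pdo->query(...)->fetchAll(PDO::FETCH_COLUMN)`); the table and column names are assumptions. The resulting array can be used directly as `$links`.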
|||
| By: VGR | Date: 14/10/2004 08:38:00 | Type : Comment |
|
| Just put this at the beginning of the script (not tested, by the way):

$links = array();
$links[] = 'http://www.netscape.com';
$links[] = 'http://www.badlink.zob';
$links[] = 'http://www.experts-exchange.com';

and you'll see... You just have to get your links into an array called $links (how surprising :/ ) and test the script... :/ |
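If the links are kept in a text file instead (the other option Hatemben asked about), a sketch like this fills `$links` with one URL per line. The file layout and the `loadLinks` name are assumptions, not from the thread.

```php
<?php
// Hypothetical helper: read a plain-text file with one URL per line into an array.
function loadLinks(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read link list: $path");
    }
    // trim() removes stray "\r" and spaces; array_filter() drops lines that
    // were only whitespace; array_values() renumbers the result from 0.
    return array_values(array_filter(array_map('trim', $lines), 'strlen'));
}

// Usage: $links = loadLinks('links.txt');   // file name is an assumption
```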
|||
| By: noobie | Date: 14/10/2004 08:40:00 | Type : Comment |
|
| Wait, so I have to do:

$links = array();
$links[] = 'http://www.mydomain.com';

? And it will list all of the links on the site? (There are many pages, for example filename.php?page=1-20.) |
|||
| By: Morph007x2b | Date: 14/10/2004 08:54:00 | Type : Comment |
|
| Check this post out: http://www.experts-exchange.com/Web/Q_20145908.html |
|||
| By: VGR | Date: 14/10/2004 09:01:00 | Type : Comment |
|
| Well noobie, you wrote "i need to run a link checker, and i have about 2000 links to check, how would i check so many links?", so I assumed that you had this list of links :/ Don't you? Call this list $links[] and my code will become crystal clear ;-) In a word: yes, do

<?php
$links = array();
$links[] = 'http://www.netscape.com';
$links[] = 'http://www.badlink.zob';
$links[] = 'http://www.europeanexperts.org';
// inits
$bad = array();
// loop through $links[] (filled in beforehand by you)
for ($i = 0; $i < count($links); $i++) {
    // try to access that link
    $isgood = CheckURI($links[$i]);
    // remember the result
    if (!$isgood) $bad[] = $i;
}
// display bad links
$badlinks = count($bad);
for ($i = 0; $i < $badlinks; $i++) echo "bad link '" . $links[$bad[$i]] . "' (index=$i)\n";
// done

function CheckURI($parurl) {
    // inits
    $result = TRUE;
    // try to get the URI
    $filename = "$parurl";
    $tobec = TRUE; // keep reading while this stays TRUE
    $fd = @fopen($filename, "r");
    if ($fd) { // page found
        while ((!feof($fd)) and ($tobec)) {
            $ligne = fgets($fd, 4096);
            if (!(strpos($ligne, '[404] Not Found') === false)) $tobec = FALSE; // stop as soon as this is encountered
            $contents[] = $ligne;
        } // while (blocking read)
        fclose($fd);
        if ($tobec) {
            // file read entirely OK (note that we could stop after the first X lines; the '404' message is not on the 345th line...)
            // nothing to do, result is TRUE already
            // this block is in case you want to log something like "last date the URI was found OK"
        } else {
            // we stopped before the end: 404 found
            $result = FALSE;
        }
    } else { // page not found
        $result = FALSE;
    } // if page found or not
    return $result;
} // CheckURI Boolean Function
?>

I don't guarantee it's typo-free or error-free, but it's at least 85% of what you'll need in the end. |
|||
| By: VGR | Date: 14/10/2004 09:10:00 | Type : Answer |
|
| OK, I TESTED IT AND IT WORKS. I had some typos and minor errors (things forgotten), so now the code is:

<?php
$links = array();
$links[1] = 'http://www.netscape.com';
$links[2] = 'http://www.badlink.zob';
$links[3] = 'http://www.europeanexperts.org';
//test
$DEBUGTEST = 1;
if ($DEBUGTEST == 1) echo count($links) . " links in input\n";
//
// inits
$badlinks = 0;
$bad = array();
// loop through $links[] (filled in beforehand by you)
for ($i = 1; $i <= count($links); $i++) {
    // try to access that link
    $isgood = CheckURI($links[$i]);
    if ($DEBUGTEST == 1) echo "link $i '" . $links[$i] . "' is " . (($isgood) ? 'OK' : 'KO') . "\n";
    // remember the result
    if (!$isgood) $bad[] = $i;
}
// display bad links
$badlinks = count($bad);
//test
if ($DEBUGTEST == 1) echo "$badlinks bad links found\n";
//
for ($i = 0; $i < $badlinks; $i++) echo "bad link '" . $links[$bad[$i]] . "' (index=$i)\n";
// done

function CheckURI($parurl) {
    // inits
    $result = TRUE;
    // try to get the URI
    $filename = "$parurl";
    $tobec = TRUE; // keep reading while this stays TRUE
    $fd = @fopen($filename, "r");
    if ($fd) { // page found
        while ((!feof($fd)) and ($tobec)) {
            $ligne = fgets($fd, 4096);
            if (!(strpos($ligne, '[404] Not Found') === false)) $tobec = FALSE; // stop as soon as this is encountered
            $contents[] = $ligne;
        } // while (blocking read)
        fclose($fd);
        if ($tobec) {
            // file read entirely OK (note that we could stop after the first X lines; the '404' message is not on the 345th line...)
            // nothing to do, result is TRUE already
            // this block is in case you want to log something like "last date the URI was found OK"
        } else {
            // we stopped before the end: 404 found
            $result = FALSE;
        }
    } else { // page not found
        $result = FALSE;
    } // if page found or not
    return $result;
} // CheckURI Boolean Function
?>

and it produces (correctly):

3 links in input
link 1 'http://www.netscape.com' is OK
link 2 'http://www.badlink.zob' is KO
link 3 'http://www.europeanexperts.org' is OK
1 bad links found
bad link 'http://www.badlink.zob' (index=0)

Just set $DEBUGTEST=0 and the script will behave as you expect. |
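One thing worth knowing about `CheckURI()`: with PHP's default `http://` stream wrapper, `fopen()` already fails on 4xx and 5xx responses (unless the `ignore_errors` context option is set), so real 404s end up in the "page not found" branch, and the '[404] Not Found' scan only adds a check for pages that come back successfully but contain that text. As an alternative, not the method used in this thread, a link can be judged by its HTTP status code alone. A minimal sketch, assuming PHP 7.1+ (for the `$context` argument of `get_headers()`); the function names are hypothetical.

```php
<?php
// get_headers() returns one "HTTP/x.y NNN Reason" line per response; when
// redirects are followed there are several, and the last one is the final answer.
function finalStatusCode(array $headerLines): ?int
{
    $code = null;
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// 2xx and 3xx count as "active"; 4xx, 5xx or no response at all count as bad.
function checkUriByStatus(string $url, int $timeoutSeconds = 10): bool
{
    $context = stream_context_create(['http' => [
        'method'  => 'HEAD',          // ask for headers only; a few servers mishandle HEAD
        'timeout' => $timeoutSeconds, // don't hang on dead hosts when checking ~2000 links
    ]]);
    $headers = @get_headers($url, false, $context);
    if ($headers === false) {
        return false; // DNS failure, refused connection, timeout...
    }
    $code = finalStatusCode($headers);
    return $code !== null && $code >= 200 && $code < 400;
}
```

`checkUriByStatus()` takes a URL and returns a boolean, so it can be dropped in wherever `CheckURI()` is called.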
|||
| By: noobie | Date: 14/10/2004 09:29:00 | Type : Comment |
|
| The script works, but I want to check all of the links associated with the site... If I put in yahoo.com, I want it to check its entire site map: all of the pages it links to, and then all of the pages those pages link to. |
|||
| By: VGR | Date: 14/10/2004 10:39:00 | Type : Comment |
|
| That's not at all what your original question was about...
Anyway, it's feasible (same CheckURI calls), but only after reading the page and CheckURI-ing all the links found in it. I'll let you build this loop, given it's a different question. I even suggest you ask a new question, because I fairly answered your original one.
I would do this:
- for each URL in the original list of sites
- check it using the technique above, BUT
- modify CheckURI so that it recursively checks all URIs encountered in the page currently being checked
- you have to provide an external "maximum depth" constant to stop the recursion
- you have to parse the $contents[] array for tags (A HREF, IMG, FORM ACTION=, etc.; it's a lot of work), build a local array from them, then loop through it and call the same function again recursively
It's feasible but time-consuming if you go deeper than the first level (the first level being: verify the sites and their immediate links, but not the links of the linked pages). |
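The plan above can be sketched as a depth-limited recursive crawl. This is only an illustration under assumptions not in the thread: links are read with PHP's `DOMDocument` rather than by scanning `$contents[]` by hand; only absolute `http(s)` links and root-relative `/path` links are followed; and the page fetcher is passed in as a callable so the recursion can be tested without the network. All function names are hypothetical.

```php
<?php
// Collect candidate links from <a href>, <img src> and <form action>.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $found = [];
    foreach (['a' => 'href', 'img' => 'src', 'form' => 'action'] as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $value = trim($node->getAttribute($attr));
            if ($value !== '') {
                $found[] = $value;
            }
        }
    }
    return array_values(array_unique($found));
}

// Simplified resolution: absolute http(s) links and "/path" links only.
function resolveLink(string $pageUrl, string $href): ?string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    if (strpos($href, '/') === 0 && strpos($href, '//') !== 0) {
        $p = parse_url($pageUrl);
        $port = isset($p['port']) ? ':' . $p['port'] : '';
        return $p['scheme'] . '://' . $p['host'] . $port . $href;
    }
    return null; // page-relative links, mailto:, javascript: etc. are skipped
}

function crawl(string $url, callable $fetch, int $depthLeft, array &$visited, array &$bad): void
{
    if (isset($visited[$url])) {
        return; // each URL is checked only once, so mutual links cannot loop
    }
    $visited[$url] = true;
    $html = $fetch($url);
    if ($html === false) {
        $bad[] = $url; // same outcome as CheckURI() returning FALSE
        return;
    }
    if ($depthLeft <= 0) {
        return; // the "maximum depth" limit stops the recursion here
    }
    foreach (extractLinks($html) as $href) {
        $next = resolveLink($url, $href);
        if ($next !== null) {
            crawl($next, $fetch, $depthLeft - 1, $visited, $bad);
        }
    }
}

// Returns every URL that was checked and the ones that failed.
function crawlForBadLinks(string $startUrl, callable $fetch, int $maxDepth): array
{
    $visited = [];
    $bad = [];
    crawl($startUrl, $fetch, $maxDepth, $visited, $bad);
    return ['checked' => array_keys($visited), 'bad' => $bad];
}
```

For real pages, `$fetch` could be `function ($u) { return @file_get_contents($u, false, stream_context_create(['http' => ['timeout' => 10]])); }`, which returns `false` for unreachable URLs and, with the default stream wrapper settings, for 4xx/5xx responses. `$maxDepth = 1` corresponds to the first level: the site plus its immediate links. Each extra level can multiply the number of requests.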
|||
| By: Morph007x2b | Date: 14/10/2004 10:42:00 | Type : Comment |
|
| You could try one of those free link harvesters :) Search Google: http://www.google.com/search?q=Link+Harvestor |
|||
|
|||
©2010 These pages are served without commercial sponsorship. (No popup ads, etc...). Bandwidth abuse increases hosting cost forcing sponsorship or shutdown. This server aggressively defends against automated copying for any reason including offline viewing, duplication, etc... Please respect this requirement and DO NOT RIP THIS SITE.
Please DO link to this page!








