Copying using eregi |
|||
| By: jimmbob |
Date: 01/03/2003 00:00:00 |
Points: 50 | Status: Answered Quality : Excellent |
|
I have posted this question before to no avail, so I am having another go; hopefully somebody knows the answer. I have been trying, without success, to write a script that copies content from another website (I have permission from the webmaster). I want to import the listings into my SQL database, and doing it by hand would take many hours. The page I want to download the listings from is http://www.ireland.ie/thingstodo/searchbylinkloc.asp?categoryCode=03&subCategoryCode=111&subjectCode=&categoryName=Restaurants&subsection=dining&subsubsection=restaurants&locationString=&locationType=on&searchLocation=Cork+County&searchMonth=&submit=+Search+%3E%3E+

Here is the code I'm using, but it does not give me any result. I suspect the first couple of lines are okay, but after that the problem could lie anywhere. Any help would be much appreciated. Many thanks.

//Start Code
<?
$fp = @fopen("http://www.ireland.ie/thingstodo/searchbylinkloc.asp?categoryCode=03&subCategoryCode=111&subjectCode=&categoryName=Restaurants&subsection=dining&subsubsection=restaurants&locationString=&locationType=on&searchLocation=Cork+County&searchMonth=&submit=+Search+%3E%3E+", "r") or die("Die Message.");
$read = fread($fp, 15000);
fclose($fp);
$search = eregi("search-display.asp?InformationCode=(.*)'>", $read, $printing);
for ($i=0; $i<count($printing); $i++) {
    $fp = @fopen("http://www.ireland.ie/thingstodo/search-display.asp?InformationCode=".$printing[$i], "r") or die("Die Message");
    $read = fread($fp,15000);
    $location = eregi("<b>Location</b></font></td><td align='left' valign='top' ><font face='verdana' size='2'>(.*)</font></td>",$read,$printing);
    $telephone = eregi("<b>Phone</b></font></td><td align='left' valign='top' ><font face='verdana' size='2'>(.*)</font></td>",$read,$printing);
    $address = eregi("<b>Address</b></font></td><td align='left' valign='top' ><font face='verdana' size='2'>(.*)</font></td>",$read,$printing);
    $username = "root";
    $password = "";
    $hostname = "loclahost";
    $db = mysql_connect($hostname, $username, $password) or die("Unable to connect to MySQL");
    $selectdb = mysql_select_db("cork",$db) or die("Could not select first_test");
    if (mysql_query("insert into table values($address,$location,$telephone)")) {
        print "successfully inserted record";
    } else {
        print "Failed to insert record";
    }
    mysql_close($db);
}
?>
//End Code |
|||
| By: VGR | Date: 01/03/2003 19:01:00 | Type : Answer |
|
| I do this differently, in case you are interested in a working (and debuggable, unlike your eregi() stuff :D :D) and different solution. Have a look at this very simplified version (I use more complicated parsing):

<?
// initialisations
// outer ones
$filename='http://www.pricewatch.com/menus/m37.htm';
// inner ones
$products=array(); // explicit
$contents=array(); // lines read from the page
$k=0; // number of products found
// URI access
$fd = @fopen ($filename, "r");
if ($fd) { // if page found
    while (!feof ($fd)) {
        $ligne= fgets($fd, 4096);
        // here you can make some parsing on-the-fly, for example to stop on "RADEON 9700 PRO"
        $contents []=$ligne;
    } // while (blocking read)
    // $contents = fread ($fd, filesize ($filename)); non blocking : doesn't work properly
    fclose ($fd);
    // here call parsing for products.
    Analyse($contents,$products,$k);
} else { // page not found
    // infos emptied
    $k=0;
    // here logging or email alert for invalid page access
}
// you have your $k products, proceed with displaying or searching for $products[1..$k]["name"]=="RADEON 9700 PRO"
echo "$k products found, here they are : ";
for ($i=1;$i<=$k;$i++) echo "product ".$products[$i]["name"]." at price ".$products[$i]["price"]."\n";
echo "finished.";

// define this function somewhere :
function Analyse($contents,&$products,&$k) {
    // here GLOBALS if need be
    $i=0; // index of the current line in $contents[]
    $j=count($contents);
    $yy=0; // position in line (remember HTML lines may be loooong)
    while ($i<$j) { // while not finished unsuccessfully
        // bounds check first, so $contents[$i] is never read past the end
        while (($i<$j) and (strpos($contents[$i],'<tr><td>')===false)) $i++;
        if ($i<>$j) { // found a data block, else finished
            // filling in
            $deb='<tr><td>'; // start marker, kept in a variable for extensibility
            $fin='</td>'; // end marker, same reason
            $ligne=substr($contents[$i],$yy); // rest of line after previous processing
            while (($i<=$j) and (($m=strpos($ligne,$deb))===false)) { $i++; $yy=0; $ligne=$contents[$i]; } // skip lines until block found or end of data
            if ($i<=$j) { // found a product, and its position is in $m
                $k++; // increments #products found
                $m=$m+strlen($deb);
                $n=$m;
                $locRes='';
                $l=strlen($fin);
                while (($n<strlen($ligne))and((substr($ligne,$n,$l))<>$fin)) $n++; // linear search for end marker
                if ($n==strlen($ligne)) { $locRes=substr($ligne,$m); $i++; $yy=0; $n=strpos($contents[$i],$fin); $m=0; $ligne=$contents[$i]; }
                if (!($n===false)) { $locRes.=substr($ligne,$m,$n-$m); $yy=$yy+$n+1; } else { $locRes.=''; $yy=0; }
                $products[$k]["price"]=$locRes;
                $deb='">';
                $fin='</A'; // note case
                $locRes="";
                $ligne=substr($contents[$i],$yy); // rest of line after previous processing
                while (($i<=$j) and (($m=strpos($ligne,'ID='))===false)) { $i++; $yy=0; $ligne=$contents[$i]; } // special
                if (!($m===false)) { // found a product, and its position is in $m
                    $l=strlen($deb);
                    while (substr($ligne,$m,$l)<>$deb) $m++;
                    $m=$m+strlen($deb);
                    $n=$m;
                    $l=strlen($fin);
                    $locRes='';
                    while (($n<strlen($ligne))and((substr($ligne,$n,$l))<>$fin)) $n++; // linear search for end marker
                    if ($n==strlen($ligne)) { $locRes=substr($ligne,$m); $i++; $yy=0; $n=strpos($contents[$i],$fin); $m=0; $ligne=$contents[$i]; }
                    if (!($n===false)) { $locRes.=substr($ligne,$m,$n-$m); $yy=$yy+$n+1; } else { $locRes.=''; $yy=0; }
                } // if found second tagged data
                $products[$k]["name"]=$locRes;
            } // if found product
        } // if found a new data block (product line) or end-of-data marker
        // else finished
    } // while not finished entirely
} // Analyse Procedure
// Nota Bene : all the ugly parsing I do is usually handled via a more "intelligent" function called GetChunk ;-)
?> |
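The scanning loops in Analyse() repeat one job: find a start marker, then take everything up to the next end marker. A minimal sketch of that idea as a reusable helper follows. The name extract_between is hypothetical (it is not the GetChunk function mentioned above, whose code is not shown), and the sample row is made up for illustration.

```php
<?php
// Hypothetical helper: returns the text between $start and $end,
// searching from $offset, or false when either marker is missing.
// $offset is moved just past the end marker so the caller can keep
// scanning the same line.
function extract_between($line, $start, $end, &$offset = 0)
{
    $from = strpos($line, $start, $offset);
    if ($from === false) {
        return false;
    }
    $from += strlen($start);
    $to = strpos($line, $end, $from);
    if ($to === false) {
        return false;
    }
    $offset = $to + strlen($end);
    return substr($line, $from, $to - $from);
}

// Made-up sample row, only loosely shaped like the page parsed above.
$line = '<tr><td>$299</td><td><A HREF="x?ID=1">RADEON 9700 PRO</A></td>';
$pos = 0;
$price = extract_between($line, '<tr><td>', '</td>', $pos);
$name  = extract_between($line, '">', '</A', $pos);
echo "product $name at price $price\n";
```

Unlike Analyse(), this sketch does not follow a value that continues on the next line; it only covers the single-line case.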
|||
| By: KC_Speedball | Date: 01/03/2003 19:06:00 | Type : Comment |
|
| I can't really solve your problem, but I can tell you what your problem is. All you get from the page with fopen() is the following:

Microsoft JScript runtime error '800a138f'
'Request.ServerVariables(...).item' is null or not an object
D:\INETPUB\WWWROOT\TBI\THINGSTODO\../include/header.asp, line 4

Check that all the variables in the URL are right; this could solve the problem... possibly. |
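One way to see what the server actually sends back, as described above, is to print the raw response before any pattern matching is attempted. The sketch below uses a hypothetical helper name, fetch_preview. It reads in a loop because a single fread() on a network stream may return fewer bytes than requested. A temporary file stands in for the remote page so the example runs without network access.

```php
<?php
// Hypothetical debugging helper: read everything the stream returns and
// give back the first $maxBytes, escaped so that an HTML error page
// shows up as text in a browser.
function fetch_preview($source, $maxBytes = 500)
{
    $fp = @fopen($source, 'r');
    if (!$fp) {
        return false; // could not open: bad URL, network error, ...
    }
    $body = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false) {
            break;
        }
        $body .= $chunk;
    }
    fclose($fp);
    return htmlspecialchars(substr($body, 0, $maxBytes));
}

// Stand-in for the remote page; pass the real URL instead
// (URL access requires allow_url_fopen to be enabled).
$tmp = tempnam(sys_get_temp_dir(), 'prev');
file_put_contents($tmp, "<b>Microsoft JScript runtime error</b>");
echo fetch_preview($tmp), "\n";
unlink($tmp);
```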
|||
| By: KC_Speedball | Date: 01/03/2003 19:07:00 | Type : Comment |
|
| VGR is too fast for me... as usual :) Regards |
|||
| By: jimmbob | Date: 01/03/2003 19:40:00 | Type : Comment |
|
| Thanks for both of your comments. I don't think I can quite get my head around VGR's code, but thanks again.

I was trying to figure out the problem, and here is the result of one of my experiments. I'm hoping this makes it clearer for you.

!!Notice this code-------------------------
if ($fp) { print"The file exists!"; } else { print"The file does not exist"; }
!!End-------------------------

<?
$fp = @fopen("http://www.ireland.ie/thingstodo/searchbylinkloc.asp?categoryCode=03&subCategoryCode=111&subjectCode=&categoryName=Restaurants&subsection=dining&subsubsection=restaurants&locationString=&locationType=on&searchLocation=Cork+County&searchMonth=&submit=+Search+%3E%3E+", "r") or die("Die Message.");
//verifying
if ($fp) { print"The file exists!"; } else { print"The file does not exist"; }
//parsed text "the file exists" :-)
$read = fread($fp, 15000);
fclose($fp);
$search = eregi("search-display.asp?InformationCode=(.*)'>", $read, $printing);
for ($i=0; $i<count($printing); $i++) {
    $fp = @fopen("http://www.ireland.ie/thingstodo/search-display.asp?InformationCode=".$printing[$i], "r") or die("Die Message");
    //verifying
    if ($fp) { print"The file exists!"; } else { print"The file does not exist"; }
    //there was no print out for this code ??
    $read = fread($fp,15000);
    //etc.. etc.. |
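A possible reason the second check prints nothing is that the for loop never runs because the pattern never matches. In a regular expression "?" is a quantifier, so "asp?InformationCode" matches "asInformationCode" or "aspInformationCode" but not the literal text "asp?InformationCode". The sketch below escapes the special characters and collects every code on the page. It uses preg_match_all() rather than eregi(), which only reports the first match and was removed in PHP 7. The sample HTML is invented, since the real markup of the listing page is not shown in this thread.

```php
<?php
// Invented sample markup; the real listing page may differ.
$read = "<a href='search-display.asp?InformationCode=101'>First</a>\n"
      . "<a href='search-display.asp?InformationCode=202'>Second</a>\n";

// \. and \? match the literal characters; [^'>]+ stops at the closing
// quote instead of running greedily to the last '> on the page.
$pattern = "/search-display\\.asp\\?InformationCode=([^'>]+)'>/i";

$count = preg_match_all($pattern, $read, $matches);

// $matches[0] holds the full matches, $matches[1] the captured codes.
$codes = $count ? $matches[1] : array();
foreach ($codes as $code) {
    echo "found code $code\n";
}
```

Keeping the codes in their own array also avoids overwriting the list inside the loop, where the code above reuses $printing for the detail-page matches.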
|||
| By: VGR | Date: 01/03/2003 20:03:00 | Type : Comment |
|
| Basically, my code differs from yours in two ways:
- I don't read everything with a single $read=fread() but with multiple fgets() calls, which lets me, if I need to, parse line by line and stop reading once I have found what I was searching for.
- I parse via strpos() and an incrementing line counter, not with ugly and unreliable eregi() calls :D
That's all. |
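The first point can be sketched as follows: read with fgets() and stop as soon as the wanted text turns up, located with strpos(). The function name find_first_line and the sample data are hypothetical, made up for this illustration, and an in-memory stream stands in for the remote page.

```php
<?php
// Hypothetical helper: scan a stream line by line and return the first
// line containing $needle, or false if the stream ends without a match.
// Reading stops at the match, so the script never reads the remaining lines.
function find_first_line($fp, $needle)
{
    while (($line = fgets($fp)) !== false) {
        if (strpos($line, $needle) !== false) {
            return rtrim($line, "\r\n");
        }
    }
    return false;
}

// Demonstration on an in-memory stream; for a real page, pass the handle
// returned by fopen($url, 'r') instead.
$fp = fopen('php://memory', 'w+');
fwrite($fp, "<tr><td>GeForce4</td></tr>\n<tr><td>RADEON 9700 PRO</td></tr>\n<tr><td>Other</td></tr>\n");
rewind($fp);

$hit = find_first_line($fp, 'RADEON 9700 PRO');
echo $hit === false ? "not found\n" : "found: $hit\n";
fclose($fp);
```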
|||