European Experts Exchange, the very best site for high-quality IT solutions

Copying using eregi


By: jimmbob U.S.A.  Date: 01/03/2003 00:00:00  English  Points: 50 Status: Answered
Quality : Excellent
I have posted this question before, but to no avail, so I decided to have another go. Hopefully somebody knows the answer.

I have been trying, with no success, to create a script that will copy content from another website (I have permission from the webmaster).

The problem is that I want to import the listings into my SQL database, and doing it by hand would take many hours. The page I want to download the listings from is http://www.ireland.ie/thingstodo/searchbylinkloc.asp?categoryCode=03&subCategoryCode=111&subjectCode=&categoryName=Restaurants&subsection=dining&subsubsection=restaurants&locationString=&locationType=on&searchLocation=Cork+County&searchMonth=&submit=+Search+%3E%3E+

Here is the code that I'm using, but it does not give me any result. I suspect that the first couple of lines are okay, but after that the problem could lie anywhere.

Any help would be much appreciated.

Many thanks.

//Start Code
<?php
$fp=@fopen("http://www.ireland.ie/thingstodo/searchbylinkloc.asp?categoryCode=03&subCategoryCode=111&subjectCode=&categoryName=Restaurants&subsection=dining&subsubsection=restaurants&locationString=&locationType=on&searchLocation=Cork+County&searchMonth=&submit=+Search+%3E%3E+", "r") or die("Die Message.");

$read=fread($fp, 15000);


fclose($fp);

$search = eregi("search-display.asp?InformationCode=(.*)'>", $read, $printing);

for($i=0;$i<count($printing);$i++)
{
$fp = @fopen("http://www.ireland.ie/thingstodo/search-display.asp?InformationCode=".$printing[$i], "r") or die("Die Message");

$read = fread($fp,15000);


$location = eregi("<b>Location</b></font></td><td align='left' valign='top' ><font face='verdana' size='2'>(.*)</font></td>",$read,$printing);

$telephone = eregi("<b>Phone</b></font></td><td align='left' valign='top' ><font face='verdana' size='2'>(.*)</font></td>",$read,$printing);

$address = eregi("<b>Address</b></font></td><td align='left' valign='top' ><font face='verdana' size='2'>(.*)</font></td>",$read,$printing);


$username = "root";
$password = "";
$hostname = "localhost";
$db = mysql_connect($hostname, $username, $password)
or die("Unable to connect to MySQL");
$selectdb = mysql_select_db("cork",$db)
or die("Could not select first_test");
if (mysql_query("insert into table values($address,$location,$telephone)"))
{
print "successfully inserted record";
}
else {
print "Failed to insert record";
}

mysql_close($db);
}


?>
//End Code
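For later readers: `eregi()` was deprecated in PHP 5.3 and removed in PHP 7.0, and the pattern above has problems of its own. The unescaped `?` makes the `p` of `asp` optional instead of matching a literal `?`, so the pattern cannot match `.asp?InformationCode=`; `eregi()` stores only the first match, so `$printing` holds that match and its captured group rather than one entry per link; and the greedy `(.*)` in the field patterns runs as far as it can. A minimal sketch of both extraction steps with PCRE, assuming the link and table markup implied by the patterns above (not checked against the live site); the function names and sample HTML are hypothetical.

```php
<?php
// Sketch: PCRE-based replacements for the eregi() calls above.
// Assumption: listing links look like search-display.asp?InformationCode=XYZ'
// and detail rows look like <b>Label</b></font></td><td ...><font ...>VALUE</font></td>

// Return every InformationCode value found in the listing page HTML.
function extract_information_codes(string $html): array
{
    // '.' and '?' are escaped so they match literally; [^'"&>]+ stops at the end of the value.
    preg_match_all('/search-display\.asp\?InformationCode=([^\'"&>]+)/i', $html, $matches);
    return $matches[1]; // capture group 1 of every match, not just the first one
}

// Return the value of a labelled row (e.g. "Location", "Phone", "Address"), or null.
function extract_field(string $html, string $label): ?string
{
    // (.*?) is non-greedy, so it stops at the first </font></td> after the label.
    $pattern = '#<b>' . preg_quote($label, '#') . '</b></font></td><td[^>]*><font[^>]*>(.*?)</font></td>#is';
    if (preg_match($pattern, $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

// Usage with made-up fragments (not real data from the site):
$listing = "<a href='search-display.asp?InformationCode=ABC1'>One</a>"
         . "<a href='search-display.asp?InformationCode=XYZ2'>Two</a>";
$detail = "<tr><td><font><b>Location</b></font></td><td align='left' valign='top' >"
        . "<font face='verdana' size='2'>Cork</font></td></tr>";

echo implode(', ', extract_information_codes($listing)), "\n"; // ABC1, XYZ2
echo extract_field($detail, 'Location'), "\n";                 // Cork
```

Each code returned by the first function would then be appended to the `search-display.asp?InformationCode=` URL, as the original loop does.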
By: VGR Date: 01/03/2003 19:01:00 English  Type : Answer
I do this differently, in case you are interested in a different, working (and debuggable, unlike your eregi() stuff :D :D) solution.

Have a look at this:

A very simplified version (I use more complicated parsing):

<?php
// initialisations
// outer ones
$filename='http://www.pricewatch.com/menus/m37.htm';
// inner ones
$products=array(); // explicit
$k=0; // number of products found
// URI access
$fd = @fopen ($filename, "r");
if ($fd) { // page found
while (!feof ($fd)) {
$ligne= fgets($fd, 4096);
// here you can do some parsing on the fly, for example to stop on "RADEON 9700 PRO"
$contents []=$ligne;
} // while (blocking read)
// $contents = fread ($fd, filesize ($filename)); non-blocking: doesn't work properly (filesize() cannot stat an HTTP URL anyway)
fclose ($fd);
// here call parsing for products.
Analyse($contents,$products,$k);
} else { // page not found
// infos emptied
$k=0;
// here logging or email alert for invalid page access
}
// you've your $k products, proceed with displaying or searching for $products[1..$k]["name"]=="RADEON 9700 PRO"
echo "$k products found, here they are:\n";
for ($i=1;$i<=$k;$i++) echo "product ".$products[$i]["name"]." at price ".$products[$i]["price"]."\n";
echo "finished.";

// define somewhere this function :
function Analyse($contents,&$products,&$k) {
// here GLOBALS if need be
$i=0; // index of the current line in $contents[]
$j=count($contents);
$yy=0; // position in line (remember HTML lines may be loooong)
while ($i<$j) { // loop until the data runs out
while (($i<$j) and (strpos($contents[$i],'<tr><td>')===false)) { $i++; $yy=0; } // bounds check first; reset the in-line position on each new line
if ($i<>$j) { // found a data block, else finished
// filling in
$deb='<tr><td>'; // constant introduced for extensibility
$fin='</td>'; // idem
$ligne=substr($contents[$i],$yy); // rest of line after previous processing
while (($i<$j) and (($m=strpos($ligne,$deb))===false)) { $i++; $yy=0; $ligne=($i<$j) ? $contents[$i] : ''; } // skip lines until block found or end of data
if ($i<$j) { // found a product, and its position is in $m
$k++; // increments #products found
$m=$m+strlen($deb);
$n=$m;
$locRes='';
$l=strlen($fin);
while (($n<strlen($ligne))and((substr($ligne,$n,$l))<>$fin)) $n++; // linear search for end marker
if ($n==strlen($ligne)) { $locRes=substr($ligne,$m); $i++; $yy=0; $n=strpos($contents[$i],$fin); $m=0; $ligne=$contents[$i]; }
if (!($n===false)) {
$locRes.=substr($ligne,$m,$n-$m);
$yy=$yy+$n+1;
} else { $locRes.=''; $yy=0; }
$products[$k]["price"]=$locRes;
$deb='">'; $fin='</A'; // note case
$locRes="";
$ligne=substr($contents[$i],$yy); // rest of line after previous processing
$m=false; while (($i<$j) and (($m=strpos($ligne,'ID='))===false)) { $i++; $yy=0; $ligne=($i<$j) ? $contents[$i] : ''; } // special
if (! ($m===false)) { // found a product, and its position is in $m
$l=strlen($deb);
while (($m<strlen($ligne)) and (substr($ligne,$m,$l)<>$deb)) $m++; // bounded, so a missing '">' cannot loop forever
$m=$m+strlen($deb);
$n=$m;
$l=strlen($fin);
$locRes='';
while (($n<strlen($ligne))and((substr($ligne,$n,$l))<>$fin)) $n++; // linear search for end marker
if ($n==strlen($ligne)) { $locRes=substr($ligne,$m); $i++; $yy=0; $n=strpos($contents[$i],$fin); $m=0; $ligne=$contents[$i]; }
if (!($n===false)) {
$locRes.=substr($ligne,$m,$n-$m);
$yy=$yy+$n+1;
} else { $locRes.=''; $yy=0; }
} // if found second tagged data
$products[$k]["name"]=$locRes;
} // if found product
} // if found a new data block (product line) or end-of-data marker
// else finished
} // while not finished entirely
} // Analyse Procedure
// Nota bene: all this ugly parsing is usually handled by a more "intelligent" function called GetChunk ;-)
?>
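`GetChunk` itself is not shown in this thread. As a guess at the idea only, here is a hypothetical strpos()-based helper in the same spirit: it returns the text between a start marker and an end marker and advances an offset, so repeated calls walk through the page without any regular expression. The name `get_chunk`, its signature and the sample table are assumptions.

```php
<?php
// Hypothetical helper (not VGR's actual GetChunk): return the text between
// $start and $end, searching from $offset; on success, move $offset past $end.
function get_chunk(string $haystack, string $start, string $end, int &$offset): ?string
{
    $m = strpos($haystack, $start, $offset);
    if ($m === false) {
        return null; // start marker not found: $offset is left unchanged
    }
    $m += strlen($start);             // first character of the chunk
    $n = strpos($haystack, $end, $m);
    if ($n === false) {
        return null; // end marker not found: $offset is left unchanged
    }
    $offset = $n + strlen($end);      // resume after the end marker next time
    return substr($haystack, $m, $n - $m);
}

// Usage: collect the first cell of every row in a made-up table.
$html = '<tr><td>$120</td><td>A</td></tr><tr><td>$95</td><td>B</td></tr>';
$offset = 0;
$prices = [];
while (($chunk = get_chunk($html, '<tr><td>', '</td>', $offset)) !== null) {
    $prices[] = $chunk;
}
echo implode(', ', $prices), "\n"; // $120, $95
```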
By: KC_Speedball Date: 01/03/2003 19:06:00 English  Type : Comment
I can't really solve your problem, but I can tell you what your problem is. All you get from the page with fopen is the following:


Microsoft JScript runtime error '800a138f'

'Request.ServerVariables(...).item' is null or not an object

D:\INETPUB\WWWROOT\TBI\THINGSTODO\../include/header.asp, line 4


Check whether all the variables in the URL are right; this could solve the problem... possibly.
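That error text is produced by the remote ASP page itself: its included header reads a `Request.ServerVariables(...)` entry that is null for this request, so `fopen()` succeeds but what it reads is an error page instead of the listings. One plausible cause, not confirmed in this thread, is a missing request header: PHP's HTTP wrapper sends no `User-Agent` unless the `user_agent` ini setting or context option is set, and it never sends a `Referer` by itself. A minimal sketch, assuming that is the cause, of attaching such headers through a stream context; the header values are placeholders and `make_http_context` is a hypothetical name.

```php
<?php
// Sketch: build an HTTP stream context that sends browser-like headers.
// Assumption: the remote ASP page needs User-Agent and/or Referer; values are placeholders.
function make_http_context(string $userAgent, string $referer)
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => "User-Agent: $userAgent\r\n" .
                         "Referer: $referer\r\n",
            'timeout' => 15, // seconds
        ],
    ]);
}

$ctx = make_http_context('Mozilla/5.0 (compatible; listing-import)', 'http://www.ireland.ie/');
// The context is then passed to the fetch, for example:
// $html = file_get_contents($url, false, $ctx);
// if ($html === false || stripos($html, 'runtime error') !== false) { /* still an error page */ }
```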
By: KC_Speedball Date: 01/03/2003 19:07:00 English  Type : Comment
VGR is too fast for me... as usual :)

Regards
By: jimmbob Date: 01/03/2003 19:40:00 English  Type : Comment
Thanks for both of your comments. I don't think I can quite get my head around VGR's code, though.

Thanks again.

.......

I was trying to figure out the problem, and here's the result of one of my experiments. I hope this makes it clearer for you.

!!Notice this code-------------------------
if ($fp)
{ print"The file exists!"; }
else
{ print"The file does not exist"; }
!!End-------------------------

<?php
$fp = @fopen("http://www.ireland.ie/thingstodo/searchbylinkloc.asp?categoryCode=03&subCategoryCode=111&subjectCode=&categoryName=Restaurants&subsection=dining&subsubsection=restaurants&locationString=&locationType=on&searchLocation=Cork+County&searchMonth=&submit=+Search+%3E%3E+", "r") or die("Die Message.");

//verifying
if ($fp)
{ print"The file exists!"; }
else
{ print"The file does not exist"; }
//printed "The file exists!" :-)

$read = fread($fp, 15000);

fclose($fp);

$search = eregi("search-display.asp?InformationCode=(.*)'>", $read, $printing);

for($i=0;$i<count($printing);$i++)
{
$fp = @fopen("http://www.ireland.ie/thingstodo/search-display.asp?InformationCode=".$printing[$i], "r") or die("Die Message");

//verifying
if ($fp)
{ print"The file exists!"; }
else
{ print"The file does not exist"; }
//nothing was printed for this code??

$read = fread($fp,15000);
//etc.. etc..
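This is consistent with KC_Speedball's diagnosis: the listing request returned the ASP error page quoted above, `eregi()` found no match (the unescaped `?` in its pattern would also prevent a match on a real listing page), so `$printing` presumably held nothing to loop over and the loop body never ran. A minimal sketch that makes such a failure visible by counting the links before looping; the link format is inferred from the original pattern and `find_listing_codes` is a hypothetical name.

```php
<?php
// Sketch: count listing links before looping, so an empty result is reported
// instead of silently skipping the loop. The link format is an assumption.
function find_listing_codes(string $html): array
{
    $n = preg_match_all('/search-display\.asp\?InformationCode=([^\'"&>]+)/i', $html, $m);
    return $n > 0 ? $m[1] : []; // preg_match_all returns false on error, 0 on no match
}

// The error text quoted earlier in this thread, used as the "fetched" page.
$errorPage = "Microsoft JScript runtime error '800a138f'\n"
           . "'Request.ServerVariables(...).item' is null or not an object";

$codes = find_listing_codes($errorPage);
if (count($codes) === 0) {
    echo "No listing links found: check the fetched page before parsing.\n";
} else {
    echo count($codes), " listing links found\n";
}
```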
By: VGR Date: 01/03/2003 20:03:00 English  Type : Comment
Basically, my code differs from yours in two ways:
- I don't read the page with a single $read=fread() but with multiple fgets() calls, which lets me, if I need to, parse line by line and stop reading once I have found what I was looking for
- I parse with strpos() and a line counter, not with ugly and unreliable eregi() calls :D

That's all.
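The first difference can be sketched like this: read with `fgets()` line by line and stop as soon as the wanted marker has been seen, instead of a single `fread()` of a fixed 15000 bytes. A `php://memory` stream stands in for the remote page so the example runs offline; `read_until` and the sample lines are illustrative assumptions. The loop tests the result of `fgets()` against `false` rather than checking `feof()` first, so a final failed read is never stored.

```php
<?php
// Sketch: read a stream line by line and stop once $marker has been seen,
// instead of reading a fixed number of bytes with a single fread().
function read_until($fd, string $marker): array // $fd: an open, readable stream resource
{
    $lines = [];
    while (($line = fgets($fd, 4096)) !== false) {
        $lines[] = $line;
        if (strpos($line, $marker) !== false) {
            break; // found what we were looking for: stop reading
        }
    }
    return $lines;
}

// Offline stand-in for a fetched page.
$fd = fopen('php://memory', 'r+');
fwrite($fd, "<html>\n<tr><td>RADEON 9700 PRO</td></tr>\n<tr><td>other</td></tr>\n</html>\n");
rewind($fd);

$lines = read_until($fd, 'RADEON 9700 PRO');
fclose($fd);
echo count($lines), " lines read\n"; // 2 lines read
```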
