European Experts Exchange, the very best site for high-quality IT solutions


PHP: Grabbing data from an external website and parsing it


By: PHP newbee U.S.A.  Date: 07/02/2003 00:00:00  English  Points: 50 Status: Answered
Quality : Excellent
I'm trying to grab an XML file from an online game site and parse the data. (I'm doing this more for the experience than out of any particular desire to duplicate what the game has already done.)

Here's what I've got:

$GrabURL = "http://www.anarchy-online.com/character/bio/d/1/name/$name/bio.xml";
$GrabLevelStart = '<level>';
$GrabLevelEnd = '</level>';
$GrabProfessionStart = '<profession>';
$GrabProfessionEnd = '</profession>';

$OpenFile = fopen("$GrabURL", "r");
$RetrieveFile = fread($OpenFile, 200000);
$GrabLevelData = eregi("$GrabLevelStart(.*)$GrabLevelEnd", $RetrieveFile, $DataPrint);
fclose($OpenFile);
$level = $DataPrint[1];

$OpenFile = fopen("$GrabURL", "r");
$RetrieveFile = fread($OpenFile, 200000);
$GrabProfessionData = eregi("$GrabProfessionStart(.*)$GrabProfessionEnd", $RetrieveFile, $DataPrint);
fclose($OpenFile);
$profession = $DataPrint[1];




Note that I have to open and close the file twice.

So my question is: how can I avoid opening and closing it every time? Is there a quicker, easier way to search the file again without all the added overhead? The above is just a snippet; in reality I'm doing the open/close bit a good dozen times to parse out all of the desired fields.

I'm new to PHP, so I'm open to suggestions on better ways to achieve this. My heart isn't set on the example above; that's just the best I could come up with so far.

If you need a $name to examine the XML from the website, use "burchenal"; it'll give you a valid XML file to look at.

http://www.anarchy-online.com/character/bio/d/1/name/burchenal/bio.xml
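A minimal sketch of avoiding the repeated open/close, assuming allow_url_fopen is enabled: read the document once with file_get_contents(), then run one preg_match() per tag against that same string and keep each result under its own key. preg_match() stands in for eregi(), which was removed in PHP 7. The helper name extract_fields() and the sample XML are illustrative assumptions, not data from the game site.

```php
<?php
// Hypothetical helper: extract the text of several simple <tag>value</tag>
// elements from one XML string that has already been read into memory.
function extract_fields(string $xml, array $tags): array
{
    $fields = [];
    foreach ($tags as $tag) {
        $quoted = preg_quote($tag, '/');
        // Non-greedy (.*?) stops at the first closing tag; "i" mirrors eregi()'s
        // case-insensitivity and "s" lets a value span several lines.
        $pattern = '/<' . $quoted . '>(.*?)<\/' . $quoted . '>/is';
        $fields[$tag] = preg_match($pattern, $xml, $m) ? $m[1] : null;
    }
    return $fields;
}

// With allow_url_fopen on, the whole page can be fetched in one call:
// $xml = file_get_contents("http://www.anarchy-online.com/character/bio/d/1/name/$name/bio.xml");
$xml = '<character><level>42</level><profession>Agent</profession></character>'; // made-up sample

$fields = extract_fields($xml, ['level', 'profession']);
$level = $fields['level'];
$profession = $fields['profession'];
```

Adding another field only means adding its tag name to the array; the page is still downloaded once.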

Thanks.
By: VGR Date: 07/02/2003 08:27:00 English  Type : Comment
if you do
$lines = file($GrabURL); // fopen wrappers (allow_url_fopen) must be ON

you'll end up with $lines[], an array in memory (one element per line) that you can parse and analyse however you want.

That does away with the double reading.

Personally, I prefer doing:
$contents = array();
$fd = fopen($GrabURL, "r");
while (($ligne = fgets($fd, 4096)) !== false) {
    $contents[] = $ligne;
}
fclose($fd);

so that I can parse line by line while storing the file in memory, but do it whichever way you like 8-)
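A sketch of the "read once into an array, then parse line by line" idea, assuming allow_url_fopen is enabled so file() can fetch the URL in a single request. find_in_lines() is a hypothetical helper, and it assumes each value and its closing marker sit on the same line, which may not hold for every page. The sample lines are made up.

```php
<?php
// Hypothetical helper: return the text between $start and $end on the first
// line of $lines that contains both markers, or null if no line does.
function find_in_lines(array $lines, string $start, string $end): ?string
{
    foreach ($lines as $line) {
        $p = strpos($line, $start);
        if ($p === false) {
            continue; // start marker not on this line
        }
        $p += strlen($start);
        $q = strpos($line, $end, $p);
        if ($q !== false) {
            return substr($line, $p, $q - $p);
        }
    }
    return null;
}

// $contents = file($GrabURL); // one request; each element is one line
$contents = ["<character>\n", "  <level>42</level>\n", "  <profession>Agent</profession>\n", "</character>\n"];

// The array stays in memory, so it can be searched as often as needed.
$level = find_in_lines($contents, '<level>', '</level>');
$profession = find_in_lines($contents, '<profession>', '</profession>');
```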
By: TheFalklands Date: 08/02/2003 04:23:00 English  Type : Comment
PHPnewbee,

Try this:

<?php
$source = "http://www.anarchy-online.com/character/bio/d/1/name/$name/bio.xml";
$TagName = "level"; // or whatever tag you are looking for

$FileContent = file($source);
$XMLString = implode('', $FileContent);

$TagData = '';
$Result = strpos($XMLString, "<" . $TagName . ">");
if ($Result !== false)
{
$TagData = substr($XMLString, $Result + strlen("<" . $TagName . ">"));
$TagData = substr($TagData, 0, strpos($TagData, "<"));
}
?>

If you plan to parse many fields from this URL, you should consider writing a function, say gettagvalue($TagName), that takes the name of the tag you are looking for, so you can call it in a loop or at least save a bit of typing :).

Best of luck,
TFL
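A sketch of the suggested gettagvalue() helper. Unlike the one-argument version described in the post, this one assumes the XML string is passed in as a parameter, so the page is read only once and the function can be called in a loop. Like the snippet above, it takes everything up to the next "<" as the value. The sample XML is made up.

```php
<?php
// Returns the text right after <$tagName> up to the next '<', or null if the
// opening tag is not found.
function gettagvalue(string $xml, string $tagName): ?string
{
    $open = '<' . $tagName . '>';
    $start = strpos($xml, $open);
    if ($start === false) {
        return null;
    }
    $start += strlen($open);
    $end = strpos($xml, '<', $start);
    if ($end === false) {
        return substr($xml, $start); // no further tag: take the rest
    }
    return substr($xml, $start, $end - $start);
}

// $XMLString = implode('', file($source)); // read the page once
$XMLString = '<character><level>42</level><profession>Agent</profession></character>';

$values = [];
foreach (['level', 'profession'] as $tag) {
    $values[$tag] = gettagvalue($XMLString, $tag);
}
```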

By: PHP newbee Date: 10/02/2003 05:11:00 English  Type : Comment
Just a quick note.

Thanks for the info. I'll look at both suggestions and get back to you. Work's a little busy today, but I'll try to make time.

Just wanted to let you know I haven't forgotten about y'all. :)
By: PHP newbee Date: 19/02/2003 05:22:00 English  Type : Comment
Gentlemen,

Thanks for the info.

Perhaps my understanding of PHP is not adequate, but neither option seems to do what I seek.

VGR: I don't see how that will let me parse something and return a value multiple times. Opening the file for parsing doesn't seem to be the problem; the problem is the parsing itself, reading the string and returning a value multiple times. As it currently stands, the returned value is $DataPrint[1] rather than something unique that I can assign to its own variable.

TheFalklands: I'm getting an error on line two of your code. At first I figured it might be the doubled quote marks on the URL, but fixing that didn't help. Then I figured it might be the missing semicolon on the line that sets the tag name, but that also failed to fix it. Still, even assuming I could get it working, I don't fully understand what your code really does or how I could apply it to my situation.

Any other ideas/clarifications?

Thank you both for your help.
By: VGR Date: 24/02/2003 22:53:00 English  Type : Answer
That's because you "parse" using eregi() or other RegExp functions...

I just wrote a simple parser for someone else, using only standard string processing functions, and it works...

If your parsing doesn't work and you can't tell why, you should change the way you parse 8-))

Just for your eyes:

<?php
// initialisations
// outer ones
$filename='http://www.pricewatch.com/menus/m37.htm'; // URL of the page to get and parse
// inner ones
$products=array(); // explicit, this is your array to fill
$k=0; // number of products found
// URI access
$fd = @fopen ($filename, "r");
if ($fd) { // if page found
while (!feof ($fd)) {
$ligne= fgets($fd, 4096);
// here you can make some parsing on-the-fly, for example to stop on "RADEON 9700 PRO"
$contents []=$ligne;
} // while (blocking read)
// $contents = fread ($fd, filesize ($filename)); doesn't work properly on a URL: filesize() can't stat a remote file and fread() may return only part of the stream
fclose ($fd);
// here call parsing for products.
Analyse($contents,$products,$k);
} else { // page not found
// infos emptied
$k=0;
// here logging or email alert for invalid page access
}
// you've your $k products, proceed with displaying or searching for $products[1..$k]["name"]=="RADEON 9700 PRO"


// here some DB stuff to store data
echo "update begins.\n";
// CHECK THOSE VALUES !!
$dbLogin = 'bshaq34';
$dbPassword = 'xxxxx';
$dbName = 'xxxx';
$dbHost = 'localhost';
//EoCheckToDo
// <snip>
echo "update finished.\n";

// define somewhere this function :
function Analyse($contents,&$products,&$k) {
// here GLOBALS if need be
$i=0; // index of the current line in $contents[]
$j=count($contents);
$yy=0; // position in line (remember HTML lines may be loooong)
while ($i<$j) { // while the end of the data has not been reached
while (($i<$j) and (strpos($contents[$i],'<tr><td>')===false)) $i++; // check the bound first to avoid reading past the last line
if ($i<>$j) { // found a data block, else finished
// filling in
$deb='<tr><td>'; // constant introduced for extensibility
$fin='</td>'; // idem
$ligne=substr($contents[$i],$yy); // rest of line after previous processing
while (($i<$j) and (($m=strpos($ligne,$deb))===false)) { $i++; $yy=0; $ligne=($i<$j) ? $contents[$i] : ''; } // skip lines until block found or end of data
if ($i<$j) { // found a product, and its position is in $m
$k++; // increments #products found
$m=$m+strlen($deb);
$n=$m;
$locRes='';
$l=strlen($fin);
while (($n<strlen($ligne))and((substr($ligne,$n,$l))<>$fin)) $n++; // linear search for end marker
if ($n==strlen($ligne)) { $locRes=substr($ligne,$m); $i++; $yy=0; $n=strpos($contents[$i],$fin); $m=0; $ligne=$contents[$i]; }
if (!($n===false)) {
$locRes.=substr($ligne,$m,$n-$m);
$yy=$yy+$n+1;
} else { $locRes.=''; $yy=0; }
$products[$k]["price"]=substr($locRes,1); // get rid of dollar symbol
$deb='">'; $fin='</A'; // note case
$locRes="";
$ligne=substr($contents[$i],$yy); // rest of line after previous processing
$m=false; while (($i<$j) and (($m=strpos($ligne,'ID='))===false)) { $i++; $yy=0; $ligne=($i<$j) ? $contents[$i] : ''; } // special
if (! ($m===false)) { // found a product, and its position is in $m
$l=strlen($deb);
while (($m<strlen($ligne)) and (substr($ligne,$m,$l)<>$deb)) $m++; // bounded so a missing marker can't loop forever
$m=$m+strlen($deb);
$n=$m;
$l=strlen($fin);
$locRes='';
while (($n<strlen($ligne))and((substr($ligne,$n,$l))<>$fin)) $n++; // linear search for end marker
if ($n==strlen($ligne)) { $locRes=substr($ligne,$m); $i++; $yy=0; $n=strpos($contents[$i],$fin); $m=0; $ligne=$contents[$i]; }
if (!($n===false)) {
$locRes.=substr($ligne,$m,$n-$m);
$yy=$yy+$n+1;
} else { $locRes.=''; $yy=0; }
} // if found second tagged data
$products[$k]["name"]=$locRes;
} // if found product
} // if found a new data block (product line) or end-of-data marker
// else finished
} // while not finished entirely
} // Analyse Procedure
// Note: all this ugly parsing is usually handled by a more "intelligent" function of mine called GetChunk ;-)
?>
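A hypothetical get_chunk() helper in the spirit of the GetChunk function mentioned in the comment; that function isn't shown, so the name, signature and return shape here are assumptions. It returns the text between two markers plus the offset just past the end marker, which lets a loop walk the page much as Analyse() does, without scanning character by character. The sample rows are made up.

```php
<?php
// Returns [text between $start and $end, offset just past $end], searching from
// $offset, or null if either marker is missing.
function get_chunk(string $haystack, string $start, string $end, int $offset = 0): ?array
{
    $p = strpos($haystack, $start, $offset);
    if ($p === false) {
        return null;
    }
    $p += strlen($start);
    $q = strpos($haystack, $end, $p);
    if ($q === false) {
        return null;
    }
    return [substr($haystack, $p, $q - $p), $q + strlen($end)];
}

// $html = implode('', file($filename)); // whole page as one string
$html = '<tr><td>$199</td><td><A HREF="x?ID=1">RADEON 9700 PRO</A></td></tr>'
      . '<tr><td>$89</td><td><A HREF="x?ID=2">Other card</A></td></tr>';

$products = [];
$offset = 0;
// Same markers as Analyse(): price in <tr><td>...</td>, name in ">...</A
while (($cell = get_chunk($html, '<tr><td>', '</td>', $offset)) !== null) {
    [$price, $offset] = $cell;
    $link = get_chunk($html, '">', '</A', $offset);
    if ($link === null) {
        break; // row without a product link: stop
    }
    [$name, $offset] = $link;
    $products[] = ['price' => substr($price, 1), 'name' => $name]; // drop the dollar sign
}
```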


By: PHP newbee Date: 24/02/2003 23:24:00 English  Type : Comment
Oi... now there's a chunk of code. Give me a moment to stare blankly at it while trying to force my brain into drive long enough to realize I don't understand it, and then I'll get back to you. :)

It'll take a few to look and see how/what you're doing, so please be patient. I haven't forgotten this Q, even if I am taking forever to respond at times.
By: VGR Date: 25/02/2003 20:30:00 English  Type : Comment
no problem. Of course it has to be adapted.

To be clear:
- either you read the file (the page at URL http://...) into an array in memory and then parse it as in the code immediately above,
- or, if you can and want to, you parse line by line while reading the file (inside the fopen..fclose loop of my first answer), possibly stopping as soon as you have found what you were looking for.

It depends on what you want.
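A sketch of the second option, reading line by line and stopping as soon as the value is found. read_until_found() is a hypothetical helper, and a php://memory stream stands in for fopen() on the real URL so the example runs offline; it assumes the markers appear on a single line.

```php
<?php
// Reads lines from an open stream until one contains both markers; returns the
// text between them, or null at end of stream. Stops reading once found.
function read_until_found($fd, string $start, string $end): ?string
{
    while (($line = fgets($fd, 4096)) !== false) {
        $p = strpos($line, $start);
        if ($p === false) {
            continue;
        }
        $p += strlen($start);
        $q = strpos($line, $end, $p);
        if ($q !== false) {
            return substr($line, $p, $q - $p); // found: no need to read further
        }
    }
    return null;
}

// $fd = fopen($filename, "r"); // the real page, needs allow_url_fopen
$fd = fopen('php://memory', 'r+');
fwrite($fd, "<tr><td>\$199</td>\n<tr><td>\$89</td>\n");
rewind($fd);

$firstPrice = read_until_found($fd, '<tr><td>', '</td>');
$nextLine = fgets($fd); // reading resumes right after the matching line
fclose($fd);
```

Looping on fgets() !== false, rather than on !feof(), avoids one extra iteration that would return false after the last line.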

Regards,
By: PHP newbee Date: 13/03/2003 19:16:00 English  Type : Comment
My apologies for the delay in responding. Certain 'things' have come up that have cut into my free time, and thus my ability to check in on things like EE. :(

Thanks for the help. I'm confident that I understand enough of that to figure out how to adapt it to my needs, although those needs no longer exist in this context. I'm sure it'll help in the future, though. :)
