PHP: Grabbing data from an external website and parsing it

By: PHP newbee | Date: 07/02/2003 00:00:00 | Points: 50 | Status: Answered | Quality: Excellent

I'm trying to grab an XML file from an online game site and parse the data. (I'm doing this more for the experience than out of any particular desire to duplicate what the game already provides.) Here's what I've got:

```php
$GrabURL = "http://www.anarchy-online.com/character/bio/d/1/name/$name/bio.xml";
$GrabLevelStart = '<level>';
$GrabLevelEnd = '</level>';
$GrabProfessionStart = '<profession>';
$GrabProfessionEnd = '</profession>';

// first pass: open the URL, read it, search for <level>, close it
$OpenFile = fopen("$GrabURL", "r");
$RetrieveFile = fread($OpenFile, 200000);
$GrabLevelData = eregi("$GrabLevelStart(.*)$GrabLevelEnd", $RetrieveFile, $DataPrint);
fclose($OpenFile);
$level = $DataPrint[1];

// second pass: the same URL is opened and read again, this time for <profession>
$OpenFile = fopen("$GrabURL", "r");
$RetrieveFile = fread($OpenFile, 200000);
$GrabProfessionData = eregi("$GrabProfessionStart(.*)$GrabProfessionEnd", $RetrieveFile, $DataPrint);
fclose($OpenFile);
$profession = $DataPrint[1];
```

Note that I'm forced to open and close the file twice. So my question is: how do I avoid opening and closing it every time? Is there a quicker, easier way to re-search the file without all the added overhead? The above is just a snippet; in reality I'm doing the open/close bit a good dozen times to parse out all of the desired fields.

I'm new to PHP, so I'm open to suggestions on better ways to achieve this. My heart isn't set on the example above; it's just the best I could work up so far.

If you need a $name so you can examine the XML from the website, use "burchenal" as the name; it'll give you a valid XML file to look at: http://www.anarchy-online.com/character/bio/d/1/name/burchenal/bio.xml

Thanks.

By: VGR | Date: 07/02/2003 08:27:00 | Type: Comment

If you do:

```php
$lines = file($GrabURL); // needs the fopen URL wrappers (allow_url_fopen) to be ON
```

you'll end up with $lines[], an array in memory that you can parse and analyse the way you want. That removes the problem of the double reading. Personally, I prefer doing:

```php
$contents = array();
$fd = fopen($GrabURL, "r");
while (($ligne = fgets($fd, 4096)) !== false) { // read until end of file
    $contents[] = $ligne; // store the line; you can also parse it right here
}
fclose($fd);
```

so that I can parse line by line while I store the file, but do as you like 8-)
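A minimal sketch of the "read it once" idea, written for a current PHP version rather than the 2003 one: the hypothetical helper `fetch_document()` reads the URL a single time with `file()` and joins the lines back into one string. It assumes `allow_url_fopen` is enabled so that `file()` accepts an http:// URL.

```php
<?php
// Hypothetical helper: read the whole document once and return it as one string.
// For an http:// URL this relies on allow_url_fopen being enabled.
function fetch_document($url)
{
    $lines = file($url);        // one array element per line, line endings kept
    if ($lines === false) {
        return '';              // the URL or file could not be opened
    }
    return implode('', $lines); // join the lines back into one searchable string
}
```

With `$RetrieveFile = fetch_document($GrabURL);` run once at the top of the script, each of the question's searches can work on `$RetrieveFile` without another fopen()/fclose() pair. Note that eregi() was removed in PHP 7, so on a current PHP those searches would have to be rewritten with preg_match().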

By: TheFalklands | Date: 08/02/2003 04:23:00 | Type: Comment

PHPnewbee,

Try this:

```php
<?php
$source = "http://www.anarchy-online.com/character/bio/d/1/name/$name/bio.xml";
$TagName = "level"; // or whatever tag you are looking for
$FileContent = file($source);           // the whole page as an array of lines
$XMLString = implode('', $FileContent); // joined into one string
$Result = strpos($XMLString, "<" . $TagName . ">");
if ($Result !== false) {
    // everything after the opening tag...
    $TagData = substr($XMLString, $Result + strlen("<" . $TagName . ">"));
    // ...cut at the next "<", which is where the closing tag starts
    $TagData = substr($TagData, 0, strpos($TagData, "<"));
}
?>
```

If you plan to parse many fields from this URL, you should consider making a function, say gettagvalue($TagName), that takes the name of the tag you are looking for; then you can call it in a loop, or at least save a bit of typing :).

Best of luck,
TFL
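One possible shape for the suggested gettagvalue() function, as a sketch only: unlike the snippet above, it takes the XML string as a parameter (an assumption, so nothing global is needed) and stops at the matching closing tag instead of the next `<`. It still assumes simple markup: no attributes on the opening tag and no nested tag of the same name.

```php
<?php
// Sketch of the gettagvalue() helper; the two-argument signature is an assumption.
// Returns the text between <$TagName> and </$TagName>, or false if either tag is missing.
function gettagvalue($XMLString, $TagName)
{
    $open  = "<" . $TagName . ">";
    $close = "</" . $TagName . ">";
    $start = strpos($XMLString, $open);
    if ($start === false) {
        return false;                            // opening tag not found
    }
    $start += strlen($open);                     // first character after the opening tag
    $end = strpos($XMLString, $close, $start);   // closing tag, searched from there on
    if ($end === false) {
        return false;                            // closing tag not found
    }
    return substr($XMLString, $start, $end - $start);
}
```

Usage: `$level = gettagvalue($XMLString, 'level');`, then `$profession = gettagvalue($XMLString, 'profession');` and so on, all against the same string that `file()` read once.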

By: PHP newbee | Date: 10/02/2003 05:11:00 | Type: Comment

Just a quick note: thanks for the info. I'll look at both suggestions and get back to you. Work's a little busy today, but I'll try to make time. I just wanted to let you know I haven't forgotten about y'all. :)

By: PHP newbee | Date: 19/02/2003 05:22:00 | Type: Comment

Gentlemen,

Thanks for the info. Perhaps my understanding of PHP is not adequate, but neither option seems to do what I'm after.

VGR: I don't see how that will let me parse something and return a value multiple times. Opening the file for parsing doesn't seem to be the problem; the problem is the parsing itself: reading the string and returning a value multiple times. As it currently stands, the returned value is always $DataPrint[1], rather than something unique that I can store in its own variable.

TheFalklands: I'm getting an error on line two of your code. At first I figured it might be the doubled quote marks on the URL, but fixing that didn't help. Then I figured it might be the missing semicolon after the level string, but that didn't fix it either. Still, even assuming I got it working, I don't fully understand what your code does or how I could apply it to my situation.

Any other ideas or clarifications? Thank you both for your help.
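A sketch addressing the "$DataPrint[1]" problem: instead of reusing one match variable, each value can be stored under its own key in an associative array. `grab_fields()` is a hypothetical name; it uses preg_match() with a non-greedy `(.*?)` pattern (so a tag that appears twice does not swallow everything up to the last closing tag) and the `i` flag to mimic eregi()'s case-insensitive matching. It assumes each field appears as a plain `<tag>value</tag>` pair, as in the question's own search strings.

```php
<?php
// Hypothetical helper: search one in-memory string for several tags and
// return an associative array such as array('level' => ..., 'profession' => ...).
function grab_fields($xml, $tags)
{
    $fields = array();
    foreach ($tags as $tag) {
        $q = preg_quote($tag, '#');                 // escape the tag name for the pattern
        $pattern = '#<' . $q . '>(.*?)</' . $q . '>#is';
        // null when the tag is missing, so an old match is never reused by accident
        $fields[$tag] = preg_match($pattern, $xml, $m) ? $m[1] : null;
    }
    return $fields;
}
```

For example, `$fields = grab_fields($RetrieveFile, array('level', 'profession'));` followed by `$level = $fields['level'];` and `$profession = $fields['profession'];`, with the file read only once into `$RetrieveFile`.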

By: VGR | Date: 24/02/2003 22:53:00 | Type: Answer

That's because you "parse" using eregi() or other RegExp functions... I just wrote a simple parser for someone else using only standard string-processing functions, and it works. If your parsing doesn't work and you can't tell why, you should change the way you parse 8-)) Just for your eyes:

```php
<?php
// initialisations
// outer ones
$filename = 'http://www.pricewatch.com/menus/m37.htm'; // URL of the page to get and parse
// inner ones
$products = array(); // explicit: this is the array to fill
$k = 0;              // number of products found
$contents = array(); // the page, one line per element

// URI access
$fd = @fopen($filename, "r");
if ($fd) { // page found
    // blocking read, line by line
    while (($ligne = fgets($fd, 4096)) !== false) {
        // here you can do some parsing on the fly, for example to stop on "RADEON 9700 PRO"
        $contents[] = $ligne;
    }
    // $contents = fread($fd, filesize($filename)); non-blocking: doesn't work properly
    fclose($fd);
    // here call the parsing for products
    Analyse($contents, $products, $k);
} else { // page not found
    $k = 0; // info emptied
    // here logging or email alert for invalid page access
}
// you have your $k products; proceed with displaying them, or search for
// $products[1..$k]["name"] == "RADEON 9700 PRO"

// here some DB stuff to store the data
echo "update begins.\n";
// CHECK THOSE VALUES !!
$dbLogin = 'bshaq34';
$dbPassword = 'xxxxx';
$dbName = 'xxxx';
$dbHost = 'localhost';
// EoCheckToDo
// <snip>
echo "update finished.\n";

// define this function somewhere:
function Analyse($contents, &$products, &$k)
{
    // here GLOBALS if need be
    $i = 0;                // index of the current line in $contents[]
    $j = count($contents);
    $yy = 0;               // position in line (remember HTML lines may be loooong)
    while ($i < $j) { // while not finished
        // skip to the next line containing a data block
        while (($i < $j) and (strpos($contents[$i], '<tr><td>') === false)) {
            $i++;
            $yy = 0;
        }
        if ($i < $j) { // found a data block, else finished
            // filling in
            $deb = '<tr><td>'; // start marker, kept in a variable for extensibility
            $fin = '</td>';    // end marker, idem
            $ligne = substr($contents[$i], $yy); // rest of line after previous processing
            // skip lines until the block is found or the data ends
            while ((($m = strpos($ligne, $deb)) === false) and ($i < $j)) {
                $i++;
                $yy = 0;
                $ligne = isset($contents[$i]) ? $contents[$i] : '';
            }
            if (!($m === false)) { // found a product, and its position is in $m
                $k++; // one more product found
                $m = $m + strlen($deb);
                $n = $m;
                $l = strlen($fin);
                $locRes = '';
                // linear search for the end marker
                while (($n < strlen($ligne)) and (substr($ligne, $n, $l) <> $fin)) $n++;
                if ($n == strlen($ligne)) { // end marker is on the next line
                    $locRes = substr($ligne, $m);
                    $i++;
                    $yy = 0;
                    $ligne = isset($contents[$i]) ? $contents[$i] : '';
                    $n = strpos($ligne, $fin);
                    $m = 0;
                }
                if (!($n === false)) {
                    $locRes .= substr($ligne, $m, $n - $m);
                    $yy = $yy + $n + 1;
                } else {
                    $yy = 0;
                }
                $products[$k]["price"] = substr($locRes, 1); // get rid of the dollar sign

                $deb = '">';
                $fin = '</A'; // note the case
                $locRes = '';
                $ligne = isset($contents[$i]) ? substr($contents[$i], $yy) : ''; // rest of line
                // special: skip lines until the product link ("ID=") is found
                while ((($m = strpos($ligne, 'ID=')) === false) and ($i < $j)) {
                    $i++;
                    $yy = 0;
                    $ligne = isset($contents[$i]) ? $contents[$i] : '';
                }
                if (!($m === false)) { // found the second tagged data (product name)
                    $l = strlen($deb);
                    while (($m < strlen($ligne)) and (substr($ligne, $m, $l) <> $deb)) $m++;
                    $m = $m + $l;
                    $n = $m;
                    $l = strlen($fin);
                    // linear search for the end marker
                    while (($n < strlen($ligne)) and (substr($ligne, $n, $l) <> $fin)) $n++;
                    if ($n == strlen($ligne)) { // end marker is on the next line
                        $locRes = substr($ligne, $m);
                        $i++;
                        $yy = 0;
                        $ligne = isset($contents[$i]) ? $contents[$i] : '';
                        $n = strpos($ligne, $fin);
                        $m = 0;
                    }
                    if (!($n === false)) {
                        $locRes .= substr($ligne, $m, $n - $m);
                        $yy = $yy + $n + 1;
                    } else {
                        $yy = 0;
                    }
                } // if found second tagged data
                $products[$k]["name"] = $locRes;
            } // if found product
        } // if found a new data block (product line), else end of data: finished
    } // while not finished entirely
} // Analyse procedure
// Nota bene: all the ugly parsing I do is usually handled by a more "intelligent" function called GetChunk ;-)
?>
```
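VGR's GetChunk function is not shown in the thread, so the following is only a guess at the idea: a hypothetical `get_chunk()` that returns the text between a start marker and an end marker, beginning at an offset, and then advances that offset past the end marker. It works on the whole page as a single string (for example `implode('', $contents)`), which sidesteps the line-spanning bookkeeping in Analyse().

```php
<?php
// Hypothetical get_chunk(): return the text between $deb and $fin, searching from $offset,
// and move $offset just past $fin so the next call continues from there.
// Returns false (offset unchanged) when either marker is missing.
function get_chunk($text, $deb, $fin, &$offset)
{
    $m = strpos($text, $deb, $offset);
    if ($m === false) {
        return false;              // no more start markers
    }
    $m += strlen($deb);            // first character after the start marker
    $n = strpos($text, $fin, $m);
    if ($n === false) {
        return false;              // start marker without an end marker
    }
    $offset = $n + strlen($fin);   // resume after the end marker next time
    return substr($text, $m, $n - $m);
}
```

A price loop in the style of Analyse() could then read `$page = implode('', $contents); $pos = 0; while (($price = get_chunk($page, '<tr><td>', '</td>', $pos)) !== false) { ... }`, with `substr($price, 1)` dropping the dollar sign as before.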

By: PHP newbee | Date: 24/02/2003 23:24:00 | Type: Comment

Oi... now there's a chunk of code. Give me a moment to stare blankly at it while I force my brain into drive long enough to realize I don't understand it, and then I'll get back to you. :) It'll take me a while to work out how and what you're doing, so please be patient. I haven't forgotten this question, even if I am taking forever to respond at times.

By: VGR | Date: 25/02/2003 20:30:00 | Type: Comment

No problem. Of course it has to be adapted. To be clear, either:
- you read the file (the page at URL http://...) into an in-memory array and then parse it as in the code above, or
- you parse line by line while reading the file (inside the fopen..fclose loop of my first answer), possibly stopping once you have found what you were looking for.

It depends on what you want.

Regards,
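For the second option (parse while reading and stop early), a minimal sketch: the hypothetical `find_tag_in_stream()` reads an already opened stream with fgets() and returns as soon as the wanted tag is found, leaving the rest of the page unread. It assumes the whole `<tag>...</tag>` pair sits on one line and that lines fit in the 4096-byte fgets() buffer.

```php
<?php
// Hypothetical helper: read $fd line by line and stop as soon as <$tag>...</$tag> is found.
// Returns the text between the tags, or false if the end of the data is reached first.
function find_tag_in_stream($fd, $tag)
{
    $open  = "<" . $tag . ">";
    $close = "</" . $tag . ">";
    while (($ligne = fgets($fd, 4096)) !== false) {
        $start = strpos($ligne, $open);
        if ($start === false) {
            continue;                   // tag not on this line, keep reading
        }
        $start += strlen($open);
        $end = strpos($ligne, $close, $start);
        if ($end !== false) {
            return substr($ligne, $start, $end - $start); // found: stop reading here
        }
    }
    return false;                       // reached end of data without a match
}
```

Usage: `$fd = fopen($GrabURL, 'r'); $level = find_tag_in_stream($fd, 'level'); fclose($fd);`. Because the stream position only moves forward, a second call on the same stream only finds tags that come after the first one.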

By: PHP newbee | Date: 13/03/2003 19:16:00 | Type: Comment

My apologies for the delay in responding. Certain 'things' have come up that have eaten into my free time, and thus my ability to check in on things such as EE. :( Thanks for the help. I'm confident I understand enough of that to figure out how to adapt it to my needs, although those needs no longer exist in this context. I'm sure it'll help in the future, though. :)