How to copy a full Java site?
By: hamood | Date: 15/07/2007 03:06:30
Points: 20 | Status: Answered | Quality: Excellent

I'd like to know how I can copy a full Java website, including all its pages, images, audio and video. I tried several programs, such as Website Downloader, SurfOffline and Grab-a-Site 5.0. The problem is that they only download the main URLs of the site; they cannot follow the "next page" links. To be more precise, here is an example: the website has domainname.com/home.asp, and from the home page there are 100 pages you can reach by pressing "next" or by choosing a page number at the bottom. These programs can't actually open those 100 pages; they only get the URLs. The link to the second page is given as javascript:viewPage(2). Any suggestion of a program I can use is more than welcome. Thank you, and if you have further questions please don't hesitate to ask.
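A javascript:viewPage(2) link is not a URL at all: it calls a function in the page's script, which is why link-following downloaders stop there. A minimal Python sketch of one workaround, assuming (hypothetically) that viewPage(n) simply reloads the listing with a page=n query parameter: collect the page numbers from the links and rewrite them as ordinary URLs. The parameter name is a guess; if viewPage actually submits a form by POST, this rewriting will not work as-is and the form post would have to be replicated instead.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Matches link targets such as "javascript:viewPage(2)" and captures the number.
VIEW_PAGE_RE = re.compile(r"javascript:\s*viewPage\(\s*(\d+)\s*\)", re.IGNORECASE)


def page_numbers(html: str) -> list:
    """Return the distinct page numbers used in viewPage(...) links, sorted."""
    return sorted({int(n) for n in VIEW_PAGE_RE.findall(html)})


def page_url(listing_url: str, page: int, param: str = "page") -> str:
    """Turn a page number into a plain GET URL a downloader can follow.

    ASSUMPTION: viewPage(n) reloads the same listing with a query parameter
    named `param` ("page" is a hypothetical default). Read the site's real
    viewPage function to find the actual parameter.
    """
    parts = urlsplit(listing_url)
    query = dict(parse_qsl(parts.query))  # keep any existing parameters
    query[param] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    html = '<a href="javascript:viewPage(2)">2</a> <a href="javascript:viewPage(3)">Next</a>'
    for n in page_numbers(html):
        print(page_url("http://domainname.com/home.asp", n))
```

Run as a script, this prints http://domainname.com/home.asp?page=2 and http://domainname.com/home.asp?page=3; a list of URLs like these could then be fed to an ordinary downloader.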
By: VGR | Date: 15/07/2007 09:53:21 | Type: Comment

Well, it's probably designed that way specifically to hamper you ;-) People usually don't want their sites to be copied like that. Using Java (or Flash) is a good start; using clever, obscure JavaScript on links is another good step. Could I have the site's URL, so I can have a look at what could be done?
By: hamood | Date: 16/07/2007 18:57:18 | Type: Comment

Well, the site owners gave me permission to copy the products from the website, but they've got thousands of items, and it's unrealistic to copy them one at a time. If you know of a way or a program I can use to copy the products section reliably, I'll be very grateful. As for their design, I have no interest in copying it. The website is: http://www.oceanwholesale.com/main.asp Thank you.
By: VGR | Date: 17/07/2007 06:49:48 | Type: Comment

If you have authorization, the simplest way is for you to get (temporary?) access from them via FTP or SFTP, ***or*** for them to give you a ZIP/GZ/TAR/whatever archive containing the files... Also, given it's the products you want, I'm sure they can give you the tables from the database along with the images stored for the products. That said, 1) I can't log in ;-) 2) it's a Flash website, not a Java one ;-) Tell me if I can help further.
By: hamood | Date: 18/07/2007 04:25:21 | Type: Comment

As for giving me access to their FTP, they didn't want to do that because I am only a seller of their items. I don't know why they're making it hard, but they said I can take their pictures and list their items, and that's what I want to do: have a catalog to show my customers. But it's more than 5000 items, and I'm sure there's a way of downloading their items in a catalog style; even if I had to copy it one page at a time, at least that's not as bad as one item at a time. As for the main page, it is Flash, but when you log in there is Java in the items. I made a new account for you to check out the website, if you don't mind: user name: euroexpert pass: asdfasdf. What's the easiest way of copying the items? And are there any programs to help? Thank you.
By: VGR | Date: 19/07/2007 07:25:55 | Type: Comment

I will check later today.
By: hamood | Date: 23/07/2007 00:16:45 | Type: Comment

Hey there, I was wondering if you had a chance to take a look at the website? I would really appreciate it if you did. User name: euroexpert password: asdfasdf Thank you.
By: VGR | Date: 23/07/2007 07:28:02 | Type: Comment

:D I call upon your indulgence: I've got three children to take care of ;-) No, I haven't taken the time to check yet. But I will. Promised. ASAP.
By: VGR | Date: 26/07/2007 14:16:07 | Type: Answer

OK, your problem has nothing to do with Java or Flash. It's purely a "site aspirator" (website ripper) problem: you want to siphon off the website (correct?).

I recommend using some program to first get the main page's references (the left menu, e.g. Garments: http://www.oceanwholesale.com/sortslist.asp?sortsid=55), then to fetch those URIs one after another and retrieve the associated images (for instance, the link http://www.oceanwholesale.com/product_detail.asp?Id=24827 is associated with the image http://www.oceanwholesale.com/product_images/uploadpic/200772618144846618.jpg).

I say this because I found no logic in the image numbering; otherwise I would have recommended extracting the images directly from the images directory, i.e. http://www.oceanwholesale.com/product_images/uploadpic/ . It's really too bad that the images are not named after the item reference (H4173 in this case). We would just have had to get all the references (fewer than one hundred HTTP calls) and then try to fetch the associated images directly, stopping when the sequence breaks. That would have been faster.

I've written a lot of polling/data-extraction robots like this. The problem is that they're very sensitive to changes in the target site's layout. If you want to copy all the references once, that's fine (throwaway "trashware", IMHO); if you want to keep your data updated/synchronized with the target site's, you'll need to spend a few hours here and there fixing broken stuff. Not too difficult either. I could even do this for you for fair compensation (like some items from my Amazon wishlist ;-)
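The two-step approach described above (category pages, then product detail pages, then product images) can be sketched in Python with only the standard library. The URL fragments product_detail.asp and product_images/ come from the example links in the answer; everything about the actual page markup (that products are plain <a href> links and images plain <img src> tags) is an assumption. Since the site requires a login, the fetch function passed in would have to carry the session cookie, and pagination via javascript:viewPage(n) is not handled here.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects <a href> and <img src> values from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        if tag == "a" and values.get("href"):
            self.hrefs.append(values["href"])
        elif tag == "img" and values.get("src"):
            self.srcs.append(values["src"])


def collect(html: str) -> LinkCollector:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser


def product_links(category_html: str, base_url: str) -> List[str]:
    """Step 1: absolute product_detail.asp URLs on a category page, in order, no duplicates."""
    links: List[str] = []
    for href in collect(category_html).hrefs:
        if "product_detail.asp" in href:  # ASSUMPTION: product links contain this path
            url = urljoin(base_url, href)
            if url not in links:
                links.append(url)
    return links


def product_images(detail_html: str, base_url: str) -> List[str]:
    """Step 2: absolute URLs of images under product_images/ on a product page."""
    return [urljoin(base_url, src) for src in collect(detail_html).srcs
            if "product_images/" in src]  # ASSUMPTION: product images live here


def crawl(category_urls: Iterable[str], fetch: Callable[[str], str]) -> Dict[str, List[str]]:
    """Map every product URL reachable from the category pages to its image URLs.

    `fetch(url)` must return the page's HTML. It is a parameter so the crawler
    can be tested offline; for the real site it would have to send the login cookie.
    """
    catalog: Dict[str, List[str]] = {}
    for category_url in category_urls:
        for product_url in product_links(fetch(category_url), category_url):
            if product_url not in catalog:  # a product may appear in several categories
                catalog[product_url] = product_images(fetch(product_url), product_url)
    return catalog
```

For live use, fetch could be built on urllib.request with an opener from urllib.request.build_opener(urllib.request.HTTPCookieProcessor()), after logging in through the site's login form (whose field names are not known here); downloading each image is then one more request per URL. Keeping fetch separate from the parsing functions also means that when the site's layout changes, only product_links and product_images should need fixing, which is the maintenance cost the answer warns about.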
By: VGR | Date: 04/10/2007 19:20:40 | Type: Comment

Any news or feedback on this problem? Should it be closed? Please do ;-)
By: OpConsole | Date: 01/11/2007 16:12:13 | Type: Comment

Hello, if you found some of the above comments helpful in solving your issue, please accept the answer or split the points between the various useful comments. Each one can receive a quality rating from + (somewhat helpful) to +++ (working solution). Given that this question has been open for quite a while now, please "accept an answer" accordingly, ASAP. This question will be force-closed at random one month from now. Thanks and regards. Admin.
©2010 These pages are served without commercial sponsorship. (No popup ads, etc...). Bandwidth abuse increases hosting cost forcing sponsorship or shutdown. This server aggressively defends against automated copying for any reason including offline viewing, duplication, etc... Please respect this requirement and DO NOT RIP THIS SITE.
Please DO link to this page!