Monday, June 16, 2008

Beginners guide to site ripping (4) - Organizing

ORGANIZE IT. This is an art form. Close to a religion. There are many ways of organizing a site rip into something manageable.

There are several reasons to organize a siterip. Here are a few that will hopefully make you think about your layout.

  1. Your layout should be easily browseable
  2. Your layout should enable anyone looking at the site to find the exact set fast
  3. Your layout should have an order or sequence, dated or numbered
  4. Your layout should include all possible information about every set
  5. Never rename files; keep the names used on the site. This makes it easier for others to find specific files
  6. Always go for the highest quality of pictures and the best quality of videos, and delete the rest
  7. Avoid organizing by model; organize chronologically instead
  8. Keep as flat a structure as possible. It is easier to browse and is CD/DVD/BD burn friendly
  9. Identify videos clearly by putting them in a specific folder or by marking the folder names clearly. Not everybody likes it all mixed up
  10. Do not keep stuff in your layout that has not been downloaded directly from the site. Do not put home-made stuff into your layout
  11. Make sure your rip/layout is complete up to the date of the rip. Do not leave stuff out of the layout without clearly signaling what has been left out
  12. If your layout is media specific (CD/DVD/BD), then keep the same layout on all media
  13. Your layout should be extensible
  14. If you make csvs after you have organized your siterip, then put a comment on the first line that gives the date of the siterip and notes on anything that is NOT in the csv. If you favour extensibility, then add notes for anyone who may extend your work
  15. If a site is messy, then include set numbers/codes in the folder names so that others can find their way around the site and your layout. It should be possible to easily match your layout to the site
  16. Avoid folder names longer than 64 characters, for the sake of media burnability
  17. Avoid national characters and special characters in folder names. Replace & with 'and', replace , with - and so forth. Keep it simple
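
Rules 16 and 17 are mechanical enough to automate. Below is a minimal Python sketch that cleans a proposed folder name before you create the folder. The function name sanitize_folder_name and the exact set of allowed characters are my own assumptions, not a standard any of the tools use.

```python
import re
import unicodedata

MAX_LEN = 64  # rule 16: keep folder names burn-friendly


def sanitize_folder_name(name: str, max_len: int = MAX_LEN) -> str:
    """Clean a proposed folder name according to rules 16 and 17 (a sketch)."""
    # Rule 17: spell out '&' and turn ',' into a dash.
    name = name.replace("&", " and ").replace(",", " -")
    # Fold national characters to plain ASCII where possible ('ë' -> 'e');
    # characters with no ASCII equivalent are dropped.
    name = unicodedata.normalize("NFKD", name)
    name = name.encode("ascii", "ignore").decode("ascii")
    # Drop remaining special characters (this allow-list is an assumption).
    name = re.sub(r"[^A-Za-z0-9 ._()-]", "", name)
    # Collapse the runs of spaces left behind by the replacements.
    name = re.sub(r"\s+", " ", name).strip()
    # Rule 16: cut to the length limit; Windows handles trailing dots/spaces badly.
    return name[:max_len].rstrip(" .")
```

For example, "2008-05-29 - Zoë & Natasha, Wall Bars" becomes "2008-05-29 - Zoe and Natasha - Wall Bars". Only use it on folder names; rule 5 still says to leave the file names alone.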

This article is not about using your Offline Browser to filter out all the unwanted stuff; you will learn that eventually. And if you are not using an Offline Browser for the job, then it is probably ReGet, DownThemAll or GetRight that helps you.

Manual
The simplest way of organizing is the manual way. You write the directory names yourself, or copy/paste them from the site into Windows Explorer, Total Commander or DirectoryOpus, and build everything up by hand, unzipping manually and everything.

If you are maintaining a site on a regular basis, then this is the easiest way to go. It is seldom beneficial to go tool crazy when you just have to download and add a set or two. I would recommend that everybody try stuff manually, and a site of up to 100 sets can easily be done by hand.



File tool
There seem to be 3 tools of preference: Windows Explorer, Total Commander and DirectoryOpus.

I have personally never been a fan of Total Commander ( http://www.ghisler.com/ ), but the tool is helpful for zip files among other things, and in the hands of a skilled person it can be a deadly weapon.

DirectoryOpus ( http://www.gpsoft.com.au/ ) can do everything for you. You can script things, unzip etc. Highly configurable and nice to work with.

Windows Explorer is still a pretty handy tool in the hands of a skilled person. Its unzip capabilities are poor, but it is always there, and if you master it, it can be pretty effective.



The editor
If you do not go 100% manual, then chances are you will make lists in an editor. Notepad is always there, but it is not really up to the job of making download lists, organizer lists or folder renamer lists. You need something a bit more powerful.

Excel is good for lists, you can easily sort stuff and export it to different formats or just simply copy/paste from it.

UltraEdit ( http://www.ultraedit.com/ ) is an all-round editor. Pretty good at everything. It supports column mode, which is a big plus, and has easy macro recording and playback. My favourite for editing. Column mode is extremely useful for making both download lists and directory name lists, and typing directly in column mode can also be very handy when creating dated folder names.

EditPad Pro ( http://www.editpadpro.com/ ) has the same capabilities as UltraEdit, plus a very good search/replace tool with regular expression support. Which editor you use is more a matter of religion, but I'd advise you to go for either UltraEdit or EditPad Pro, or maybe both if your wallet is thick enough. There are other editors too, and some are free of charge. You may consider SlickEdit http://www.slickedit.com/, Notepad++ http://notepad-plus.sourceforge.net/uk/site.htm or even EditPlus http://www.editplus.com/.



Helper files (text and BAT)
It can be quite useful to make BAT files that help you create directories, unzip or move files around. Once you have a text file in your editor with all the folder names, it is a short step to a batch file. Take a look at the two pictures.




The text file version (top) just gives you the directory names. The batch (BAT) file version puts a MKDIR (MaKe DIRectory) command in front of each folder name. If you save the BAT file to the directory and then double-click it, it creates all the folders for you, and you have less manual work.
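
If you would rather not add the MKDIR prefix by hand, a few lines of Python can do it. This is a minimal sketch under my own assumptions: one folder name per line in the text file, and the hypothetical function names make_mkdir_bat and write_mkdir_bat. Note the quotes around each name; without them, cmd's mkdir would treat a name with spaces as several separate folders.

```python
from pathlib import Path


def make_mkdir_bat(folder_names: list[str]) -> str:
    """Turn a list of folder names into the text of a MKDIR batch file."""
    lines = ["@echo off"]
    for name in folder_names:
        name = name.strip()
        if name:  # skip blank lines in the source list
            # Quoted, so "2008-05-27 - Vicky - Pantyhose" stays one folder.
            lines.append(f'mkdir "{name}"')
    # BAT files traditionally use Windows (CRLF) line endings.
    return "\r\n".join(lines) + "\r\n"


def write_mkdir_bat(list_file: Path, bat_file: Path) -> None:
    """Read one folder name per line from list_file and write bat_file."""
    names = list_file.read_text(encoding="utf-8").splitlines()
    # newline="" stops Python from translating the CRLFs a second time;
    # ascii fails loudly on national characters, which rule 17 avoids anyway.
    with bat_file.open("w", encoding="ascii", newline="") as fh:
        fh.write(make_mkdir_bat(names))
```

Usage: write_mkdir_bat(Path("folders.txt"), Path("mkdir.bat")), where both file names are just examples. Then copy mkdir.bat into the target directory and double-click it as described above.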

You can make more advanced BAT files, but that requires you to learn the commands that can be used in them. All command line tools can be used, so commands like CD (Change Directory), REN (REName), DEL (DELete) and MOVE (MOVE) are at your disposal. Both WinRAR and PKZip / UNZip also have command line versions, so you can unzip a lot of files with one BAT file.

Take a look at the unzip.bat file below. It enters every directory, unzips all the zip files in it, deletes the zip files once finished, and finally goes back (cd ..) to the original directory.
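
The unzip.bat file itself is only a picture, so here is a rough Python equivalent of the steps just described. It uses the standard zipfile module instead of a command line unzipper, and it is my own sketch, not a copy of the BAT file. The function name unzip_all is hypothetical. Like the BAT file, it only looks one folder level deep. It deletes a folder's zips only after all of them have been extracted without an error. Make your raw csv before running anything like this.

```python
import zipfile
from pathlib import Path


def unzip_all(root: Path, delete_after: bool = True) -> int:
    """Extract every .zip in each direct subfolder of root; return the count."""
    extracted = 0
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # Collect the archives first so files unpacked below are not touched.
        archives = sorted(folder.glob("*.zip"))
        for archive in archives:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(folder)  # unzip into the set's own folder
        # Reached only if every archive above extracted without raising.
        if delete_after:
            for archive in archives:
                archive.unlink()
        extracted += len(archives)
    return extracted
```

On Linux the "*.zip" pattern is case-sensitive, so files named .ZIP would be skipped there. On Windows it matches either case.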



There are tools for creating batch (BAT) files. One of them is called DO, an old command line tool from 2002, and you can download it here. Be sure to read the do.txt file for help.


Common layouts
The common standard for a layout is that it is a) browsable, b) easy to find specific sets in and c) maintainable. If sets are dated, then use dates, and keep them in ISO standard format, which means year-month-day. If dates are not used, then use numbers in sequence. Below you have 6 pictures of common layouts; click each picture to enlarge it.
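
Both naming schemes are easy to generate. Here is a minimal Python sketch, assuming the site shows dates like "May 27, 2008". That default format string is an assumption, so change it to match your site. The function names are hypothetical. ISO dates and zero-padded numbers both sort correctly as plain text, which is the whole point.

```python
from datetime import datetime


def dated_folder_name(site_date: str, title: str,
                      site_format: str = "%B %d, %Y") -> str:
    """Rewrite a date as shown on the site into an ISO year-month-day name."""
    # %B expects English month names (the default C locale).
    day = datetime.strptime(site_date, site_format).date()
    return f"{day.isoformat()} - {title}"


def numbered_folder_name(number: int, title: str, width: int = 3) -> str:
    """For undated sites: zero-pad the sequence number so names sort correctly."""
    return f"{number:0{width}d} - {title}"
```

dated_folder_name("May 27, 2008", "Vicky - Pantyhose") gives "2008-05-27 - Vicky - Pantyhose". numbered_folder_name(7, "Set") gives "007 - Set", which sorts before "010 - Set", while an unpadded "7" would not.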


Beginners guide to site ripping (3) - Tools

In the article "Beginners guide to site ripping (2) - Check & CSV" we spoke of different tools for making a csv.

Quite a few tools come into question when working with csvs. The best are listed below.


The first 4 tools are all collection managers, which means that they can make csvs and sort files according to csvs (ecsvs). They can also keep all your csvs in views for later and manage your collections of csv'd material.

The 5th tool, ScanSort, is not a collection manager; it is a tool for sorting contents according to the layout of an ecsv file. ScanSort is a command line tool and thus has no graphical interface.

The 6th tool, fastCSV, is a tool for making csvs; it cannot sort files according to an ecsv. It is also a command line tool with no graphical user interface.

The 7th tool, CSV Workshop, is a tool for viewing csvs and finding problems among ecsvs. This includes internal and external dupes and other nasty things.

All 7 tools are designed for Windows, but it should be possible to port PSPV to other platforms, as it is mainly written in Perl.

Most of the tools are no longer maintained and some of them suffer from age. For example, CSV Workshop cannot correctly display the internal size of an ecsv file if it holds more than 4Gb of data.
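
Why CSV Workshop gets sizes above 4Gb wrong is not stated. One plausible explanation, and it is only my assumption, is a 32-bit size counter. Such a counter wraps around at 2^32 bytes = 4,294,967,296 bytes (4 GiB). The Python sketch below totals the size column (field 2) of an ecsv using Python's unbounded integers, and for comparison shows what a 32-bit counter would hold. It also assumes lines starting with ';' are comments.

```python
def ecsv_total_size(lines) -> int:
    """Sum the size field (2nd column) of every data line in an ecsv."""
    total = 0
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and lines starting with ';' are not file entries.
        if not line or line.startswith(";"):
            continue
        total += int(line.split(",")[1])
    return total


def as_uint32(value: int) -> int:
    """What an unsigned 32-bit counter would hold after adding the same sizes."""
    return value % 2**32
```

With a 3,000,000,000 byte video and a 2,000,000,000 byte video, the true total is 5,000,000,000 bytes. A 32-bit counter would show only 705,032,704.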

I dug up some old tutorials for you to read.

Download: Muleskinners guide to Hunter
Download: PigVomit's guide to Hunter
Download: PWA's guide to The!Checker
Download: PigVomit's guide to The!Checker
Download: ScanSort How To
Download: CSV Workshop screenshots and guide

Note!
PSPV (PhotoServe Professional Verify) is similar to PSCU (PServeCheck Ultimate). Both are very new tools, and both are maintained on a constant basis. They are far more advanced (not complex!) and faster than any of the other tools listed.

Both tools are rather intuitive and can perform a bunch of tasks. Some of their features are listed below.

  • Automatic download of updated csvs from repository sites.
  • Disk space management which automatically moves contents around when needed.
  • Hunting/sorting from raw, zip and rar files.
  • Integrated uploads of updated csvs.
  • Integrated and automatic leech of collections.
  • Mass Miss/Wrong and Needs csv fill possibilities.
  • Automatic mass collection setup.
  • Internal dupe and unwanted file detection.
  • Cache for ultra fast sorting/hunting.
  • Complete filter views for collection groups.
  • View filters for: ongoing, pre-final, final, need to check, ready to burn, burned.
  • Direct integration to Nero and ImgBurn.
  • HTML reports and backup status.
  • Empty directory cleanup.
  • CSV and report cleanup.
  • Leech priority.



World premiere - PSCU
More screenshots here, here and here.

MG is proud to offer you the first public release of PServeCheck Ultimate Edition v4.5.1.45 right here. As with PSPV (the Perl relative of PSCU), this tool is a community tool that has been restricted to a select few for about 6 years. Read the file 'readme.txt' before you run the program, and bear in mind that PSCU and PSPV are not meant to be standalone collection managers like Hunter or The!Checker. What you get here today is only a quarter of the full package.

A big salute and thanks to The2nd, the author of PSCU, for quickly assembling a 'standalone' version of PSCU for release here on MG.

Also thanks to the staff at VG for the triggerpack which follows the release. Shouts to Austria for the programming and to Canada for the servers and service at VG.

PSCU quick install:
Just a quick note. There are 2 groups among the triggers which are extendable: 'Custom' and 'My Personal Collections', where you can add your own custom stuff. When PSCU asks for the mIRC path, point it to the proper directory of the unzip. Edit and check the file remote.ini before you launch the executable for the first time.

We will publish an article about using PSCU.

At the time of writing, PSPV is still a community tool and thus not open for everybody to use.

Sunday, June 15, 2008

Beginners guide to site ripping (2) - Check & CSV

Let's assume you have managed to download a site, any site for that matter. At this point you will have ended up with a folder structure of some kind: some folders with plain .jpg's and maybe some with .zip files, plus a folder with some videos, probably .wmv or .avi, or maybe some .mov files.

Before you do ANYTHING to the files (reorganize, unzip, delete, move, rename... ANYTHING!), here is the expert advice:

Make a csv of the raw rip - ALWAYS!


Why? - you may ask. Well, there is 1 simple answer to that. If you fuck up and accidentally delete something or copy/move stuff to the wrong places, then the raw site-rip csv will organize the entire thing back into the original places and you can start over. You can even spot the file you accidentally deleted.

"Make a csv?" WTF is that? you may ask yourself. A csv is a simple file which can be used for organizing and checking files. You need tools to create csvs, and you also need tools for organizing/sorting stuff according to a csv. We will eventually get to the tools in an article, but to skip the jazz, go search for tools like PicCheck / The!Checker, Hunter, ScanSort, PSP Verify, PServeCheck, fastCSV etc. The csv files are actually ecsv (Extended Comma Separated Values) files, and they have a specific layout like this (excerpt only):

20080527-video-full.wmv,105742805,3925D04D,\20080527 - Vicky - Pantyhose\,
001.jpg,3065686,B75F6804,\20080529 - Natasha - Wall Bars\,
002.jpg,2998642,5B2B26A0,\20080529 - Natasha - Wall Bars\,
...
Each line of an ecsv file has 5 parts, and the order of the parts matters.

  1. Filename
  2. Size counted in bytes
  3. CRC32 value in hexadecimal format
  4. Complete path for the file
  5. Optional comment
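
To see what the csv tools compute, here is a minimal Python sketch that produces lines in the layout of the excerpt above. Each line has the file name, the size in bytes, the CRC32 as 8 upper-case hex digits, the folder path wrapped in backslashes, and an empty comment. I assume the CRC32 is the standard zip-style CRC32 (what zlib computes). The sketch orders lines by path, and the function names are hypothetical. The real tools may differ in ordering and details, so treat this as an illustration, not a replacement for them.

```python
import zlib
from pathlib import Path


def crc32_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """CRC32 of a file as 8 upper-case hex digits, read in chunks."""
    crc = 0
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks
    return f"{crc:08X}"


def ecsv_lines(root: Path):
    """Yield one ecsv line (name, size, CRC32, folder, empty comment) per file."""
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel_parts = path.parent.relative_to(root).parts
        # '\' + 'a\' + 'b\' -> '\a\b\'; files directly in root get just '\'.
        folder = "\\" + "".join(part + "\\" for part in rel_parts)
        yield f"{path.name},{path.stat().st_size},{crc32_hex(path)},{folder},"
```

Run it on the raw rip before touching anything, e.g. with "\n".join(ecsv_lines(Path("my-rip"))), where my-rip is a placeholder for your download folder.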

The tools listed earlier will help you make such a file. Ecsv files follow naming conventions, and two are often used: the WSC convention and the ET/VG convention.

WSC looks like this: MVCD2_Mystique_Magazine_Vault_CD2_5744.csv

ET looks like this: Watch4Beauty-DVD12(Pre-Final)_1903.csv

WSC puts the so-called trigger part first in the ecsv filename; in the example it is MVCD2. The next part is the descriptive name (Mystique Magazine Vault). The 3rd part tells you the media size of the ecsv OR the year of the ecsv. The last part shows how many files the ecsv covers. Finally comes the extension: csv.
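
Because the WSC parts are separated by underscores, a name can be split mechanically. Here is a minimal Python sketch that assumes the underscores inside the descriptive part stand for spaces. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WscName:
    trigger: str
    title: str
    media_or_year: str
    file_count: int


def parse_wsc(filename: str) -> WscName:
    """Split a WSC-style name such as MVCD2_Mystique_Magazine_Vault_CD2_5744.csv."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext.lower() != "csv":
        raise ValueError(f"not a .csv file: {filename}")
    parts = stem.split("_")
    if len(parts) < 4 or not parts[-1].isdigit():
        raise ValueError(f"not a WSC-style name: {filename}")
    # First part is the trigger, last two are media/year and file count.
    trigger, *title_words, media_or_year, count = parts
    # Assumption: underscores in the descriptive part stand for spaces.
    return WscName(trigger, " ".join(title_words), media_or_year, int(count))
```

parse_wsc("MVCD2_Mystique_Magazine_Vault_CD2_5744.csv") gives trigger "MVCD2", title "Mystique Magazine Vault", media "CD2" and 5744 files. An ET-style name is rejected with a ValueError.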

ET uses a different approach. There is no trigger part; instead there is a descriptive name followed by a - and then a media descriptor (DVD12). This is always the case, and the media descriptor always includes a counter. Immediately following the media descriptor is an ecsv state identifier. These are the states you will normally see:

  1. Ongoing: the whole (...) part is omitted. This means the csv is still changing and probably does not contain enough files to fill a CD or DVD.
  2. (Pre-Final). The ecsv has reached the proper number of files, so a full CD or DVD or whatever could be filled if one were to burn the contents the ecsv covers to a disc.
  3. (Pre-Final-1). The ecsv is still not 100% closed, and the maintainer changed something compared to the Pre-Final ecsv, maybe added a missing file or renamed a folder.
  4. (Pre-Final-2). The 2nd change from PF (Pre-Final).
  5. (Pre-Final-3). The 3rd change from PF.
  6. (Pre-Final-4). The 4th and hopefully last change from the PF ecsv.
  7. (Final). The ecsv is now closed and no further changes will be made to it. You can burn the contents the ecsv covers to a disc and feel safe that it will not change.
  8. (Final-1). The ecsv maker made a minor fuck-up, and the (Final) should be discarded.
  9. (Final-2). More errors; the F1 should be discarded and the F2 used.
  10. (Final-3). More errors; the F2 should be discarded and the F3 used.
  11. (Final-4). More errors; the F3 should be discarded and the F4 used.
  12. (Re-Burn). The ecsv maker fucked up good and there was an error, so stuff was re-organized / cleaned and people should re-burn the media.
  13. (Re-Burn-1) -> (Re-Burn-4). At this point someone will have shot the ecsv maker for continuously making errors and having to re-do them over and over again.
Most often you will see an ecsv go from ongoing to (Pre-Final), maybe even (Pre-Final-1), and then to (Final).
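
An ET/VG name can be picked apart with a regular expression. Here is a minimal Python sketch, assuming the media descriptor is upper-case letters followed by the counter (as in DVD12). It also assumes an ongoing ecsv simply leaves out the (...) part, as item 1 says. The pattern, class and function names are my own.

```python
import re
from typing import NamedTuple

# title, '-', media + counter, optional (state), '_', file count, '.csv'
ET_NAME = re.compile(
    r"^(?P<title>.+)-(?P<media>[A-Z]+\d+)"
    r"(?:\((?P<state>[^)]+)\))?"
    r"_(?P<count>\d+)\.csv$"
)


class EtName(NamedTuple):
    title: str
    media: str
    state: str
    file_count: int


def parse_et(filename: str) -> EtName:
    """Split an ET/VG-style name such as Watch4Beauty-DVD12(Pre-Final)_1903.csv."""
    match = ET_NAME.match(filename)
    if match is None:
        raise ValueError(f"not an ET/VG-style name: {filename}")
    # No (...) part means the ecsv is still ongoing.
    state = match["state"] or "Ongoing"
    return EtName(match["title"], match["media"], state, int(match["count"]))
```

The title may itself contain dashes. The regex backtracks to the last dash that is followed by a valid media descriptor, so "(Pre-Final)" or "(Re-Burn-1)" never gets mistaken for part of the media.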

I will not say more about ecsvs and how to use them here. Just make an ecsv of your downloads so that you can use a tool to organize the contents back into the original places.
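
Here is roughly what "organizing the contents back" means, as a minimal Python sketch rather than how any of the named tools actually do it. Every file in a pool folder is indexed by size and CRC32. Each ecsv entry is then copied to its folder under a destination directory, even if the file was renamed or moved in the meantime. It copies instead of moving, so nothing is lost if something goes wrong. It assumes no commas inside names (rule 17 helps there) and treats lines starting with ';' as comments. The function names are hypothetical.

```python
import shutil
import zlib
from pathlib import Path


def crc32_hex(path: Path) -> str:
    """CRC32 of a file as 8 upper-case hex digits."""
    crc = 0
    with path.open("rb") as fh:
        while chunk := fh.read(1 << 20):
            crc = zlib.crc32(chunk, crc)
    return f"{crc:08X}"


def restore_from_ecsv(lines, pool: Path, dest: Path) -> list[str]:
    """Copy files from pool to dest following an ecsv; return what is missing."""
    # Index the pool by (size, CRC32), so renamed or moved files are still found.
    index = {}
    for path in pool.rglob("*"):
        if path.is_file():
            index.setdefault((path.stat().st_size, crc32_hex(path)), path)

    missing = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):  # assumption: ';' marks comments
            continue
        # Fields: name, size, CRC32, \folder\, optional comment.
        name, size, crc, folder = line.split(",")[:4]
        source = index.get((int(size), crc.upper()))
        if source is None:
            missing.append(folder + name)
            continue
        target_dir = dest.joinpath(*[p for p in folder.split("\\") if p])
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target_dir / name)
    return missing
```

The returned list tells you exactly which files you deleted by accident and need to download again.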

Before you begin to clean up your downloads and put stuff into proper folders, you need to check the contents to see if they are actually any good.

Picture checking
You can use Hunter, jCheck or jpgfix to check the jpeg pictures.

Video checking
As far as I know, there are no tools to verify whether a video (avi, wmv or mov) is good. But there are three good indicators. First, play it with Fast Forward and check that it is playable. Secondly, click randomly on the seek bar in your video player to find out whether it is seekable. And lastly, always check the last few seconds to see whether the video ends like it should and not prematurely.

Hunter needs some understanding and can be quite complex to use. JpgFix is a command line tool and may not suit everyone. jCheck is a simple Windows dialog with one purpose, so it should be straightforward to use.
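
The most common damage in a rip is a truncated download. For JPEGs there is a cheap first check: every JPEG starts with the marker bytes FF D8 and a complete one ends with FF D9. Here is a minimal Python sketch of that check; the function names are hypothetical. Tolerating zero padding after the end marker is my own assumption. This only catches files that were cut off. Damage in the middle of a file needs a full check that actually decodes the image.

```python
from pathlib import Path

SOI = b"\xff\xd8"  # Start Of Image marker: first two bytes of every JPEG
EOI = b"\xff\xd9"  # End Of Image marker: last marker of a complete JPEG


def looks_complete_jpeg(path: Path, tail_bytes: int = 64) -> bool:
    """Cheap truncation check: SOI at the start and EOI near the end."""
    with path.open("rb") as fh:
        head = fh.read(2)
        fh.seek(0, 2)  # jump to the end to find the file size
        size = fh.tell()
        fh.seek(max(0, size - tail_bytes))
        tail = fh.read()
    # Assumption: some files carry zero padding after EOI, so ignore it.
    return head == SOI and tail.rstrip(b"\x00").endswith(EOI)


def suspicious_jpegs(root: Path) -> list[Path]:
    """List .jpg/.jpeg files below root that fail the check above."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
        and not looks_complete_jpeg(p)
    )
```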

Sunday, June 8, 2008

Beginners guide to site ripping (1)

When you are faced with the need to download a site from the web then there are several options.

You can always go ahead and spend days or weeks using FireFox or Internet Explorer, clicking every picture and video and choosing "Save To" or "Save As". Just writing the paths for the jpg's and the videos may take up a lot of your time, and chances are you may miss something. So what to do?

Get a download manager!

But which one? I do not know many download managers, but some of the best known ones out there are:

They help you while you click around by queuing up your downloads, and you can stop one day and continue on another.

You still have to visit every page, right-click and choose "Download With ...". But you can do it much faster, and you do not have to wait for one download to complete before you begin a new one.

To avoid having to click every link, or at least every page, you can get other tools. In general these tools are called Offline Browsers. They can automatically click every link on a website and automatically download every file from it. Basically, it is like taking a copy of the website to your own hard drive. Some of the more common tools in the Offline Browser category are:

Get an Offline Browser!


Personally, I started off with the Weazel and had a short fling with the Track before I met the one and only OE. But that is just personal experience. The Weazel is easy and so is the Widow. The learning curve for the Track is a little steeper, and OE's can be too.

Offline Browsers start by visiting the page you point them to. Then they automatically click every link on that page which leads them to all pages at "level 2". On level 2 they download everything to your harddrive and then they move on to click all links on level 2 so that they reach level 3 and so on.
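
This level-by-level behaviour is a breadth-first search. Here is a minimal Python sketch over a made-up in-memory site: a dict from each page to the links on it, standing in for real downloading and HTML parsing. It shows the level at which each page is first reached. The max_levels parameter mimics the depth limit many Offline Browsers offer; that is an assumption about those tools, not a specific product's option.

```python
from typing import Optional


def pages_by_level(links: dict[str, list[str]], start: str,
                   max_levels: Optional[int] = None) -> list[list[str]]:
    """Group pages by the level at which an Offline Browser first reaches them."""
    levels = []
    seen = {start}      # never visit the same page twice
    frontier = [start]  # level 1 is the page you point the tool to
    while frontier and (max_levels is None or len(levels) < max_levels):
        levels.append(frontier)
        next_frontier = []
        for page in frontier:
            for target in links.get(page, []):  # "click" every link on the page
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return levels
```

With index.html linking to gallery1.html and gallery2.html, and those linking to the pictures, the galleries end up on level 2 and the pictures on level 3. The link from gallery1.html back to index.html is ignored because that page was already visited.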

These tools can work during the night, so you can leave your computer on and let it download a website while you sleep. The tools can also work as if 2, 3, 8 or more people were downloading the site at once, so you are not limited to a single "virtual" person clicking. If you play around, avoid settings above 8. It is not nice to the website, and if the same person hammers a website with 8 different "people" all downloading at once, chances are the webmaster will get pissed off. Set the limit to 2-3, or maybe 4.
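
In code, the number of "virtual people" is simply a cap on concurrent downloads. Here is a minimal Python sketch where fetch is a hypothetical function you supply that downloads one URL (for example with urllib.request). The default cap of 3 follows the advice above, and the optional pause before each request is my own addition to be extra polite.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 3  # the 2-3 suggested above; stay well below 8


def fetch_all(urls, fetch, max_connections: int = MAX_CONNECTIONS,
              delay_seconds: float = 0.0):
    """Run fetch(url) for every URL with at most max_connections at once."""
    def polite_fetch(url):
        time.sleep(delay_seconds)  # optional pause before each request
        return fetch(url)

    # The pool never runs more than max_workers calls at the same time.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(polite_fetch, urls))  # results keep URL order
```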

Downloading the Internet, please wait....

Now, this feature of automatically copying an entire website is simply great. BUT! What if the website links to Google? Then you begin visiting every link Google has and all the sites it points to? YEP! You will begin downloading the entire internet... and I bet you will run out of memory and hard drive space before the download is finished.

Download only 1 website...


To avoid downloading the internet you have to limit your Offline Browser to only download from the website you want. You can most often do this by only downloading from "the starting server" or only download from "the starting domain".

Perhaps you do not want the small thumbnails and other small graphic pieces that are displayed on every page. Maybe you do not even want to save the .htm / .html files. If so, exclude file types such as .gif, .htm etc. This tells your Offline Browser NOT to save these file types to your hard drive.
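
Behind the checkboxes there are two separate decisions: should a link be followed at all, and should the file be saved. Here is a minimal Python sketch of both using urllib.parse. The function names, the example host www.example.com and the rule that "domain" means the starting host minus a leading "www." are my own assumptions; real Offline Browsers may define "starting domain" differently. Note that skipped .htm pages are still read for links; they are just not written to disk.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

SKIP_TYPES = {".gif", ".htm", ".html"}  # example exclusions from the text


def in_scope(start_url: str, link: str, scope: str = "server") -> bool:
    """Should the crawler follow this link at all?"""
    start_host = urlsplit(start_url).hostname or ""
    host = urlsplit(link).hostname or ""
    if scope == "server":  # only the exact starting server
        return host == start_host
    # "domain": also accept other hosts of the same domain.
    # Assumption: the domain is the starting host minus a leading "www.".
    domain = start_host.removeprefix("www.")
    return host == domain or host.endswith("." + domain)


def should_save(link: str, skip_types=SKIP_TYPES) -> bool:
    """Should this file be written to disk? (Pages are still read for links.)"""
    suffix = PurePosixPath(urlsplit(link).path).suffix.lower()
    return suffix not in skip_types
```

With a start page of http://www.example.com/index.html, links to www.google.com are never followed. Links to cdn.example.com are only followed in "domain" mode.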

I suggest you play around with an Offline Browser, and a good site to practice on could be this one. Make a new project, download everything from this site only, and avoid downloading anything that does not belong to this website.

Browse the contents that were downloaded and play around with the settings until you can download both this website and its graphics.