Sunday, June 15, 2008

Beginner's guide to site ripping (2) - Check & CSV

Let's assume you have managed to download a site, any site for that matter. At this point you will have ended up with a folder structure of some kind. You will have some folders with plain .jpg files and maybe even some with .zip files. You will also have a folder with some videos, probably .wmv or .avi, or maybe some .mov files.

Before you do ANYTHING to the files (reorganize, unzip, delete, move, rename... ANYTHING!), here is the expert advice.

Make a csv of the raw rip - ALWAYS!


Why? - you may ask. Well, there is one simple answer to that. If you fuck up and accidentally delete something or copy/move stuff to the wrong places, then a tool can use the raw site-rip csv to organize the entire thing back to the original places and you can start over. You can even spot the file you accidentally deleted.

"Make a csv? WTF is that?" you may ask yourself. A csv is a simple file which can be used for organizing and checking files. You need tools to create csvs, and you also need tools for organizing/sorting stuff according to a csv. We will eventually get to the tools in an article. But to skip the jazz, go search for tools like PicCheck / The!Checker, Hunter, ScanSort, PSP Verify, PServeCheck, fastCSV etc. The csv files are actually ecsv (Extended Comma Separated Values) files, and they have a specific layout like this (excerpt only):

20080527-video-full.wmv,105742805,3925D04D,\20080527 - Vicky - Pantyhose\,
001.jpg,3065686,B75F6804,\20080529 - Natasha - Wall Bars\,
002.jpg,2998642,5B2B26A0,\20080529 - Natasha - Wall Bars\,
...
There are 5 parts to each line of an ecsv file, and the order of the 5 parts matters.

  1. Filename
  2. Size counted in bytes
  3. CRC32 value in hexadecimal format
  4. Complete path for the file
  5. Optional comment
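
To make the five columns concrete, here is a minimal Python sketch that produces lines in this layout for every file under a folder. It is not how any of the tools listed above work internally. The function names (crc32_of_file, ecsv_lines) are made up for illustration, and the sketch assumes the path column is relative to the top folder of the rip, written with backslashes as in the excerpt, and that the comment column is left empty.

```python
import os
import zlib


def crc32_of_file(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as 8 uppercase hex digits."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in so large files are read in pieces.
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08X")


def ecsv_lines(root):
    """Yield one 'name,size,crc,path,comment' line per file under root."""
    for folder, _subdirs, files in os.walk(root):
        rel = os.path.relpath(folder, root)
        # Assumption: path column is relative to root, wrapped in backslashes.
        if rel == ".":
            path_col = "\\"
        else:
            path_col = "\\" + rel.replace(os.sep, "\\") + "\\"
        for name in sorted(files):
            full = os.path.join(folder, name)
            size = os.path.getsize(full)
            # Trailing comma = empty optional comment column.
            yield f"{name},{size},{crc32_of_file(full)},{path_col},"
```

Writing the yielded lines to a .csv file before touching anything gives you a raw-rip record of the kind described above. File names that contain a comma would need extra care that this sketch does not attempt.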

The tools listed earlier will help you make such a file. The ecsv files have a naming convention. There are two often used conventions, one is the WSC convention and the other is the ET/VG convention.

WSC looks like this: MVCD2_Mystique_Magazine_Vault_CD2_5744.csv

ET looks like this: Watch4Beauty-DVD12(Pre-Final)_1903.csv

WSC puts the so-called trigger part first in the ecsv filename; in the example it is MVCD2. The next part is the descriptive name (Mystique Magazine Vault). The 3rd part gives information about the media size of the ecsv OR the year of the ecsv. The last part shows how many files the ecsv covers. Finally comes the extension: csv.
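
As a rough illustration of that breakdown, the following Python sketch splits a WSC-style filename on underscores. The function name parse_wsc_name and the dictionary keys are invented here, and the sketch assumes the trigger, the media/year part and the file count never contain an underscore themselves.

```python
def parse_wsc_name(filename):
    """Split a WSC-style ecsv filename into its four described parts."""
    stem, _, ext = filename.rpartition(".")
    parts = stem.split("_")
    if ext.lower() != "csv" or len(parts) < 4:
        raise ValueError(f"not a WSC-style name: {filename!r}")
    return {
        "trigger": parts[0],                 # first part
        "name": " ".join(parts[1:-2]),       # descriptive name in the middle
        "media_or_year": parts[-2],          # media size OR year
        "count": int(parts[-1]),             # number of files covered
    }


info = parse_wsc_name("MVCD2_Mystique_Magazine_Vault_CD2_5744.csv")
print(info["trigger"], "|", info["name"], "|", info["media_or_year"], "|", info["count"])
```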

ET uses a different approach: there is no trigger part. Instead there is a descriptive name followed by a - and then a media descriptor (DVD12). This is always the case, and the media descriptor always comes with a counter. Immediately following the media descriptor is an ecsv state identifier. There are 6 different states (normally).

  1. Ongoing. The whole (...) part is omitted, which means that the csv still changes and probably does not yet contain enough files to fill a CD or DVD.
  2. (Pre-Final). In this case the ecsv has reached the proper number of files, so that a full CD or DVD or whatever can be filled if one were to burn the contents the ecsv covers to a media.
  3. (Pre-Final-1). The ecsv is still not 100% closed and the maintainer changed something. Maybe added a missing file, maybe renamed a folder. A change compared to the Pre-Final ecsv.
  4. (Pre-Final-2). The 2nd change from PF (Pre-Final).
  5. (Pre-Final-3). The 3rd change from PF.
  6. (Pre-Final-4). The 4th and hopefully last change from the PF ecsv.
  7. (Final). The ecsv is now closed and no further changes will be made to it. You could burn the contents the ecsv covers to a media and feel safe that it will not change.
  8. (Final-1). The ecsv maker made a minor fuck-up and the (Final) should be discarded.
  9. (Final-2). More errors; the F1 should be discarded and the F2 should be used.
  10. (Final-3). More errors; the F2 should be discarded and the F3 should be used.
  11. (Final-4). More errors; the F3 should be discarded and the F4 should be used.
  12. (Re-Burn). The ecsv maker fucked up good and there was an error. So stuff was re-organized / cleaned and people should re-burn the media.
  13. (Re-Burn-1) -> (Re-Burn-4). At this point someone will have shot the ecsv maker for continuously making errors and having to re-do them over and over again.
Most often you will see stuff going from 'ongoing' into (Pre-Final) maybe even (Pre-Final-1) and then to (Final).
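
If you want to read the state out of an ET-style filename automatically, a regular expression along these lines could do it. This is only a sketch built from the single example above: the pattern, the name parse_et_name and the assumption that the media descriptor is letters followed by a counter (as in DVD12) are mine, not part of any official ET specification.

```python
import re

# Assumed shape: <name>-<media><counter>[(<state>)]_<file count>.csv
ET_NAME = re.compile(
    r"^(?P<name>.+)-(?P<media>[A-Za-z]+\d+)"
    r"(?:\((?P<state>[^)]+)\))?_(?P<count>\d+)\.csv$",
    re.IGNORECASE,
)


def parse_et_name(filename):
    """Split an ET-style ecsv filename into name, media, state and count."""
    match = ET_NAME.match(filename)
    if match is None:
        raise ValueError(f"not an ET-style name: {filename!r}")
    return {
        "name": match.group("name"),
        "media": match.group("media"),
        # No (...) part at all means the ecsv is still ongoing.
        "state": match.group("state") or "ongoing",
        "count": int(match.group("count")),
    }


print(parse_et_name("Watch4Beauty-DVD12(Pre-Final)_1903.csv"))
```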

I will not speak further about the ecsvs and how to use them. Just make an ecsv of your downloads so that you can use a tool to organize the contents back to the original places.
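
For a feel of what such a tool does with the ecsv, here is a small Python sketch that reports which listed files are missing or have the wrong size. It is a simplification, not a replacement for the real checkers: the names parse_ecsv_line and find_problems are hypothetical, the CRC32 column is read but not compared, and it assumes no file name contains a comma.

```python
import os


def parse_ecsv_line(line):
    """Split one ecsv line into (name, size, crc, path, comment)."""
    name, size, crc, path, comment = line.rstrip("\r\n").split(",", 4)
    return name, int(size), crc.upper(), path, comment


def find_problems(lines, root):
    """Return (missing, wrong_size) lists of paths relative to root."""
    missing, wrong_size = [], []
    for line in lines:
        if line.count(",") < 4:
            continue  # skip blank lines and anything that is not a data line
        name, size, _crc, path, _comment = parse_ecsv_line(line)
        # The path column uses backslashes; rebuild it with the local separator.
        folders = [part for part in path.split("\\") if part]
        rel = os.path.join(*folders, name)
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            missing.append(rel)
        elif os.path.getsize(full) != size:
            wrong_size.append(rel)
    return missing, wrong_size
```

Calling find_problems with the lines of your raw-rip ecsv and the folder you have been working in lists the files you deleted or damaged along the way.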

Before you begin to clean up your downloads and put stuff into proper folders, you need to check the contents to see if they are actually any good.

Picture checking
You can use Hunter, jCheck or jpgfix to check the jpeg pictures.
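
If you just want a quick first pass before reaching for those tools, a truncated download can often be caught by looking at the first and last two bytes of the file. The Python sketch below does only that. It is not what Hunter, jCheck or jpgfix do, the function name is made up, and it is a heuristic: a picture with extra bytes after the end marker would be flagged even though it may display fine, and a file that passes can still be corrupt in the middle.

```python
def looks_like_complete_jpeg(path):
    """Cheap check: JPEG start marker at the front, end marker at the back."""
    with open(path, "rb") as handle:
        head = handle.read(2)
        handle.seek(0, 2)            # jump to the end to learn the file size
        if handle.tell() < 4:
            return False             # too short to hold both markers
        handle.seek(-2, 2)           # last two bytes of the file
        tail = handle.read(2)
    # FF D8 = start of image, FF D9 = end of image.
    return head == b"\xff\xd8" and tail == b"\xff\xd9"
```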

Video checking
As far as I know, there are no tools that verify whether a video (avi, wmv or mov) is good. But there are three good indicators. First, play it with Fast Forward and watch that it is playable. Second, randomly click the seek bar in your video player to find out whether it is seekable. And lastly, always check the last few seconds to see whether the video ends like it should and not prematurely.

Hunter needs some understanding and can be quite complex to use. JpgFix is a command-line tool, which may not suit everyone. jCheck is a simple Windows dialog with one purpose, so it should be straightforward to use.
