Monday, September 1, 2008

Feed the masses csvs

This article is about a feature of VG which enables you to automatically get information about new and updated csvs. You basically do not have to do anything besides having your box on.

The feature is called RSS, which is an abbreviation for Really Simple Syndication. It is basically a way to gather information from the internet in an easier, more readable and more direct form.

RSS is sometimes referred to as Web Feeds. What basically goes on is that the server(s) you subscribe to with RSS automatically update a piece of information, which your Internet browser (yes, it is RSS enabled) or your RSS program picks up and serves to you when you see fit, without all the jazz and hassle of having to go to the website and click through a few pages to get the information.
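For the curious, here is a minimal Python sketch of what a feed reader does with such a feed. It assumes a plain RSS 2.0 feed where each item may carry the csv as an enclosure (attachment). The sample feed, its titles and its URLs are invented for illustration; the real VG feed may be laid out differently.

```python
from xml.etree import ElementTree

# Hypothetical RSS 2.0 feed; titles and URLs are invented for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example csv updates</title>
    <item>
      <title>Example-DVD01(Pre-Final)_1903.csv</title>
      <pubDate>Mon, 01 Sep 2008 10:00:00 GMT</pubDate>
      <enclosure url="http://example.invalid/Example-DVD01.csv" length="1234" type="text/csv"/>
    </item>
    <item>
      <title>Site news without attachment</title>
      <pubDate>Sun, 31 Aug 2008 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def list_updates(feed_xml):
    """Return (title, enclosure_url or None) for every item in the feed."""
    root = ElementTree.fromstring(feed_xml)
    updates = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        enclosure = item.find("enclosure")
        url = enclosure.get("url") if enclosure is not None else None
        updates.append((title, url))
    return updates
```

Calling `list_updates(SAMPLE_FEED)` gives one entry per feed item, and the enclosure URL is what a reader would fetch if you asked it to download the attachment.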

RSS is most often free of charge, and in this case it is totally free, nobody will charge you anything.

So, back to business. If you want to subscribe to the RSS feed from VG then you have to have access to the site. And if you do, then you should click the link called RSS link.



The next thing that happens is that your browser changes to the webpage shown below, where you can see the way IE presents VG's RSS feed. On this page you click Subscribe to this feed. This makes IE remember that you have subscribed to the feed and that it should check for updated news every now and then (you can change the schedule later).


You will then be presented with a small dialog box which enables you to put your RSS feed subscriptions into different folders. The default is chosen in the example.


You've now subscribed and the web page changes slightly. There are 3 things of interest besides that you now can see the last 50 updates to csvs on the VG site.

The first thing is the Star in the upper left corner. It is where you can find the RSS feeds you've subscribed to again, so it is like your RSS favorites.

The second thing of interest is the green arrow pointing to the right, alongside the attachment marker next to the text retrieve. Both are clickable links which enable you to save the csv in question to your hard drive.

The last thing of interest is the possibility to change the settings (properties) of the VG feed.



If you click 'View feed properties' then you will be presented with a dialog like the one shown below. There you can change how often IE checks whether new stuff has arrived on VG; the default is every 30 minutes. DO NOT SET IT LOWER, it is simply unnecessary.

You can also configure IE to automatically download the attached files, in this case the csv file itself. DO NOT CHOOSE TO DOWNLOAD ALL ATTACHMENTS, IT WILL KILL THE SERVER. And that will force the people at VG to either remove the RSS feed or do something nasty.



That is all, VG information and csvs directly to you, without all the other stuff and need to browse around.

Wednesday, August 27, 2008

Interview with the Hungarian Bear

MG has covered a bunch of different aspects of collecting, maintaining and such. Today we're glad to present you with an interview with SunCo.

SunCo is one of the better known maintainers and has become an often talked about subject over the last 10 or so years because of his choices and doings. SunCo is the original author of VG collection series like: MET-Art, Hegre, Galitsin, ArnoldSK, Domai, FemJoy, Harris, Slastyonoff, TSM and others.


Today SunCo is in cyber exile for reasons you can read in the interview.

Happy reading.



[MG]
SunCo, you are one of the old timers in this business, how long have you been in collecting?
[SunCo] i've been collecting for 10 years now
[SunCo] i started with modem and downloading sites etc.
[MG] 10 years, that is a long time, have you ever felt like quitting all of it?
[SunCo] never
[SunCo] in the past 2 years i spin on a bit lower level
[SunCo] but still collecting
[SunCo] and i will do
[MG] impressive, pure dedication. You have also maintained a lot of csvs and you started some famous collections.
[MG] which collections are you most proud of?
[SunCo] all of them
[SunCo] i'm proud all of them
[SunCo] i cannot decide
[SunCo] especially now, when the good old nice colls look really old and bad compared to the new collys
[MG] You mostly do Fine Art Erotica collections, has it been an easy job?
[SunCo] nope
[SunCo] I've started to make csvs for 2 reasons
[SunCo] 1 nobody had them and i made them
[SunCo] 2 the "official" csv wasn’t good and i made better
[SunCo] .)
[SunCo] for the FAE group
[SunCo] nobody had them before and they were nice and unique sites and in the beginning they were not nude enough for the main channel or they did not like them
[SunCo] so i CSV’d them
[MG] You started MET and HEGRE among others, how do you feel about the increasing Gb size of the contents today?
[SunCo] one of my eyes is laughing and the other is crying
[SunCo] I’m always happy to have more stuff
[SunCo] but in the real life
[SunCo] ppl cannot check, watch and enjoy that much amount of stuff
[SunCo] i remember how happy we were with 100 pieces of 800x600 pics
[SunCo] but now with 2000 pieces 4000x3000...
[SunCo] it starts to become too much a bit
[MG] True. Is it true that you have not watched most of your stuff?
[SunCo] i watch
[SunCo] i always watched at least 1 time everything i get
[SunCo] that was the reason why i stopped some collys
[SunCo] i saw that they are unable to show me anything new
[SunCo] always the same girls in the same positions etc
[SunCo] became boring.
[MG] how much time do you spend on collecting on a daily basis, or shall I ask: how much time does your box spend on collecting every day?
[SunCo] my box is up 24/7
[SunCo] always downloading or uploading something
[SunCo] and as a big download is finished i always sort and watch them
[MG] You decided to go into exile a little away from the main channels, why is that?
[SunCo] ?
[SunCo] why i left the main channel?
[MG] yes
[SunCo] i had many reasons
[SunCo] those colls i collected, i collected from the sites and i made the csv, so they needed me, not needed them
[SunCo] the other reason is
[SunCo] when sometimes i needed something
[SunCo] it was really pain in ass to get it
[SunCo] slow or incompletes
[SunCo] i made my own channel to be together with my FRIENDS
[SunCo] and the big reason is
[SunCo] i did not like the ops behavior
[SunCo] some of them were really a bitch and God maniacs
[SunCo] i earned much in the real life and in the net life and irc
[SunCo] and did not want someone to bitch on me
[MG] Completeness is something you've accomplished a lot of. Many times you have managed to get stuff others could not get, are you simply better?
[SunCo] as i said
[SunCo] i always watched what i got
[SunCo] and then i sometimes saw that the official csv was not good
[SunCo] i really spent my time on it
[SunCo] I’m not better
[SunCo] surely not
[SunCo] just i really spent my time on it
[SunCo] and of course
[SunCo] for me, it was a bit easier to get into those nice sites
[SunCo] but i don't want to advertise that
[SunCo] in this stage i always helped to the others too
[SunCo] with how to make csvs
[SunCo] or pw's etc
[MG] You made a channel, with some friends, as you said. Have you ever experienced that people have left or disappeared or even died?
[SunCo] all of them
[SunCo] i got new friends
[SunCo] many for long time
[SunCo] some for short time
[SunCo] i don’t like those who come when they need something and then leave
[SunCo] we are small channel. Enough to be friendly group
[SunCo] of course i lost members and friends too
[SunCo] some of then just never came back
[SunCo] some of them quitted
[SunCo] and we had some ppls gone with death accidentally
[MG] One of your friends, Joker1962, passed away some time ago. How did that make you feel, losing a friend like that?
[SunCo] he was my very good friend
[SunCo] his death really shocked me
[SunCo] rip jokey!
[SunCo] our friendship was not w/o clouds
[SunCo] but we were friends
[MG] have you ever thought of going back to the main channels again?
[SunCo] yep
[SunCo] i thought
[SunCo] and in the past
[SunCo] i got back
[SunCo] but years after i just went away from them
[SunCo] i still got bitching from the op's side
[SunCo] but i keep the right
[SunCo] to decide that any time i will go back
[SunCo] every few years i check back
[SunCo] maybe things changed :)
[MG] Some of the op's are also in your channel, how do you feel about that?
[SunCo] we are friends
[SunCo] we have no problems here
[SunCo] we have almost the same rules as like in the main channel
[SunCo] just more friendly
[SunCo] and with their experience they help me
[SunCo] on the channel
[SunCo] and in the collections too
[SunCo] to synch the 2 channels
[MG] it sounds like a really friendly place, what is the main things required in your channel, is that friendship?
[SunCo] hmm
[SunCo] how i tell
[SunCo] the bad order
[SunCo] any can come here
[SunCo] and most of them become a friend of us with time
[SunCo] not the other order
[SunCo] like only our friends can come here
[SunCo] we are too few to behave ugly
[SunCo] btw i never wanted a channel bigger than 30-40 ppls
[MG] There are some special collections which are not in the main channel, and only in your channel, how do you feel when collections migrate over to the main channel?
[SunCo] that is why i stopped many csv
[SunCo] we always had special collections
[SunCo] that was a main reason to have my own channel
[SunCo] to collect on our way
[SunCo] some trigs were banned in tpf
[SunCo] and when they saw they are nice
[SunCo] they got them too
[MG] so the channel is a birthplace for new collections?
[SunCo] yes, new collections
[SunCo] with guys with same taste as mine was
[SunCo] with the migration
[SunCo] i have no real problem
[SunCo] sometimes, i am unable to get the site updates
[SunCo] and i cannot subscribe to all sites i want
[SunCo] sometimes it takes months to update a set
[SunCo] this causes someone to take them over
[SunCo] i have no problem with that
[SunCo] just tell and ask me first
[SunCo] and make as good
[SunCo] or better csv than mine
[MG] You've csv'd for many years, what kind of tools do you use?
[SunCo] the very basic tools only
[SunCo] for a long time, i did not use any spec programs for downloading
[SunCo] many times i downloaded by hand
[SunCo] that gave me many times the extra files i found against the official csv
[MG] Internet Explorer?
[SunCo] yep
[SunCo] explorer
[SunCo] mozilla
[SunCo] and i played with the url-s
[SunCo] of course for the mass dl i used dl managers and site rippers
[SunCo] for csv’ing i used for many, many years The!Checker
[SunCo] now PSC
[MG] What kind of programs?
[SunCo] flashget
[SunCo] offline explorer
[SunCo] total commander
[SunCo] mostly i make a dl list by hand
[SunCo] and excel or batchmaker programs
[SunCo] and then dl it
[SunCo] with flashget or total commander
[SunCo] long time ago i used getright some times
[MG] how about organizing then, any special tools used there?
[SunCo] mainly i organize by hand
[SunCo] in the structure i like
[SunCo] in the most cases i make batch files (.bat files)
[SunCo] they are making the unpack and the dir making and etc
[MG] you introduced the layout of YYYY-MM\DD - title, which is widely used today. Are you proud of that?
[SunCo] hmmm
[SunCo] thanks
[SunCo] i did not know that
[SunCo] no need to be proud
[SunCo] I’ve always used that system
[SunCo] the only way to sort, organize etc a big collection
[SunCo] of course now i know
[SunCo] deep in my heart i'm a proud, a bit
[SunCo] btw i never forced anyone to use my methods
[MG] What kind of advice would you like to pass on to a new guy who want to become a maintainer?
[SunCo] first of all
[SunCo] always watch what you collect
[SunCo] enjoy it
[SunCo] and you can find the bad's, missings and corrupts too
[SunCo] be professional
[SunCo] in the way you collect
[SunCo] nobody likes those who start something
[SunCo] and does it only for few months
[SunCo] and stops
[SunCo] talk with others
[SunCo] to find out about the missing files and problems
[SunCo] nobody can make good csv alone
[SunCo] not even me
[SunCo] and
[SunCo] only do as much
[SunCo] as much you can handle
[SunCo] if you want too much
[SunCo] you will get much less
[SunCo] i hope you understood it
[SunCo] :)
[MG] yes, understood
[MG] anything else you would like to add before we finish?
[SunCo] just thanks to everybody
[SunCo] i wasn't able to do this alone
[SunCo] thanks to those who helped
[MG] anyone in particular you would like to send greetings to?
[SunCo] i don’t want to pick anybody
[SunCo] i like them all
[SunCo] but for those who had to gone
[SunCo] like joki
[MG] well, thanks for taking some time for the interview, have a nice day SunCo
[SunCo] I thank

Sunday, August 24, 2008

Hard Labour of Maintaining






Well, maintaining can be a bitch sometimes, and most often demands quite some of your time. And you have to look out for pitfalls and every other nasty thing a webmaster can pull on you.

But sometimes it can also be easy. Take a look at the screenshot.


What you see is 2 update maintaining scripts, which are hooked into Windows' task scheduler (Start -> Accessories -> System Tools). One of the scripts even unzips the contents after download (if you look closely).

This is just for inspiration and not a production run.

Monday, August 18, 2008

Shopping for CSV's ?






Well, we do not actually go to the local 7/11 for getting a new fresh csv when the old one is worn out and used to the fullest. Instead we can visit places like VG.

So in an ideal world we could just blindly grab any csv and trust that things were in order. But that is far from reality.

CSV's have errors and missing stuff. Comparing csvs posted on usenet with csvs from VG or some other place may be necessary to figure out which is the better.

But in the end, it is most often the case that the csv set which has been maintained the longest tends to be the better choice.

So, what is this article all about ?

Answer: "Finding the better csv and csv errors"

To do that you can use a tool like CSV Workshop. And furthermore you can watch a video of how to do it.


Download video: MG.CSVWorkshop.XViD.zip

CSV Workshop has a lot of other uses, but you should explore them yourself. Bear in mind that the tool is 5 years old; back then it couldn't handle DVD sized csvs, and it still cannot. The size is reported wrong, but the dupechecking still works like a charm.
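If you want to see the idea behind such a comparison without the tool, here is a rough Python sketch. It is not how CSV Workshop works internally; it simply assumes that two files are "the same" when size and CRC32 match, and that no field in a csv line contains a comma. The function names are made up for this example.

```python
from collections import defaultdict


def parse_ecsv(text):
    """Parse ecsv text into (name, size, crc, path) tuples, skipping blank lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: filename, size, CRC32 and path contain no commas.
        name, size, crc, path = line.split(",")[:4]
        entries.append((name, int(size), crc.upper(), path))
    return entries


def internal_dupes(entries):
    """Group files that share the same (size, CRC32) inside one csv."""
    seen = defaultdict(list)
    for name, size, crc, path in entries:
        seen[(size, crc)].append(path + name)
    return {key: files for key, files in seen.items() if len(files) > 1}


def missing_from(reference, candidate):
    """Names of files (matched by size and CRC32) in reference but not in candidate."""
    have = {(size, crc) for _, size, crc, _ in candidate}
    return [name for name, size, crc, _ in reference if (size, crc) not in have]
```

Run `missing_from` in both directions on two csvs of the same site: the csv that is missing the fewest files from the other one is usually the better candidate, and `internal_dupes` points at files that were listed twice.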

Friday, August 15, 2008

Making a directory list

Here's a small video example of how anyone can create a list of directories. The example uses some lines from some html pages. It is raw html, which we turn into a nice list.
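The video does this by hand in UltraEdit. The same idea can be sketched in a few lines of Python. Note that the HTML lines and the regular expression below are invented for illustration; every site has its own markup, so the pattern has to be adapted.

```python
import html
import re

# Hypothetical raw lines copied from a site's set listing; the markup is invented.
RAW_HTML = """
<div class="set"><a href="/sets/101">Natasha &amp; Vicky - Wall Bars</a> <span>2008-05-29</span></div>
<div class="set"><a href="/sets/100">Vicky - Pantyhose</a> <span>2008-05-27</span></div>
"""

# Captures the link text (set title) and the ISO date that follows it.
SET_PATTERN = re.compile(
    r'<a href="[^"]*">([^<]+)</a>\s*<span>(\d{4}-\d{2}-\d{2})</span>'
)


def directory_list(raw):
    """Turn raw html lines into a sorted list of 'date - title' directory names."""
    names = []
    for title, date in SET_PATTERN.findall(raw):
        # html.unescape turns entities like &amp; back into plain characters.
        names.append(f"{date} - {html.unescape(title).strip()}")
    return sorted(names)
```

Printing the result of `directory_list(RAW_HTML)` line by line gives a list you can paste into a text file or turn into a batch file.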




Download video: UltraEdit.DirectoryLists.XViD.zip

Tuesday, July 22, 2008

One pickup line for Bang Bros


Friday, July 18, 2008

Alternative - DOS Scripting


Thursday, July 17, 2008

The Alternative - Scripting the Bad Jojo

There are various ways to download.

A) You can download by using your browser and choose Save.

B) You can also use download managers like Down Them All, ReGet or GetRight and even extend it with download lists.

C) You can also use an advanced Offline Browser like Offline Explorer EE, BlackWidow, HTTrack, HTTP Weazel or Teleport Pro. And you can even spice it up with filters, depth, HAM and macros while the program crawls like a spider over the website of choice.

State of the art tools all of them, doing what they do best. But they all have overhead. Either you have to click too many pages, or you will have to hand pick every download with the download managers or your "spider" traverses too many pages to download only what you want.

What you need is an alternative

The alternative to choose is:


Scripting is only something for you if you have skills. You need skills in programming, you need to have knowledge about http/url's and html. Knowledge about JavaScript and Ajax is an advantage. And especially knowledge about regular expressions is very beneficial to have.


These skills are all learnable. Here on MG you can read about regular expressions and learn them here in article 1, 2, 3 and 4 and you can also gain knowledge about URL's.

But before you go all cold and say that this is way above your head or maybe even too tedious, then you should take some time to watch a small 3:39min video of a tiny script (62 lines) which makes some things easier. The script is not flashy or anything, but it may inspire you.


Download Scripting.badjojo.XViD.zip (6.26Mb)


Thanks to _Store_ for keeping Perl in the focus area and thanks to lobos for script inspiration.

Saturday, July 5, 2008

CSV laws ?

Rumor has it that there is a new revised Standard Act on the way.

The original standard act "El-Toro Standard Act for CSV Makers v1.01" was much about protecting the csv maker and setting up loose guidelines of how to organize csvs if they were to be accepted as official ET material.

The upcoming revised act "VG Standard Act for CSV Makers v1.?" seems to have a somewhat stricter approach towards the csv maintainer, including the ability to override decisions made by the maker. Is this a loaded gun to the back of your head? Could be.

The pre-copy of the act includes new guidelines for new media and the conversion factors between media, CD -> DVD -> DVD9 -> HDDVD -> BD etc, which is nice. It seems that things have loosened in this area.

There are minor additions about chronology and ordering including character use. All in the good interest of making good csvs which can be used anywhere, it seems.

A very good new thing is the "War Against Pre-Final". The act draft has suggestions about the maximum timespan for a Pre-Final, which can turn out to be one of the better things the new act introduces.

Let's all hope for the best and get the most out of it.

Happy summer holiday to you from MG.

Monday, June 16, 2008

Beginners guide to site ripping (4) - Organizing

ORGANIZE IT. This is an art form. Close to a religion. There are many ways of organizing a site rip into something manageable.

You should organize a siterip for several reasons. Here are a few that hopefully will make you consider your layout.

  1. Your layout should be easily browseable
  2. Your layout should enable anyone looking at the site to find the exact set fast
  3. Your layout should have an order or sequence, dated or numbered
  4. Your layout should include all possible information about every set
  5. You should never rename files, keep the site names. It makes it easier for others to find specific files
  6. Always go for the highest quality of pictures and the best quality of videos, delete the rest
  7. Avoid organizing by model, organize by chronological representation
  8. Keep as flat a structure as possible. It is easier to browse and is CD/DVD/BD burn friendly
  9. Identify videos clearly by putting them in a specific folder or marking the foldernames clearly. Not everybody likes it all mixed up
  10. Do not keep stuff in your layout which has not been directly downloaded from the site. Do not put home created stuff into your layout
  11. Make sure your rip/layout is complete to the date. Do not leave stuff out of the layout without clearly signaling what has been left out
  12. If your layout is media specific (CD/DVD/BD) then keep the same layout on all media.
  13. Your layout should be extensible
  14. If you make csv's after you have organized your siterip, then create a comment on the 1st line which tells the date of the siterip, and possibly notes on what is NOT in the csv. If you favour extensibility then make notes for anyone who may extend your work
  15. If a site is messy then include set numbers/codes in the foldernames so that anyone other than you can find his or her ways around the site and your layout. It should be possible to easily match your layout to the site
  16. Avoid foldernames longer than 64 characters, due to media burnability
  17. Avoid national characters in foldernames as well as special characters. Replace & with 'and', replace , with - and so forth. Keep it simple
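Rules 16 and 17 are easy to check with a small script. The sketch below is one possible way to do it in Python. How to treat national characters is my own assumption here (accents are stripped and anything still outside plain ASCII is dropped); the rules above only say to avoid them. Names that are too long raise an error instead of being cut, so that you shorten them by hand.

```python
import unicodedata

MAX_LEN = 64  # limit from rule 16


def clean_folder_name(name):
    """Apply the simple replacements from rule 17 and enforce rule 16."""
    # Replacements named in the rules: & becomes 'and', comma becomes '-'.
    name = name.replace("&", "and").replace(",", "-")
    # Assumption: decompose accented letters and drop what is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of whitespace left over from the replacements.
    ascii_name = " ".join(ascii_name.split())
    if len(ascii_name) > MAX_LEN:
        raise ValueError(f"folder name longer than {MAX_LEN} characters: {ascii_name}")
    return ascii_name
```

Extend the replacements with whatever special characters a given site throws at you; the point is to apply the same rules to every foldername.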

This article is not about using your Offline Browser to filter away all the unwanted stuff, this is something you eventually will learn. And in case you are not using an Offline Browser for the job, then it is probably ReGet, DownThemAll or GetRight which helps you.

Manual
The simplest way of organizing is the manual way. You quietly write the directory names yourself, or copy/paste them from the site to your Windows Explorer, Total Commander or DirectoryOpus, and build everything up by hand. Unzipping manually and everything.

If you are maintaining a site on a regular basis then this is the easiest way to go. It is seldom beneficial to go tool crazy when you just have to download and add a set or two. I would recommend that everybody tries doing stuff manually. A site of up to 100 sets can easily be done manually.



File tool
There seem to be 3 tools of preference. They are: Windows Explorer, Total Commander and DirectoryOpus.

I have personally never been a fan of Total Commander ( http://www.ghisler.com/ ), but the tool is helpful for zip files among other things, and used by a skilled person it can be a deadly weapon.

DirectoryOpus ( http://www.gpsoft.com.au/ ) can do everything for you. You can script things and unzip etc. Highly configurable and nice to work with.

Windows Explorer is still a pretty handy tool in the hands of a skilled person. The unzip capabilities are too poor. But it is there, and if you master the tool then it can be pretty effective.



The editor
If you do not go 100% manual on stuff then chances are you have made lists in an editor. Notepad is always there but not quite good enough for the job of making download, organizer or folder renamer lists. You need something a bit more powerful.

Excel is good for lists, you can easily sort stuff and export it to different formats or just simply copy/paste from it.

UltraEdit ( http://www.ultraedit.com/ ) is an allround editor. Pretty good at everything. Column mode is supported, which is a big plus. Easy macro recording and playback. My favourite for editing. Column mode is extremely useful for making both download lists and directory name lists. Writing directly in column mode can also be very beneficial when creating dated folder names.

EditPad Pro ( http://www.editpadpro.com/ ) has the same capabilities as UltraEdit, plus a very good regular expression capable search/replace tool. Which editor you use is more a matter of religion. But I'd advise you to go for either UltraEdit or EditPad Pro, or maybe both if your wallet is thick enough. There are other alternatives and some are also free of charge. You may consider SlickEdit http://www.slickedit.com/, Notepad++ http://notepad-plus.sourceforge.net/uk/site.htm or even EditPlus http://www.editplus.com/.



Helper files (text and BAT)
It can be quite useful to make BAT files to help you with making directories, unzipping or moving files around. It is not far from a text file in your editor with all the foldernames to a batch file. Take a look at the two pictures.




The text file version (the topmost) just gives you the directory names. The batch (BAT) file version puts a MKDIR (MaKe DIRectory) command in front of each foldername. If you save the BAT file to the directory and then doubleclick it, it creates all the folders for you, and you have less manual work.
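Going from the text file to the BAT file can itself be automated. The Python sketch below is just one way to do it; the function name is made up, and the quotes around each foldername are my addition so that names with spaces work with MKDIR.

```python
def make_mkdir_bat(folder_names):
    """Build the text of a BAT file with one MKDIR command per foldername."""
    lines = ["@echo off"]
    for name in folder_names:
        name = name.strip()
        if name:  # skip empty lines from the text file
            # Quotes keep foldernames with spaces together as one argument.
            lines.append(f'MKDIR "{name}"')
    # BAT files traditionally use Windows (CRLF) line endings.
    return "\r\n".join(lines) + "\r\n"
```

Feed it the lines of your directory list, write the returned text to something like `mkdirs.bat` in the target directory (open the file with `newline=""` so Python does not double the line endings) and doubleclick it as described above.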

You can make more advanced BAT files, but it will require you to learn the commands which can be used in a BAT file. All command line tools can be used. So stuff like CD (Change Directory), REN (REName), DEL (DELete) and MOVE (MOVE) are at your disposal. Both WinRAR and PKZip / UNZip also have command line versions of their tools, so you can unzip a lot of files with one BAT file.

Take a look at the unzip.bat file below, which enters every directory, unzips all the zip files in it, deletes the zip files once finished and finally goes back (cd ..) to the original directory.
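For those who prefer a script over a BAT file, the same steps can be sketched in Python. This is not a translation of the unzip.bat above, only a rough equivalent of the steps described. One deliberate difference: a zip which fails its integrity test is left in place instead of being deleted.

```python
import zipfile
from pathlib import Path


def unzip_all(root):
    """Unpack every .zip in each immediate subdirectory of root, then delete it."""
    extracted = []
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for archive in sorted(folder.glob("*.zip")):
            with zipfile.ZipFile(archive) as zf:
                # testzip() returns the name of the first corrupt member, or None.
                if zf.testzip() is not None:
                    continue  # keep the damaged zip for a re-download
                zf.extractall(folder)
            archive.unlink()  # delete the zip only after a clean extraction
            extracted.append(archive)
    return extracted
```

Point it at the directory which holds your set folders, for example `unzip_all(r"D:\rips\SomeSite")` (a made-up path), and it returns the list of zip files it unpacked and removed.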



There are tools for creating batch files (BAT). One of the tools is called DO which is an old tool from 2002. It is commandline based and you can download it here. Be sure to read the do.txt file for help.


Common layouts
The common standard for a layout is that it is a) Browsable, b) Easy to find specific sets in and c) Maintainable. If sets are dated, then use dates. Keep the dates in ISO standard, which means year-month-day. If dates are not used, then use numbers in sequence. Below you have 6 pictures which show common layouts. Click each picture to enlarge.
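The reason for year-month-day is that such names sort chronologically even when they are sorted as plain text. A tiny Python sketch of both naming schemes follows; the helper names and the 4 digit padding are my own choices, not part of any standard.

```python
from datetime import date


def dated_folder(day, title):
    """Foldername with an ISO date (year-month-day) in front of the title."""
    return f"{day.isoformat()} - {title}"


def numbered_folder(index, title, width=4):
    """Foldername with a zero padded sequence number, for sites without dates."""
    return f"{index:0{width}d} - {title}"
```

Zero padding matters for the numbered variant: without it, set 10 would sort before set 2 in most file tools.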


Beginners guide to site ripping (3) - Tools

In the article "Beginners guide to site ripping (2) - Check & CSV" we spoke of different tools to make a csv.

There are quite a few tools which come into question in regard to csvs. The best are listed below.


The first 4 tools are all collection managers. Which means that they can make a csv and they can sort files according to csvs (ecsv's). They can keep all your csvs in views for later and manage your collections of csv'd material.

The 5th tool, ScanSort, is not a collection manager, it is a tool for sorting contents according to the layout of a ecsv file. ScanSort is a command line tool and thus has no graphical interface.

The 6th tool, fastCSV is a tool for making csvs, it cannot sort files according to an ecsv. This tool is also a commandline tool and has no graphical user interface.

The 7th tool, CSV Workshop, is a tool for viewing csvs and finding problems among ecsvs. This includes internal and external dupes and other nasty things.

All 7 tools are designed for Windows, but there should be a possibility to port PSPV to other platforms as this is written in Perl mainly.

Most of the tools are not maintained any longer and some of them suffer from age. One example is CSV Workshop, which cannot correctly display the internal size of an ecsv file if it holds data above 4Gb.

I dug up some old tutorials for you to read.

Download: Muleskinners guide to Hunter
Download: PigVomit's guide to Hunter
Download: PWA's guide to The!Checker
Download: PigVomit's guide to The!Checker
Download: ScanSort How To
Download: CSV Workshop screenshots and guide

Note!
PSPV (PhotoServe Professional Verify) is similar to PSCU (PServeCheck Ultimate). Both are very new tools and both get maintained on a constant basis. Both tools are by far more advanced (not complex!) and faster than any of the other tools listed.

Both tools are rather intuitive and can perform a bunch of tasks. Some of the features in these tools are as follows.

  • Automatic download of updated csvs from repository sites.
  • Disk space management which automatically moves contents around when needed.
  • Hunting/sorting from raw, zip and rar files.
  • Integrated uploads of updated csvs.
  • Integrated and automatic leech of collections.
  • Mass Miss/Wrong and Needs csv fill possibilities.
  • Automatic mass collection setup.
  • Internal dupe and unwanted file detection.
  • Cache for ultra fast sorting/hunting.
  • Complete filter views for collection groups.
  • View filters for: ongoing, pre-final, final, need to check, ready to burn, burned.
  • Direct integration to Nero and ImgBurn.
  • HTML reports and backup status.
  • Empty directory cleanup.
  • CSV and report cleanup.
  • Leech priority.



World premiere - PSCU
More screenshots here, here and here.

MG is proud to offer you the first public release of PServeCheck Ultimate Edition v4.5.1.45 right here. As with PSPV (the Perl relative of PSCU) this tool is a community tool which has been restricted to a select few for about 6 years. Read the file 'readme.txt' before you run the program and bear in mind that PSCU and PSPV are not meant to be standalone collection managers like Hunter or The!Checker. What you get here today is only 1/4th of the full package.

A big salute and thanks to The2nd for quickly assembling a 'standalone' version of PSCU for release here on MG, alongside being the author of PSCU.

Also thanks to the staff at VG for the triggerpack which follows the release. Shouts to Austria for the programming and to Canada for the servers and service at VG.

PSCU quick install:
Just a quick note. There are 2 groups among the triggers which are extendable. The groups are 'Custom' and 'My Personal Collections', where you can add your own custom stuff. When PSCU refers to the mIRC path, point it to the proper directory of the unzip. Edit and check the file remote.ini before you launch the executable the first time.

We will post an article about the usage of PSCU.

As of this article, PSPV is still a community tool and thus not open for everybody to use.

Sunday, June 15, 2008

Beginners guide to site ripping (2) - Check & CSV

Let's assume you have managed to download a site, any site for that matter. But at this point you will have ended up with a folder structure of some kind. And you will have some folders with plain .jpg's and maybe even some with .zip files. You also have a folder with some videos, .wmv or .avi probably or maybe some .mov files.

Before you do ANYTHING to the files (reorganize, unzip, delete, move, rename... ANYTHING!), follow this piece of expert advice.

Make a csv of the raw rip - ALWAYS!


Why? - you may ask. Well, there is 1 simple answer to that. If you fuck up and accidentally delete something or copy/move stuff to the wrong places, then the raw site-rip csv will organize the entire thing back to the original places and you can start over. You can even spot/find the file you accidentally deleted.

"Make a csv?" WTF is that? you may ask yourself. A csv is a simple file which can be used for organizing and checking files. You need tools to create csvs and you also need tools for organizing/sorting stuff according to a csv. We will eventually get to the tools in an article. But to skip the jazz, go search for tools like PicCheck / The!Checker, Hunter, ScanSort, PSP Verify, PServeCheck, fastCSV etc. The csv files are actually ecsv (Extended Comma Separated Values) files, and they have a specific layout like this (excerpt only):

20080527-video-full.wmv,105742805,3925D04D,\20080527 - Vicky - Pantyhose\,
001.jpg,3065686,B75F6804,\20080529 - Natasha - Wall Bars\,
002.jpg,2998642,5B2B26A0,\20080529 - Natasha - Wall Bars\,
...
There are 5 parts to each line of an ecsv file, and the order of the 5 parts is important.

  1. Filename
  2. Size counted in bytes
  3. CRC32 value in hexadecimal format
  4. Complete path for the file
  5. Optional comment
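To make the 5 parts concrete, here is a small Python sketch which splits one ecsv line into them. It assumes that filename and path contain no commas (only the trailing comment may), and the class and function names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class EcsvEntry:
    filename: str
    size: int        # size counted in bytes
    crc32: str       # 8 hexadecimal digits, upper case
    path: str        # complete path, e.g. \20080529 - Natasha - Wall Bars\
    comment: str = ""


def parse_ecsv_line(line):
    """Split one ecsv line into its 5 parts, in the fixed order."""
    # Assumption: only the last part (the comment) may contain commas.
    parts = line.rstrip("\r\n").split(",", 4)
    if len(parts) < 4:
        raise ValueError(f"not an ecsv line: {line!r}")
    filename, size, crc32, path = parts[:4]
    comment = parts[4] if len(parts) == 5 else ""
    if len(crc32) != 8:
        raise ValueError(f"CRC32 must be 8 hex digits: {crc32!r}")
    int(crc32, 16)  # raises ValueError if it is not hexadecimal
    return EcsvEntry(filename, int(size), crc32.upper(), path, comment)
```

Fed the first line of the excerpt above, it returns an entry with the filename `20080527-video-full.wmv`, a size of 105742805 bytes, the CRC32 `3925D04D`, the set folder as path and an empty comment.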

The tools listed earlier will help you make such a file. The ecsv files have a naming convention. There are two often used conventions, one is the WSC convention and the other is the ET/VG convention.

WSC looks like this: MVCD2_Mystique_Magazine_Vault_CD2_5744.csv

ET looks like this: Watch4Beauty-DVD12(Pre-Final)_1903.csv

WSC put the so called trigger part as the first part of the ecsv filename, in the example it is MVCD2. The next part is the descriptive name (Mystique Magazine Vault). The 3rd part tells you information about media size of the ecsv OR the year of the ecsv. The last part shows how many files the ecsv covers. Finally the extension: csv.

ET uses a different approach. There is no trigger part; instead there is a descriptive name followed by a - and then a media descriptor (DVD12). This is always the case, and there is a counter together with the media description. Immediately following the media descriptor is an ecsv state identifier. There are 6 different states (normally).

  1. Ongoing means the whole (...) part is omitted. The csv still changes and probably does not contain enough files to fill a full CD or DVD.
  2. (Pre-Final). In this case the ecsv has reached the proper number of files, so that a full CD or DVD or whatever can be filled if one were to burn the contents the ecsv covers to a media.
  3. (Pre-Final-1). The ecsv is still not 100% closed and the maintainer changed something. Maybe added a missing file, maybe renamed a folder. A change compared to the Pre-Final ecsv.
  4. (Pre-Final-2). The 2nd change from PF (Pre-Final).
  5. (Pre-Final-3). The 3rd change from PF.
  6. (Pre-Final-4). The 4th and hopefully last change from the PF ecsv.
  7. (Final). The ecsv is now closed and no further changes will be made to it. You could burn the contents the ecsv covers to a media and feel safe that it does not change.
  8. (Final-1). The ecsv maker made a minor fuck-up and the (Final) should be discarded.
  9. (Final-2). More errors; the F1 should be discarded and the F2 should be used.
  10. (Final-3). More errors; the F2 should be discarded and the F3 should be used.
  11. (Final-4). More errors; the F3 should be discarded and the F4 should be used.
  12. (Re-Burn). The ecsv maker fucked up good and there was an error. So stuff was re-organized / cleaned and people should re-burn the media.
  13. (Re-Burn-1) -> (Re-Burn-4). At this point someone will have shot the ecsv maker for continuously making errors and having to re-do them over and over again.
Most often you will see stuff going from 'ongoing' into (Pre-Final) maybe even (Pre-Final-1) and then to (Final).
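
If you want to read the state out of an ET/VG filename with a script, something like the Python sketch below could do it. It is built from the one example filename above, so treat the pattern as a guess: it assumes the media descriptor is letters followed by a counter (DVD12), that the state sits in parentheses right after it, and that a missing state means ongoing. The second filename in the example is hypothetical.

```python
import re
from typing import NamedTuple


class EtName(NamedTuple):
    name: str
    media: str
    state: str
    count: int


# name - media descriptor, optional (state), _ file count, .csv
ET_PATTERN = re.compile(
    r"(?P<name>.+)-(?P<media>[A-Za-z]+\d+)(?:\((?P<state>[^)]+)\))?_(?P<count>\d+)\.csv"
)


def parse_et_name(filename: str) -> EtName:
    m = ET_PATTERN.fullmatch(filename)
    if m is None:
        raise ValueError(f"not an ET style filename: {filename!r}")
    # No (...) part at all is treated as the ongoing state.
    return EtName(m["name"], m["media"], m["state"] or "Ongoing", int(m["count"]))


print(parse_et_name("Watch4Beauty-DVD12(Pre-Final)_1903.csv"))
print(parse_et_name("Watch4Beauty-DVD13_250.csv"))  # hypothetical ongoing csv
```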

I will not speak further about the ecsvs and how to use them. Just make an ecsv of your downloads so that you can use a tool to organize the contents back to the original places.

Before you begin to clean up your downloads and put stuff into proper folders then you need to check the contents to see if it actually is any good.

Picture checking
You can use Hunter, jCheck or jpgfix to check the jpeg pictures.
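
To give an idea of the simplest kind of check, here is a crude Python sketch that only looks at the JPEG start and end markers. It is not what Hunter, jCheck or jpgfix do; it merely catches files that are obviously not a JPEG or were cut off mid-download. A picture can pass this test and still be damaged in the middle, and the function name is made up for the example.

```python
def looks_complete_jpeg(data: bytes) -> bool:
    # A JPEG starts with the SOI marker FF D8 and normally ends with EOI FF D9.
    # Assumption: no junk bytes were appended after the end marker.
    return len(data) >= 4 and data.startswith(b"\xff\xd8") and data.endswith(b"\xff\xd9")


whole = b"\xff\xd8" + b"picture data" + b"\xff\xd9"
print(looks_complete_jpeg(whole))        # complete markers
print(looks_complete_jpeg(whole[:-5]))   # truncated download
```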

Video checking
As far as I know there are no tools to verify whether a video (avi, wmv or mov) is good. But there are three good indicators. First, play it with Fast Forward and watch that it is playable. Secondly, randomly click the seeker bar in your video player to find out whether it is seekable. And lastly, always check the last few seconds to see whether the video ends like it should and not prematurely.

Hunter needs some understanding and can be quite complex to use. JpgFix is a commandline tool and may not suit everyone. jCheck is a simple Windows dialog with one purpose, so it should be straightforward to use.

Sunday, June 8, 2008

Beginners guide to site ripping (1)

When you are faced with the need to download a site from the web then there are several options.

You can always go ahead and spend some days or weeks using FireFox or Internet Explorer, click every picture and video and choose "Save To" or "Save As". Just writing the paths for the jpgs and the videos may take up a lot of your time. And chances are you may miss something. So what to do?

Get a download manager!

But which one ? I do not know many download managers, but some of the most known ones out there are:

They can help you when you click around, and they will queue up your downloads. And you can stop one day and continue on another.

You still have to visit every page, right click and choose "Download With ...". But you can do it much faster and you do not have to wait for downloads to complete before you begin a new download.

To avoid having to click every link, or at least every page, you can get other tools. In general these tools are called Offline Browsers. They can automatically click every link on a website and automatically download every file from the website. So basically it is as if you took a copy of the website to your own harddrive. Some of the more common tools in the Offline Browser category are:

Get an Offline Browser!


Personally I started off with the Weazel and had a short fling with the Track before I met the one and only OE. But that is just personal experience. The Weazel is easy and so is the Widow. The learning curve for the Track is a little higher and OE's can be too.

Offline Browsers start by visiting the page you point them to. Then they automatically click every link on that page which leads them to all pages at "level 2". On level 2 they download everything to your harddrive and then they move on to click all links on level 2 so that they reach level 3 and so on.
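
The level idea can be sketched in a few lines of Python. Nothing is downloaded here; the "website" is just a made-up dictionary that says which page links to which, and crawl_levels is a name invented for the example. Real Offline Browsers do a lot more, this only shows the order in which pages get visited.

```python
def crawl_levels(links, start, max_level):
    """Return the pages reached at level 1, level 2, ... up to max_level."""
    seen = {start}
    levels = [[start]]
    while len(levels) < max_level:
        next_level = []
        for page in levels[-1]:
            for target in links.get(page, []):
                if target not in seen:      # never visit a page twice
                    seen.add(target)
                    next_level.append(target)
        if not next_level:                  # nothing new to click
            break
        levels.append(next_level)
    return levels


# Hypothetical tiny site: page -> links found on that page
site = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}
for number, pages in enumerate(crawl_levels(site, "/", 3), start=1):
    print("level", number, pages)
```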

These tools can work during the night, so you can leave your computer on and let it download a website while you sleep. The tools can also work as if there were 2, 3, 8 or more people downloading the site. So you are not limited to just 1 "virtual" person clicking. If you play around, then avoid settings above 8. It is not nice for the website, and if the same person is hammering on a website with 8 different "people" all wanting to download, then chances are the webmaster may get pissed off. Set the limit to 2-3 and maybe even 4.

Downloading the Internet, please wait....

Now, this feature of automatically copying an entire website is simply great. BUT! What if the website links to Google? Then you begin visiting every link Google has and all the sites it points to? YEP! You will begin downloading the entire internet... And I bet you that you will run out of memory and harddrive space before the download is finished.

Download only 1 website...


To avoid downloading the internet you have to limit your Offline Browser to only download from the website you want. You can most often do this by only downloading from "the starting server" or only download from "the starting domain".
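
What "the starting server" and "the starting domain" could mean is shown in the Python sketch below. How each Offline Browser decides this is up to the tool, so take it as an illustration only. The function names and the example.com addresses are made up, and the domain guess simply takes the last two parts of the host name, which is too naive for addresses like something.co.uk.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    return (urlsplit(url).hostname or "").lower()


def same_server(url: str, start_url: str) -> bool:
    # "Starting server": the host name must be exactly the same.
    return host_of(url) == host_of(start_url)


def naive_domain(host: str) -> str:
    # Assumption: the domain is the last two labels of the host name.
    return ".".join(host.split(".")[-2:])


def same_domain(url: str, start_url: str) -> bool:
    # "Starting domain": other servers under the same domain are allowed too.
    return naive_domain(host_of(url)) == naive_domain(host_of(start_url))


start = "http://www.example.com/index.html"
for link in ("http://www.example.com/pics/1.jpg",
             "http://img.example.com/pics/1.jpg",
             "http://www.google.com/"):
    print(link, same_server(link, start), same_domain(link, start))
```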

Perhaps you do not want small thumbnails and other small graphic pieces which are displayed on every page. Maybe you do not even want to save the .htm / .html files. If this is the case, then tick off stuff such as .gif, .htm etc. This will tell your Offline Browser NOT to save these file types to your harddrive.

I suggest you play around with an Offline Browser, and a good site to practise on could be this site. Make a new project and download everything from this site only, and avoid downloading anything which does not belong to this website.

Browse the contents which were downloaded and play around with the settings which may allow you to download both this website and the graphics too.
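
The file type filter boils down to looking at the extension of each link. A small Python sketch, with a made-up function name and made-up addresses, and with the skipped extensions taken from the paragraph above:

```python
import posixpath
from urllib.parse import urlsplit

# File types that should NOT be saved (the ones ticked off in the tool).
SKIPPED = {".gif", ".htm", ".html"}


def should_save(url: str, skipped=SKIPPED) -> bool:
    # Only the path counts; a ?query part after the filename is ignored.
    extension = posixpath.splitext(urlsplit(url).path)[1].lower()
    return extension not in skipped


print(should_save("http://www.example.com/sets/001.jpg"))
print(should_save("http://www.example.com/index.html?page=2"))
```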

Monday, May 26, 2008

Getting the proper tease

One of the issues with the ONTE site is the hassle of downloading the proper highest quality version of the picture sets. Some sets come in plain size, others in both plain and large sizes, and some, luckily most these days, come in ultra high quality too.

This article will give you one way to filter out all the unwanted versions of the zipped picture sets.

In order for this to work you have to go to every month you wish to download.

Then you should save every month on the site to a htm file. No need to save the graphics along, just keep it to html only. Secondly you have to remove all newlines, carriage returns and tabs from the htm files.

You can accomplish that by running a tr command on the commandline in the OS of your choice.

The tr command should look like the one below.

tr -d "\n\r\t" < onteXYZ.htm > onteXYZ.html
Do that on every page you saved. Once they are all saved to their .html counterpart, then run the command below.
grep -Po "http://[\w.]+/members/(.....)?zips/[-\w]+\.zip(?=..[\w ]+.?\([\d]+.[\d]+ MB\)./div../div.)" onte*.html
That will extract all the picture zip files in the highest quality available. No need to filter stuff manually. And no need to clean up after the download.

What the grep actually does is look for any http:// link which points to a zip file and is trailed by some text, a size in MB and two closing divs.
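
If you have no tr and grep at hand, the same two steps can be tried in Python. The sketch below reuses the pattern from the grep command as it is. The page fragment is invented purely so that it fits the shape the pattern expects (the real pages are not shown here), and the example.com addresses and function names are made up too.

```python
import re

# Same pattern as in the grep -Po command.
ZIP_PATTERN = re.compile(
    r"http://[\w.]+/members/(.....)?zips/[-\w]+\.zip"
    r"(?=..[\w ]+.?\([\d]+.[\d]+ MB\)./div../div.)"
)


def flatten(html: str) -> str:
    # Same job as tr -d "\n\r\t": delete newlines, carriage returns and tabs.
    return html.translate(str.maketrans("", "", "\n\r\t"))


def extract_zip_links(html: str) -> list:
    # finditer + group(0) gives the whole link, like grep -o does.
    return [m.group(0) for m in ZIP_PATTERN.finditer(flatten(html))]


# Hypothetical fragment: two versions of one set, the last one is the biggest.
page = (
    "<div><div>\n"
    "\thttp://www.example.com/members/zips/set-001.zip: Plain (45.6 MB)<br>\r\n"
    "\thttp://www.example.com/members/largezips/set-001.zip: Ultra (123.4 MB)\n"
    "</div>\n"
    "</div>"
)
print(extract_zip_links(page))
```

Only the last link in the block is followed by the size and the two closing divs, so only that one is picked. Without the flatten step nothing matches, because the dots in the pattern do not match a newline.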

Have a look below to better understand the result.

Friday, May 23, 2008

Tease and please for show

The csv for the Only Tease site has had a rest from updates since January 2008. The previous maintainer is MIA and that put a stop to a very good csv set.

BUT! Times change and I can happily announce that new updates are on the way. A source has sprung up, and this means that new csvs for the ONTE site are arriving at VG and mirrors.

So that you can all enjoy some nice sorting of your downloads.

Here's some cover teasers.










Monday, May 5, 2008

TXT2WJR gets graphical






New version, now with GUI, bugfixes and new features.

Program purpose:
Convert a text file with URLs and an optional text file with directory names into a proper ReGet Deluxe download queue. Ready for downloading.
New features:
  • Appends to an existing ReGet queue if one is chosen
  • Set cookie (optional)
  • Set referer (optional)
  • Set username (optional)
  • Set password (optional)
  • Handles target directories as both absolute and relative or plain.
You can have a quick look at the GUI for TXT2WJR v0.04. This tiny GUI is made with Microsoft's C# language and some stuff called WPF which is hidden inside .NET Framework v3.5. It's all some new Microsoft Windows Siesta (sorry Vista) jazz. Anyway, it is my first attempt ever at writing a C# program, at the same time it is my first time ever with .NET and even my first time with WPF (Windows Presentation Foundation). So bear with me if it looks and acts like something the cat dragged in.



This package will be available as a ZIP download hosted on MediaFire sometime pretty darn soon. (When I bother to package it, hehe).