Saturday, January 21, 2017

Post-midnight wget vs xkcd

I re-watched The Social Network yesterday. Dunno why, really.
Anyways, I noticed something about a wget meme at the start of the film, where he downloads some pics to have on his site. I figured I'd give it a go. Turns out it's a pretty good meme.

After reading some docs and fooling around for a while, I was looking for an actual thing to do with wget. And that's when it hit me: what if you could download up to the Nth xkcd comic?
And the game was on.

First, I was gonna do this in Python. It's the only hacky/easy language I'm fluent in and there's no way I would mess around in Java or, God forbid, C. So Python it is then.
First things first, how about just getting the goddamn memes with wget? Not so fast, buddy. Tried that. Got back a robots.txt file. Ughhh.
So for those of you who don't know what that means: robots.txt is a file where a site tells "robots" (bots, crawlers, programs, that kind of stuff) which parts of it they shouldn't touch. Apparently, wget falls under that category, and apparently, it also complies with the robots.txt rules, so you can't get past that. But actually, you can. You have the option to ignore them. So I just add "-e robots=off" and "--wait 1" to wget's arguments and we're set!
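For reference, here is a rough sketch of how a Python script might put that wget call together. The function name `build_wget_command` and the file name `page_1.html` are made up for illustration; `-e robots=off`, `--wait` and `-O` are real wget options. It only builds the argument list, so nothing gets downloaded until you hand it to `subprocess.call`.

```python
def build_wget_command(url, output_path, wait_seconds=1):
    """Return the wget argument list for fetching one URL into output_path."""
    return [
        "wget",
        "-e", "robots=off",          # ignore the robots.txt rules
        "--wait=%d" % wait_seconds,  # pause between retrievals
        "-O", output_path,           # write the response to this file
        url,
    ]

# Hypothetical example call; the output file name is an assumption.
command = build_wget_command("https://xkcd.com/1/", "page_1.html")
print(" ".join(command))
# To actually run it: import subprocess; subprocess.call(command)
```

Keeping the arguments in a list (instead of one big string) means you don't have to worry about quoting when you pass them to `subprocess`.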
Ughhh. Now it doesn't download anything. What the hell?
Apparently, xkcd has set some pretty good memes, 'cause you can't just wget the whole thing. For reference, I tried just getting all the images (and overriding robots.txt of course) on another site and I did manage to get all the images (or most of them anyway).
BUT, there's a solution. It's really hacky and ugly, and I'm sure there's a better way, but here's how I did it:
I noticed that, while you didn't have access to the image from the normal xkcd page, the image url is always in the html file.
So what I did is I fetched the index.html, then ran through it to find the image url, then wget'd the image from that url. Pretty simple, right?
So this all ties together like this: you input a certain number of comics, the program runs through pages xkcd.com/1/ through xkcd.com/N/, downloads and parses the .htmls, adds the url to a list of urls, then once it's done it downloads all the images based on the urls and saves them in a folder.
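The parsing step described above could look something like this. It's a minimal sketch, not the actual script: it assumes the comic images live under imgs.xkcd.com/comics/, the helper names `find_image_url` and `page_urls` are invented here, and the sample HTML is made up just to exercise the parser.

```python
import re

# Assumption: comic image URLs look like //imgs.xkcd.com/comics/<file>.
IMAGE_URL_PATTERN = re.compile(r'(?:https?:)?//imgs\.xkcd\.com/comics/[^"\s<>]+')

def find_image_url(html):
    """Return the first comic image URL found in html, or None."""
    match = IMAGE_URL_PATTERN.search(html)
    if match is None:
        return None
    url = match.group(0)
    # Protocol-relative URLs ("//imgs...") need a scheme before wget can use them.
    if url.startswith("//"):
        url = "https:" + url
    return url

def page_urls(last_comic):
    """Build the page addresses xkcd.com/1/ through xkcd.com/N/."""
    return ["https://xkcd.com/%d/" % number for number in range(1, last_comic + 1)]

# Made-up sample page, not a real xkcd response.
sample_html = '<div id="comic"><img src="//imgs.xkcd.com/comics/example.png" alt="Example"></div>'
print(find_image_url(sample_html))
print(page_urls(3))
```

Returning None when nothing matches makes it easy to tell "this page had no readable image url" apart from a real result, which matters for the retry logic later on.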
The only problem I've had with the whole procedure is that some .htmls would download in code form (binary? hex? probably hex) and thus I couldn't read the image url off of them. Page 3 was giving me a lot of shit - I actually tried wget -v and it actually worked on 3 but didn't work on 2. By testing, I saw that it sometimes worked and sometimes didn't work.
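One possible explanation (a guess, not something verified here): the unreadable pages may be gzip-compressed responses rather than hex. If that's the case, the downloaded file would start with the two gzip "magic" bytes, and it could be unpacked before parsing. The sketch below shows how that check might look; the function name `decode_page` is hypothetical and the sample data is generated locally.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes

def decode_page(raw_bytes):
    """Return page text, decompressing first if the bytes look gzip-compressed."""
    if raw_bytes[:2] == GZIP_MAGIC:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        raw_bytes = zlib.decompress(raw_bytes, 16 + zlib.MAX_WBITS)
    return raw_bytes.decode("utf-8", "replace")

# Build a small gzip-compressed sample locally to exercise both branches.
compressor = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
packed = compressor.compress(b"<html>hello</html>") + compressor.flush()

print(decode_page(packed))
print(decode_page(b"<html>plain</html>"))
```

If the pages turn out not to start with those bytes, this theory is wrong and the function simply returns the text unchanged.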
So, finally, firstly because I'm just bored and secondly because it's almost 3 am and I'm really tired, I just worked around the problem by having the user input a maximum number of tries, so that wget will try to get the right page N times and if it doesn't succeed it'll just move on. I tried having it try forever, but some pages like #16 I think just won't download properly. Anyway, with something like 100 tries I found out you lose like 2-3 comics out of 30, so it's not that bad - AND they are enumerated so in the end you know which ones you lost.
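The "try N times, then move on" idea can be sketched like this. The names `fetch_with_retries`, `fake_fetch` and `looks_like_html` are invented for the example, and the fake fetcher stands in for the real wget call so the snippet runs without touching the network.

```python
def fetch_with_retries(fetch, is_valid, max_tries):
    """Call fetch() up to max_tries times; return the first valid result, else None."""
    for _ in range(max_tries):
        page = fetch()
        if is_valid(page):
            return page
    return None

def looks_like_html(page):
    """Very rough check that the page came back as readable HTML."""
    return "<html" in page

# Fake downloader: two unreadable responses, then a good one.
responses = iter(["garbage", "garbage", "<html>comic 3</html>"])

def fake_fetch():
    return next(responses)

good_page = fetch_with_retries(fake_fetch, looks_like_html, max_tries=5)
print(good_page)

# A page that never downloads properly ends up in the "lost" list.
lost = []
if fetch_with_retries(lambda: "garbage", looks_like_html, max_tries=3) is None:
    lost.append(16)
print(lost)
```

Collecting the failed comic numbers in a list is what lets you see at the end which ones you lost.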
Anyways, I'm off to bed now. But do try this. It's a fun exercise.

P.S.: You can download my script here (and you need to have wget and Python 2 installed, obviously).

Monday, February 24, 2014

Python Programming Tutorial #4

I finally found the time to make another tutorial...yaaaaay!
So, anyway, this time I'm gonna introduce to you another kind of "variable": lists.
Lists are like normal variables, except that they can store more than one value simultaneously. They are close to what is known (in other programming languages, like C) as arrays. Anyway, here's a simple definition: if variables are boxes, then lists are bookcases; they can hold many things.

Here are some simple examples of lists:
mylist = []
anotherlist = [1, 2, 3]
yetanotherlist = [True, "ff", "Hi, 2!", 4378]


Now you can store as many things as you wish in one variable. Cool, huh?
Imagine you were making a game: you could easily keep the player's inventory in one list, rather than juggling many different variables (which gets messy fast).

But, before continuing, let's see some other examples of lists:
x = 4
mylist = [x, 5, 15]
fancylist = [2, [3, 4], 5] # a list inside a list is called a "nested" (or "2-dimensional") list...list-ception!


As you can see, you can put pretty much anything inside a list, including another list.
Hold on, though...this "list" thing is cool and all, but how can I access its elements? How can I add new ones, or delete the ones I don't need?

Well, that's quite simple! Let me show you:
mylist = [1, 2, 3]
print mylist
print mylist[0]
print mylist[2]

And here's my output: (the $ thing is just my command line prompt, pay attention to what comes after it)
(also, from now on I will call the file I'm working on "test.py")
$ python test.py
[1, 2, 3]
1
3


Whoa, just wait a second. What the hell happened at line 3? Why do I need to ask for the 0th element to get the first one?!
Well, as you saw, the syntax of getting a list element is "list[element_number]". The thing about lists, though, is that they are "zero-indexed". That means that we're naming the elements starting from the number 0 instead of 1. So, the 1st element is the 0th, the 2nd is the 1st, the 3rd is the 2nd and so on.
[Tip: To easily remember this, just subtract one from the position of the list element you want to access. Example: element 3 - 1 = 2. Thus, the 3rd element is accessed by the command "mylist[2]".]
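
Here's a small sketch of the same rule in action, including the nested list from earlier. The built-in `len()` (not covered above) returns how many elements a list has; the values in `mylist` are just examples. The prints use parentheses so the snippet also runs on Python 3.

```python
mylist = [10, 20, 30]
last_position = len(mylist) - 1   # 3 elements, so the last index is 2
print(mylist[last_position])      # the 3rd element

fancylist = [2, [3, 4], 5]
inner = fancylist[1]              # the element at index 1 is itself a list
print(inner[0])                   # first element of the inner list
print(fancylist[1][1])            # same idea, done in one step

try:
    mylist[3]                     # there is no 4th element
except IndexError as error:
    print("Oops: %s" % error)
```

Asking for a position that doesn't exist raises an `IndexError`, which is usually the first sign that you forgot to subtract one.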

You can also change elements just by assigning values to existing positions in the list, and add new ones with "append":
menu = ["eggs", "tomatoes", "chicken"]
print menu
menu[2] = "steak"
print menu # see the difference?
menu.append("potatoes") # let's add another element to the menu
print menu # woohoo!

Here's what you should get:
$ python test.py
['eggs', 'tomatoes', 'chicken']
['eggs', 'tomatoes', 'steak']
['eggs', 'tomatoes', 'steak', 'potatoes']


So, there's some new stuff: apparently, to change the value of an element you just assign a new value to it (as you saw in line 3). Also, to add a new element to the list, you use a "list method". Methods are neat little functions (aka: a bunch of code that gets reused to do the same job) that belong to a particular kind of thing, like lists, strings etc. You can find all list methods and how to use them in the Python docs.

Anyway, all "append" does is add another element to the end of the list. What if you want to delete an element, though?
menu = ["eggs", "tomatoes", "chicken"]
print menu
menu.remove("eggs") # goodbye eggies
print menu
special_ingredient = menu.pop()
print "What's left: %r" % menu # %r prints the value's repr(), i.e. the way you'd type it in code (handy for seeing what data type the output is)
print "Special ingredient: %s" % special_ingredient

Output:
$ python test.py
['eggs', 'tomatoes', 'chicken']
['tomatoes', 'chicken']
What's left: ['tomatoes']
Special ingredient: chicken


And that's how to delete elements from lists! Note that "pop()" not only removes the (last) element, but also returns it; that means we can store that element in a variable, as I did with "special_ingredient".
So, go ahead and try out some other methods! Get to know lists as much as you can!
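To get you started, here's a quick sketch with a few more real list methods (`insert`, `index`, `count`, `sort`, `reverse`) plus `pop` with an index. The menu items are just examples, and the prints use parentheses so this also runs on Python 3.

```python
menu = ["eggs", "tomatoes", "chicken"]

menu.insert(1, "bacon")        # put "bacon" at index 1, shifting the rest right
print(menu)
print(menu.index("chicken"))   # position of the first "chicken"
print(menu.count("eggs"))      # how many times "eggs" appears

first = menu.pop(0)            # pop() also accepts an index
print(first)

menu.sort()                    # sorts the list in place (alphabetically for strings)
print(menu)
menu.reverse()                 # reverses the list in place
print(menu)
```

Note that `sort()` and `reverse()` change the list itself instead of returning a new one.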
Thank you for reading this tutorial! I'll see you next time, goodbye!

Monday, February 17, 2014

Update (sort of)

Hey, how's it going? Yup, I know I post every 6 months or so, but I'm trying to focus on studying, as the next two years will be of the utmost importance to me; these are the years that will define whether or not I will have the opportunity to get into the university of my choice (lol, I just noticed, I make it sound way more dramatic than it really is).

Anyway, despite the fact that there is no time to do much stuff, I'm still uploading my Greek Python tutorials and I'm now thinking about continuing the English ones here (yeah, I kinda miss blogging). Other than that, I'm participating in a competition in informatics; I've succeeded in the first phase, submitted my solution for the second one and, if I make it through that as well, I'll be competing in the third phase (duh), which - unlike the previous two phases - will take place in informatics labs around the country. That worries me a little, but I guess I'll get used enough to C (which is the programming language I chose for the competition, with the available languages being C, C++ and Pascal) that I won't have any problems I'd need to research on the net. If I can make it through the third phase as well, then there's a serious chance I might compete with people from around the world, which is pretty awesome. It's the best case scenario, but the first 2 problems were quite easy to solve, so I'm feeling a little optimistic. :)

On a completely different note, the weather's been great; it's been quite a soft winter (in the part of the country I live in, at least) and probably we're now experiencing an early spring, which makes me feel happy and productive (except for things like the essay I have to write now, which makes me think of various ways to commit suicide).

I'll post again as soon as I can! Goodbye!

Friday, September 20, 2013

YouTube Update!

My YouTube channel just hit 50k views! Thank you all for your support! Stay tuned, there's a lot to come! :)

Sunday, July 28, 2013

Adventext Game Engine!

Hey guys! I know it's been some time since my last post, but...here's a new one! And, if you know some Python, a pretty nice one too...

So I have made here a text game engine in Python, which is simple and very easy to use. You are very welcome to edit any part of the code and suggest things at jimkokko@gmail.com, jimkokko5 on YouTube, @jimkokko5 on twitter or even here.

I will also make a tutorial (probably in August) on how to use this engine, in case the comments in the code can't give you enough info.

Enjoy!

Tuesday, June 25, 2013

Python Programming Tutorial #3

In the previous tutorial, we talked a little bit about printing stuff. But that's just really basic. Let's get deeper.

We can print multiple things together:
x = "James"
print "My name is", x


If you run this, you will get "My name is James". Try it out!
Also, since we talked about running Python scripts...if you followed my instructions, you are now using IDLE to write and run your scripts. But you know what? Forget about IDLE. It may be convenient, but it will limit you in a lot of ways. So, from now on, we will be using the console.
For starters, go ahead and download a text editor. You can download whatever you like, but I recommend Notepad++ for Windows and TextWrangler for Mac. If you are on a Windows machine, use PowerShell (you can find it by searching "powershell" in the "Start" menu), and follow these instructions (check the "Windows" part) to make it work with Python. On the Mac, use Terminal.
Then navigate to the directory of your script using the "cd" command. If you're on a Mac and your file is on the desktop, you need to type "cd Desktop". It's more or less the same thing for Windows users. If you're having trouble with this, take a look at this link. Once you're in your file's directory, simply type "python yourfilename.py". Replace "yourfilename.py" with your file's name (duh) and include the ".py" at the end. Press enter, and your program is running.
I apologize for the hard transition from the nice and comfy IDLE to the cold and cruel console, but that's just the way it is. You may not realize it yet, but as you get more familiar and advanced with programming, you can't depend on IDLE. Instead, the console can give you everything you need. So bear with me on this one.

So now let's just go back to our .py file. Let me show you a couple more ways to do this:
x = "James"
print "My name is " + x # as you can see, the + does not put a
                        # space between the strings like the ,
print "My name is %s" % x


But wait...what happened there in the last line?! I guess it's time to talk about some "special characters", then (their proper name is "format specifiers"). You'll find this style of formatting in many programming languages, so expect to see it again. The one you used is an "s" because it stands for "string". What you may have noticed is that Python replaces "%s" with the value that comes after the "%" sign; in our case, x, which is "James", of course. There are other "special characters", though, like "%d", which stands for "decimal integer". Google them to find out more about them. There are also some things called "escape characters". Here are a couple of them:
# run these and find out what they do
print "This is on one line\nThis is on the next line"
print "This is here\t\t\t\tThis is kinda far from there"

# also, check this out
print 'Did you know that you can have %d "special characters" %s?' % (2, "at the same time")


All right, you are most likely confused, so let's explain some stuff: Remember the backslash (\) characters you saw back there? Those were escape characters. The computer identifies them thanks to the backslash and treats them differently. "\n" is called the newline character, and it basically does whatever the enter key does in a text editor. The tab character (\t) on the other hand, simply adds a tab and indents your text.
print "You can also escape \"double quotes\""
print 'You can escape \'single quotes\', too'
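
A couple more things worth poking at, as a small sketch: even though you type two characters (the backslash and a letter), an escape counts as a single character in the string, and the backslash itself is written as a double backslash. The sample strings are made up, and the prints use parentheses so this also runs on Python 3.

```python
print("A backslash itself is written as \\")

newline_length = len("\n")     # the newline escape is ONE character
tabbed_length = len("a\tb")    # 'a', a tab, 'b'
print(newline_length)
print(tabbed_length)

# Tabs and newlines together make a quick-and-dirty table.
print("Name:\tJames\nLanguage:\tPython")
```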


You've got to be careful with combining (concatenating) strings, though. For example:
print "I have " + 3 + " apples" # this won't work: it raises a TypeError
print "I have", 3, "apples" # but this will
print "I have " + str(3) + " apples" # this will work, too


What you need to understand is that you can't concatenate different data types with "+", for example, strings and integers; you have to convert the integer to a string first with str().
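
Here's a short sketch that puts the options side by side, catching the error instead of letting the script crash. The variable names are just for illustration, the exact wording of the error message depends on your Python version, and the prints use parentheses so this also runs on Python 3.

```python
apples = 3

try:
    message = "I have " + apples + " apples"   # string + integer
except TypeError as error:
    message = "TypeError: %s" % error
print(message)

converted = "I have " + str(apples) + " apples"   # convert the integer first
formatted = "I have %d apples" % apples           # or let %d do the formatting
both = "I have %d %s" % (apples, "apples")        # several values go in a tuple
print(converted)
print(formatted)
print(both)
```

All three working versions produce the same sentence, so which one you use is mostly a matter of taste.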