Python from Scratch
Learning to Read
First: A Python program is made up of tokens; you can think of these as "words". Some examples of tokens:
"hello world"
6
(
while
print
Generally there are four types of token, although in practice the lines between them get blurred a little bit.
- Literals literally represent some value. `"hello world"` and `6` and `4.2` are examples of such literals; the first represents some text and the others represent numbers. This is literal as opposed to some indirect representation like `4 + 2` or `"hello" + " " + "world"`.
- Operators include things like the math operators `+`, `-`, `*`, but also things like the function call operator `( )`, boolean operators like `and`, and myriad other operators. There's a comprehensive list here but beware - there are a lot and some of them are pretty technical. The main point is that `( )` and `+` are the same kind of thing as far as the Python interpreter is concerned.
- Keywords are special directives that tell Python how to behave. This includes things like `if` and `def` and `while`. Technically, some operators are also keywords (for example `and` is a keyword) but that's not super relevant here.
- Names are the last - and most important - kind of token. `print` is a name. Variable names are names. Function names are names. Class names are names. Module names are names. In all cases, a name represents some thing, and Python can fetch that thing if given its name.
So if I give Python this code:
x = "world"
print("hello " + x)
You should first identify the tokens:
- Name `x`
- Operator `=`
- Literal `"world"`
- Name `print`
- Operator `( )`
- Literal `"hello "`
- Operator `+`
- Name `x`
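If you'd like to check a breakdown like this yourself, the standard library's `tokenize` module will split source code into tokens. Here's a rough sketch. The `classify` helper and its category labels are my own shorthand to match the list above, not something Python defines: `tokenize` reports keywords as plain names, and it reports `(` and `)` as two separate tokens.

```python
import io
import keyword
import tokenize

source = 'x = "world"\nprint("hello " + x)\n'


def classify(tok):
    """Map a tokenize token onto the informal categories used above (hypothetical helper)."""
    if tok.type == tokenize.NAME:
        # tokenize does not distinguish keywords from names, so ask the keyword module.
        return "Keyword" if keyword.iskeyword(tok.string) else "Name"
    if tok.type in (tokenize.STRING, tokenize.NUMBER):
        return "Literal"
    if tok.type == tokenize.OP:
        return "Operator"
    return None  # newlines, the end marker, and other layout tokens


tokens = []
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    kind = classify(tok)
    if kind is not None:
        tokens.append((kind, tok.string))

for kind, text in tokens:
    print(kind, text)
```

Run against the two-line program, this should list the same sequence as above, except that `( )` shows up as a separate `(` and `)`.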
The first line of code binds `"world"` to the name `x`.
The expression `"hello " + x` looks up the value named by `x` and concatenates it with the literal value `"hello "`. This produces the string `"hello world"`.
The expression `print( ... )` looks up the value - the function - named by `print` and uses the `( )` operator to call it with the string `"hello world"`.
To be crystal clear: `x` and `print` are the same kind of token; it's just that their named values have different types. One is a string, the other a function. The string can be operated on with the `+` operator, and the function can be operated on with the `( )` operator.
It is valid to write `print(print)`; here we are looking up the name `print`, and passing that value to the function named by `print`. This should be no more or less surprising than being able to write `x + x` or `5 * 4`.
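To make that concrete, here's a small sketch that treats the string and the function the same way. The name `shout` is made up for the example; everything else is built in.

```python
x = "world"

# Both names are looked up the same way; only the types of their values differ.
x_type = type(x).__name__   # "str"
print_is_callable = callable(print)
x_is_callable = callable(x)

# A function is a value like any other, so it can be bound to a second name...
shout = print               # `shout` is a hypothetical name for illustration
same_object = shout is print

# ...and passed as an argument, including to itself.
print(print)                # displays the function's repr

greeting = "hello " + x
shout(greeting)             # calls the same function through the other name
```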
Namespaces
First-and-a-half: A namespace is a collection of names.
You might also hear this called a "scope". This is the reason I say "maybe three or four, depending how you count"; this is really part of that fundamental idea of a name, but I'll list it separately to be extra clear.
There are some special structures in Python that introduce new namespaces. Each module has a "global" namespace; these are names that can be referenced anywhere in a given file or script. Each function has a "local" namespace; these are names that can only be accessed within the function.
For example:
x = "eggs"
def spam():
    y = "ham"
    # I can print(x) here.
# But I cannot print(y) here.
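Here's a runnable version of the same idea, as a sketch. It assumes it's run as its own script, so that `y` isn't already defined globally; `result` and `error_message` are just names I picked to hold what happens.

```python
x = "eggs"


def spam():
    y = "ham"
    # Both names are visible here: y is local, x is found in the global namespace.
    return x + " and " + y


result = spam()

# Outside the function, the local name y does not exist.
try:
    y
except NameError as exc:
    error_message = str(exc)

print(result)
print(error_message)
```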
Objects also have namespaces. Names on objects are called "attributes", and they may be simple values or functions, just how regular names might be simple values (`x`, `y`) or functions (`print`, `spam`). You access attributes with the `.` operator.
obj = range(10)
print(obj.stop) # find the value named by `obj`, then find the value named by `stop`. 10.
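A slightly longer sketch of the same thing, showing that an attribute can be a simple value or a function, and that the attribute name is really just a string you could hand to the built-in `getattr`. The variable names here are only for illustration.

```python
obj = range(10)

# An attribute that is a simple value.
stop_value = obj.stop

# The same lookup, spelled out with the attribute name as a string.
same_value = getattr(obj, "stop")

# An attribute that is a function: fetch it with `.`, then call it with `( )`.
method = "hello".upper
shouted = method()

print(stop_value, same_value, shouted)
```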
Finally, there is the built-in namespace. These are names that are accessible always, from anywhere, by default. Names like `print` and `range` are defined here. Here's a comprehensive list of built-in names.
Strings
Second: you asked about characters and letters, so you may appreciate some background on strings.
A string is a sequence of characters. A character is simply a number to which we, by convention, assign some meaning. For example, by convention, we've all agreed that the number `74` means `J`. This convention is called a character encoding. The numbering itself is Unicode, specified by a committee called the Unicode Consortium, and UTF-8, Python's default encoding, is the usual way of storing those numbers as bytes. Unicode includes characters from many current and ancient languages, various symbols and typographical marks, emojis, flags, etc. The important thing to remember is each one of these things, really, is just an integer (or, for some emojis and flags, a short sequence of integers). And all our devices just agree that when they see a given integer they will look up the appropriate symbol in an appropriate font.
You can switch between the string representation and the numerical representation with the `encode` method on strings and the `decode` method on bytes. Really, these are the same; you're just telling Python to tell your console to draw them differently.
>>> list('Fizz'.encode())
[70, 105, 122, 122]
>>> bytes([66, 117, 122, 122]).decode()
'Buzz'
For continuity: `list`, `encode`, `decode`, and `bytes` are all names. `( )`, `[ ]`, `,`, and `.` are all operators. The numbers and `'Fizz'` are literals.
† Technically, `[66, 117, 122, 122]` in its entirety is a literal - `,` is a delimiter, not an operator - but that's neither here nor there for these purposes.
‡ The symbol `†` is number 8224 and the symbol `‡` is number 8225.
Names
Second-and-a-half: names are strings.
Names are just strings, and namespaces are just `dict`s. You can access them with `locals()` and `globals()`, although in practice you almost never need to do this directly. It's better to just use the name itself.
import pprint
x = range(10)
function = print
pprint.pprint(globals())
This outputs:
{'__annotations__': {},
 '__builtins__': <module 'builtins' (built-in)>,
 '__cached__': None,
 '__doc__': None,
 '__file__': '<stdin>',
 '__loader__': <class '_frozen_importlib.BuiltinImporter'>,
 '__name__': '__main__',
 '__package__': None,
 '__spec__': None,
 'function': <built-in function print>,
 'pprint': <module 'pprint' from 'python3.12/pprint.py'>,
 'x': range(0, 10)}
For continuity: `import pprint` binds the name `pprint` to the module `pprint.py` from the standard library. The line `pprint.pprint( ... )` fetches the function `pprint` from that module, and calls it.
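To see that a name really is just a string key in a `dict`, here's a sketch that looks a name up by hand and even creates one by hand. It assumes it's run at the top level of a script or module. Binding a name through `globals()` is only here to make the point; as I said, in real code you'd just write `y = "ham"`.

```python
x = range(10)

# The module's namespace, as an ordinary dict keyed by name strings.
module_names = globals()
looked_up = module_names["x"]   # same object as writing x

# Binding a name by hand (for illustration only).
module_names["y"] = "ham"
bound_by_hand = y  # noqa: F821 - y now exists because of the line above


def spam():
    z = "eggs"
    # Inside a function, locals() shows that function's own namespace.
    return dict(locals())


local_names = spam()

print(looked_up, bound_by_hand, local_names)
```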