In my Python workshops and online courses I see that one of the trickiest things for newcomers is the syntax itself. It’s very strict and many things might seem inconsistent at first. In this article I’ve collected the Python syntax essentials you should keep in mind as a data professional — and I added some formatting best practices as well, to help you keep your code nice and clean.
These are the basics. If you want to go deep down the rabbit hole, I’ll link to some advanced Python syntax and formatting tutorials at the end of this article!
This article is part of my Python for Data Science article series. If you haven’t done so yet, please start with the earlier articles in the series first.
The 3 major things to keep in mind about Python syntax
#1 Line Breaks Matter
Unlike in SQL, in Python, line breaks matter. Which means that in 99% of cases, if you put a line break where you shouldn’t put one, you will get an error message. Is it weird? Hey, at least you don’t have to add semicolons at the end of every line.
So here’s Python syntax rule #1: one statement per line.
There are some exceptions, though. Expressions
- in parentheses (e.g. function and method calls),
- in square brackets (e.g. lists),
- and in curly braces (e.g. dictionaries)
can actually be split across several lines. This is called implicit line joining and it is a great help when working with bigger data structures.
Additionally, you can break any statement into more than one line if you put a backslash (\) at the end of the line. And you can do the opposite, too: putting more than one statement on one line using semicolons (;) between the statements. However, these two methods are not too common, and I’d recommend using them only when necessary (e.g. with really long statements of 80+ characters).
one line – more statements
a dummy example: one statement in more lines — print(‘Hello’)
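To make the three options above concrete, here is a minimal sketch. The variable names and numbers are made up for illustration only.

```python
# Implicit line joining: anything inside (), [] or {} may span several lines.
monthly_users = [
    1200,
    1350,
    1500,
]

user_info = {
    "name": "Anna",
    "country": "Hungary",
}

total_users = sum(
    monthly_users
)

# Explicit line joining: a backslash at the very end of a line
# tells Python that the statement continues on the next line.
average_users = total_users \
    / len(monthly_users)

# Several statements on one line, separated by semicolons.
# This is legal, but rarely a good idea.
first = 1; second = 2; both = first + second

print(total_users, average_users, both)
```

Note that nothing (not even a space) may follow the backslash on its line, otherwise Python raises a syntax error.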
#2 Indentations Matter
Do you hate indentations? Well, you are not alone. Many people who are just starting off with Python dislike the concept. For non-programmers it is unusual and, on top of that, it causes the most errors in their scripts at the beginning. As for me, I love indentations and I promise that you will get used to them, too. Well, if you’ve worked your way through my Python articles so far, I’m pretty sure that you already have.
Why do we need indentations? Easy: somehow you have to indicate which code blocks belong together — e.g. what is the beginning and the end of an if statement or a for loop. In other languages, where you don’t have to use indentations, you have to use something else for that: e.g. in JavaScript you have to use extra brackets to frame your code blocks; in bash you have to use extra keywords. In Python, you have to use indentations – which in my opinion is the most elegant way to solve this issue.
So we have Python syntax rule #2: make sure that you use indentations correctly and consistently.
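Here is a small sketch of how indentation marks which lines belong to which block. The list and the variable names are only illustrative. The second half feeds a deliberately broken snippet to the built-in compile() function, just to show the error you would get without crashing the example itself.

```python
numbers = [1, 2, 3, 4]
even_numbers = []

for number in numbers:
    # Indented once: this line belongs to the for loop.
    if number % 2 == 0:
        # Indented twice: this line belongs to the if statement.
        even_numbers.append(number)

# Back at the left margin: this runs once, after the loop has finished.
print(even_numbers)

# The same loop with the indentation left out is not valid Python.
broken_code = "for number in [1, 2]:\nprint(number)"
caught_indentation_error = False
try:
    compile(broken_code, "<example>", "exec")
except IndentationError as error:
    caught_indentation_error = True
    print("IndentationError:", error)
```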
Note: I talked about the exact syntax rules governing for loops and if statements in the relevant articles.
One more thing: if you watch the Silicon Valley TV show, you might have heard about the debate of “tabs vs spaces.” Here’s the hilarious scene:
So tabs or spaces? Here’s what the original Style Guide for Python Code says: “Spaces are the preferred indentation method.”
Pretty straightforward!
P.S. To be honest, in Jupyter Notebook, I use tabs.
#3 Case Sensitivity
Python is case sensitive. It makes a difference whether you type and (correct) or AND (won’t work). As a rule of thumb, learn that most of the Python keywords have to be written with lowercase letters. The most commonly used exceptions I have to mention here (because I see many beginners have trouble with it) are the Boolean values. These are correctly spelled as: True and False. (Not TRUE, nor true.)
There’s Python syntax rule #3: Python is case sensitive.
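A quick sketch of what case sensitivity means in practice. The variable names are invented for this example, and the wrong spellings are wrapped in try blocks so that the script still runs to the end.

```python
is_sunny = True
is_warm = False

# Lowercase keywords work as expected.
go_outside = is_sunny and not is_warm

# Variable names are case sensitive too: these are two different variables.
city = "Budapest"
City = "Vienna"

# A lowercase `true` is not the Boolean value, just an undefined name.
try:
    is_raining = true
except NameError as error:
    name_error_message = str(error)

# An uppercase AND is not a keyword, so Python cannot parse the expression.
try:
    compile("True AND False", "<example>", "eval")
except SyntaxError:
    uppercase_and_is_valid = False
else:
    uppercase_and_is_valid = True

print(go_outside, city, City, name_error_message, uppercase_and_is_valid)
```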
Other Python Best Practices for Nicer Formatting
Let me just list a few (non-mandatory but highly recommended) Python best practices that will make your code much nicer, more readable and more reusable.
Python Best Practice #1: Use Comments
You can add comments to your Python code. Simply use the # character. Everything that comes after the # on that line won’t be executed.
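For example (a minimal sketch with made-up values):

```python
# A full-line comment: Python skips this line entirely.
price = 100  # An inline comment: everything after the # is ignored.

# print("This line is commented out, so it never runs.")

hashtag = "#python"  # A # inside a string is part of the string, not a comment.
print(price, hashtag)
```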
Python Best Practice #2: Variable Names
Conventionally, variable names should be written in lowercase letters, with the words in them separated by _ characters. Also, generally I do not recommend using one letter variable names in your code. Using meaningful and easy-to-distinguish variable names helps other programmers a lot when they want to understand your code.
my_meaningful_variable = 100
Python Best Practice #3: Use blank lines
If you want to separate code blocks visually (e.g. when you have a 100 line Python script in which you have 10-12 blocks that belong together) you can use blank lines. Even multiple blank lines. It won’t affect the result of your script.
same script – with blank lines
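A short sketch of the idea, with invented variable names. Each blank line separates a group of lines that belong together:

```python
# Block 1: input data
daily_sales = [120, 95, 143]

# Block 2: calculations
total_sales = sum(daily_sales)
average_sales = total_sales / len(daily_sales)

# Block 3: output
print(total_sales, average_sales)
```

Removing the blank lines would give exactly the same result; they are there only for the human reader.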
Python Best Practice #4: Use white spaces around operators and assignments
For cleaner code it’s worth using spaces around your = signs and your mathematical and comparison operators (>, <, +, -, etc.). If you don’t use white spaces, your code will run anyway, but again: the cleaner the code, the easier it is to read and to reuse.
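Compare the two versions in this little sketch (the names and numbers are made up). Both halves do the same thing, but the second one is much easier on the eyes:

```python
# Harder to read: no spaces around the operators.
revenue=1500
costs=900
profit_tight=revenue-costs
is_profitable_tight=profit_tight>0

# Easier to read: one space on each side of =, - and >.
profit = revenue - costs
is_profitable = profit > 0

print(profit, is_profitable)
```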
Python Best Practice #5: Max line length should be 79 characters
If you reach 79 characters in a line, it’s recommended to break your code into more lines. Use the above-mentioned \ character. With the \ at the end of the line, Python will ignore the line break and will read your code as if it were one line.
(Or in some cases you can take advantage of implicit line joining.)
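Here is a sketch of both ways to break a long line, using invented variable names. The two results are identical; only the formatting differs.

```python
first_quarter_revenue = 12000
second_quarter_revenue = 13500
third_quarter_revenue = 11000
fourth_quarter_revenue = 15250

# Explicit continuation: a backslash at the end of the line.
yearly_revenue_backslash = first_quarter_revenue + second_quarter_revenue + \
    third_quarter_revenue + fourth_quarter_revenue

# Implicit line joining: wrap the whole expression in parentheses.
yearly_revenue_parentheses = (
    first_quarter_revenue + second_quarter_revenue +
    third_quarter_revenue + fourth_quarter_revenue
)

print(yearly_revenue_backslash, yearly_revenue_parentheses)
```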
Python Best Practice #6: Stay consistent
And one of the most important rules: always stay consistent! Even if you follow the above rules, in specific situations you’ll have to create your own. Either way: make sure you are using these rules consistently. Ideally, you have to create Python scripts that you can open 6 months later without any trouble understanding them. If you randomly change your formatting rules and naming conventions, you’ll create an unnecessary headache for your future self. So stay consistent!
The Zen of Python – a nice easter egg
What else could come at the end of this article but a nice Python easter egg?
If you type import this into your Jupyter Notebook, you will get the 19 design “commandments” of Python:
Use this advice wisely! 😉
Conclusion
Well this is it. Follow this advice, and if you want to learn more about Python syntax essentials and best practices, I recommend these articles:
- Google’s Python Style Guide,
- PEP8,
- BOBP guide.
Now go ahead and check out the last article of the series: how to import Python libraries!
- If you want to learn more about how to become a data scientist, take my 50-minute video course: How to Become a Data Scientist. (It’s free!)
- Also check out my 6-week online course: The Junior Data Scientist’s First Month video course.
Cheers,
Tomi Mester
Maximum Line Length
Limit all lines to a maximum of 79 characters.
There are still many devices around that are limited to 80 character lines; plus, limiting windows to 80 characters makes it possible to have several windows side-by-side. The default wrapping on such devices disrupts the visual structure of the code, making it more difficult to understand. Therefore, please limit all lines to a maximum of 79 characters. For flowing long blocks of text (docstrings or comments), limiting the length to 72 characters is recommended.
The preferred way of wrapping long lines is by using Python’s implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.
Backslashes may still be appropriate at times. For example, long, multiple with-statements cannot use implicit continuation, so backslashes are acceptable:
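A runnable sketch of such a with-statement follows. The file names are hypothetical, and the files are created in a temporary folder only so that the example is self-contained. Newer Python versions (3.10 and later) also accept parentheses around the with items, but the backslash form below works on older versions as well.

```python
import os
import tempfile

# Hypothetical file names inside a temporary folder (illustration only).
with tempfile.TemporaryDirectory() as folder:
    source_path = os.path.join(folder, "source_file.txt")
    target_path = os.path.join(folder, "target_file.txt")

    with open(source_path, "w") as source_file:
        source_file.write("hello")

    # One long with-statement with two context managers,
    # continued on the next line with a backslash.
    with open(source_path) as file_1, \
         open(target_path, "w") as file_2:
        file_2.write(file_1.read())

    with open(target_path) as target_file:
        copied_text = target_file.read()

print(copied_text)
```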
Another such case is with assert statements.
Make sure to indent the continued line appropriately. The preferred place to break around a binary operator is after the operator, not before it. Some examples:
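The original examples did not survive here, so the following is only a stand-in sketch with made-up names. It follows the rule as quoted above (break after the operator). Keep in mind that newer revisions of PEP 8 suggest breaking before the binary operator in new code, so check which convention your project uses.

```python
width = 10
height = 4
depth = 2
padding = 1

# Break after the binary operator, and indent the continued lines
# so that they line up with the first term of the expression.
volume_with_padding = ((width + padding) *
                       (height + padding) *
                       (depth + padding))

print(volume_with_padding)
```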