Python

Resources

Notes and Troubleshooting

Built-in Functions

input()

The Python input() function prompts the user to enter a value and returns the input as a string.

Using input() with text
user_input = input('<TEXT PROMPT FOR USER>: ')
print(user_input)
Using input() with numeric values
# Cast the string input to an integer
user_input = int(input('<TEXT PROMPT FOR USER>: '))
print(user_input)
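
Casting with int() raises a ValueError if the user types something that is not a whole number. The following is a minimal sketch of one way to guard against that; the helper name to_int is hypothetical and not part of Python:

```python
def to_int(text, default=None):
    """Convert a string to an integer, returning `default` if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default


# In a script, pass the result of input() to the helper, e.g.:
# user_input = to_int(input('<TEXT PROMPT FOR USER>: '))
print(to_int("42"))      # 42
print(to_int("forty"))   # None
```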

type() - Checking the datatype of an object or variable

datatype = type(<OBJECT>)
print(datatype)
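
As a small illustration (the variable name value is only an example), type() can be printed or compared directly, while isinstance() is often preferred for checks because it also accepts subclasses:

```python
value = 3.14
print(type(value))            # <class 'float'>

# Compare against a type directly
print(type(value) is float)   # True

# isinstance() also returns True for subclasses: bool is a subclass of int
print(isinstance(True, int))  # True
```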

Constructing Reproducible Filepaths in Python

Paths are constructed differently on different operating systems:

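  • Linux: /home/username/foldername/data (on macOS the home directory is /Users/username)
  • Windows: C:\Users\username\foldername\data (written as C:\\Users\\username\\foldername\\data in a regular Python string)

The main differences are the location of the home directory (/home/username on Linux, /Users/username on macOS) and, on Windows, the drive letter and the use of backslashes \, which must be escaped as \\ in regular Python strings.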

Python's os module contains a number of functions for working with functionality that is operating system dependent. The os.path module enables the construction and manipulation of file paths.

Import the os module
import os
Create a path
my_path = os.path.join("dir1", "dir2", ..., "filename")

This function takes an arbitrary number of path components and joins them into a path in the format required by the operating system it is running on.
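
A short sketch of os.path.join() in use. The folder and file names are placeholders, and os.path.expanduser("~") is used here as one way to obtain the current user's home directory on any operating system:

```python
import os

# Build a relative path from its parts; the separator is chosen by the OS
my_path = os.path.join("dir1", "dir2", "filename.txt")
print(my_path)  # dir1/dir2/filename.txt on Linux and macOS, dir1\dir2\filename.txt on Windows

# Build a path inside the current user's home directory
home_data = os.path.join(os.path.expanduser("~"), "foldername", "data")
print(home_data)
```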

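NOTE: A relative path is resolved against the current working directory, so the existence check below will only return True for a relative path if the working directory has been set correctly.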

Check that the path exists
os.path.exists(my_path)

Function returns True or False.

Get working directory
os.getcwd()

Function returns a path.

Set working directory
os.chdir("path-to-dir")

Function, short for 'CHange DIRectory', allows you to set a specific path as the current working directory.
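
A minimal sketch of getting, setting and restoring the working directory. A temporary folder stands in for "path-to-dir" so the example can run anywhere; that substitution is an assumption made for illustration:

```python
import os
import tempfile

original_dir = os.getcwd()            # remember where we started

with tempfile.TemporaryDirectory() as tmp_dir:
    os.chdir(tmp_dir)                 # set the working directory
    # samefile() compares the actual locations, ignoring symlink differences
    inside_tmp = os.path.samefile(os.getcwd(), tmp_dir)
    print(inside_tmp)                 # True
    os.chdir(original_dir)            # restore before the temporary folder is deleted

print(os.getcwd())
```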

Create a directory if it doesn't already exist
os.mkdir(my_path)

This function will only work across operating systems if you construct the path with os.path.join().

NOTE: The code above will fail with a FileExistsError if the directory my_path already exists. A conditional statement, such as a check with os.path.exists(), is required to avoid the exception.
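
Two possible ways of handling a directory that already exists are sketched below. The example works inside a temporary folder, and the "data" and "raw" folder names are placeholders. Note that os.makedirs() is used instead of os.mkdir() because it also creates any missing parent folders:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    my_path = os.path.join(base_dir, "data", "raw")

    # Option 1: check first, then create
    if not os.path.exists(my_path):
        os.makedirs(my_path)

    # Option 2: let Python ignore the "already exists" case
    os.makedirs(my_path, exist_ok=True)   # no error even though my_path now exists

    created = os.path.isdir(my_path)
    print(created)  # True
```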

JSON - Python's built-in JSON encoder and decoder

Documentation: https://docs.python.org/3/library/json.html

Import the json module
import json
JSON Formatting for printing dictionaries
print(json.dumps(<Dictionary>, indent=4))
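
For example, with a small made-up dictionary (record is an illustrative name), json.dumps() produces an indented string and json.loads() turns it back into a dictionary:

```python
import json

record = {"name": "example", "tags": ["a", "b"], "nested": {"count": 2}}

# indent=4 puts each key on its own line, indented by four spaces per level
pretty = json.dumps(record, indent=4)
print(pretty)

# json.loads() reverses the process
restored = json.loads(pretty)
print(restored == record)  # True
```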
Flatten JSON in Python

A custom package is available via PyPI: https://pypi.org/project/flatten-json/

pip install flatten_json

This package implements and expands on the following function which can be used independently: https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10

def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
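
To show what the function produces, here is a small usage sketch. The function is repeated so that the snippet runs on its own, and the nested dictionary is made-up sample data:

```python
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            # Dictionary keys become part of the flattened key
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            # List positions become part of the flattened key
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing underscore from the key
            out[name[:-1]] = x

    flatten(y)
    return out


nested = {"a": 1, "b": {"c": 2, "d": [3, 4]}}
flat = flatten_json(nested)
print(flat)  # {'a': 1, 'b_c': 2, 'b_d_0': 3, 'b_d_1': 4}
```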
Write dictionary to JSON file
with open("output.json", "w") as outfile:
    json.dump(dictionary, outfile)
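
A runnable sketch of the full round trip, writing a dictionary to a file and reading it back. The sample dictionary is made up and the file is written to a temporary folder so nothing is left behind:

```python
import json
import os
import tempfile

dictionary = {"id": 1, "values": [1.5, 2.5], "active": True}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "output.json")

    # Write the dictionary to the file
    with open(out_path, "w") as outfile:
        json.dump(dictionary, outfile, indent=4)

    # Read it back to confirm the contents
    with open(out_path) as infile:
        loaded = json.load(infile)

print(loaded == dictionary)  # True
```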

Pandas

The Pandas library provides functions for manipulating, indexing and analysing tables of data with Python. The main data structure used by Pandas is the dataframe.

List the column names of a dataframe

list(df.columns)

Count of NaN values in a dataframe by column

df.isna().sum()

Rename dataframe columns

# rename() returns a new dataframe, so assign the result
df = df.rename(columns={"A": "a", "B": "c"})
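
The three operations above can be tried together on a small made-up dataframe. This sketch assumes pandas is installed; the column names and values are illustrative only:

```python
import pandas as pd

# None in a numeric column is stored as NaN
df = pd.DataFrame({"A": [1.0, None, 3.0], "B": ["x", None, None]})

print(list(df.columns))          # ['A', 'B']

# Number of missing values in each column
nan_counts = df.isna().sum()
print(nan_counts)

# rename() leaves df unchanged and returns a renamed copy
renamed = df.rename(columns={"A": "a", "B": "c"})
print(list(renamed.columns))     # ['a', 'c']
```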

Pip Install Python Package from GitHub

The following commands are entered using the computer's command line interface:

Pip Install Git Syntax

pip install git+https://github.com/<owner_name>/<repo_name>.git

Pip Install Git Branch

pip install git+https://github.com/<owner_name>/<repo_name>.git@<branch_name>

Postgres with Python

Loading data from a Pandas dataframe into Postgres using Psycopg

Here the psycopg PostgreSQL driver is used to pass a sequence of values as parameters %s into an SQL statement and execute it to upload the data from a Pandas dataframe.

import psycopg2
from psycopg2 import extras

...

# Create a list of tuples from the dataframe values
tpls = [tuple(x) for x in df.to_numpy()]  

# Create a comma separated list of column names
cols = ','.join(list(df.columns))  

# SQL query using parameters 
sql = "INSERT INTO %s(%s) VALUES %%s" % (table, cols)  
cursor = conn.cursor()

# Run parameterized query with execute_values() to load data
try:
    extras.execute_values(cursor, sql, tpls)  
    print("Data inserted using execute_values() successfully..")  
except (Exception, psycopg2.DatabaseError) as err:  
    # pass exception to function  
    show_psycopg2_exception(err)  
    cursor.close()

Note: From Python 3.6 onward, the query in this example can also be formatted using an f-string. Doing so makes the code more readable and removes the need to use the first percent sign % as an escape character for the second in the original argument VALUES %%s:

# SQL query using parameters (table and cols are the variables defined above)
sql = f"INSERT INTO {table}({cols}) VALUES %s"
cursor = conn.cursor()

Source: Part 4 !! Pandas DataFrame to PostgreSQL using Python

Python Environments

Virtual environments provide a way of managing a self-contained Python installation and collection of package dependencies for a specific project.

Create a Virtual Environment with venv

  1. Open a command terminal.
  2. Navigate to your project folder and run the following command:
python -m venv venv

This will create a virtual environment called venv within your project.

  3. Activate the environment by running:

On Windows, run:

venv\Scripts\activate.bat

On Unix or macOS, run:

source venv/bin/activate

Generate a Requirements File

Navigate to the root of the Python project's folder and run the following command:

pip freeze -l > requirements.txt 

This will write the local project dependencies to a file called requirements.txt.

Install Required Packages from a Requirements File

Copy your requirements.txt to the root of the Python project's folder. Then navigate to that folder and run the following command:

pip install -r requirements.txt

The required packages will now be installed.

String Quotes

In Python, single-quoted ' strings and double-quoted " strings are treated the same. PEP 8 does not recommend one over the other, which can confuse users. Its guidance is:

  • Pick a rule and stick to it.
  • When a string is surrounded with single quotes, use double quotes inside it to avoid backslashes.
  • When a string is surrounded with double quotes, use single quotes inside it to avoid backslashes.
  • For triple-quoted strings, always use double quote characters (""") as the delimiter, to be consistent with the docstring convention.

However, there are some suggested best practices:

  • Single quoted strings:
    • Make sure the string is fairly short, or that you’re dealing with a string literal.
    • Make sure there are no single quotation marks inside the string, as adding escape characters takes a toll on readability.
  • Double quoted strings:
    • Use double quotes for text and string interpolation.
    • Use double quotes when there’s a quotation inside a string; you can easily surround the quotation with single quotes.
  • Triple quoted strings:
    • You can use both single and double quotes inside them.
    • You can split the string into multiple lines.
    • They are considered best practice when writing docstrings.

Summary: Users typically use single quotes for data (string literals) and double quotes for human-readable strings.

Additional Notes:

  • Single quoted strings can cause issues if there are quotations inside the string.
  • Double quoted strings are a marginally safer bet. The Black formatter, for example, prefers double quotes and enforces them as standard.
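
The points above can be checked with a short sketch; the example strings are made up for illustration:

```python
single = 'data'
double = "data"
print(single == double)            # True: both forms create the same string

# Use the other quote type inside the string to avoid backslashes
quote_in_double = "It's a short string"
quote_in_single = 'She said "hello"'
escaped = 'It\'s a short string'   # works, but is harder to read
print(quote_in_double == escaped)  # True

# Triple-quoted strings can contain both quote types and span several lines
doc = """Triple-quoted strings can contain 'single' and "double" quotes
and can span multiple lines."""
print(doc.count("\n"))             # 1
```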

Sources:

  • PEP 8: https://peps.python.org/pep-0008/#string-quotes
  • Single quotes vs. double quotes in Python: https://stackoverflow.com/questions/56011/single-quotes-vs-double-quotes-in-python
  • Black Formatter: https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#strings

Scripts, Modules, Packages and Libraries

  • Script - A Python file that’s intended to be run directly. When you run it, it should do something. This means that scripts can contain code written outside the scope of any classes or functions.
  • Module - A Python file that’s intended to be imported into scripts or other modules. It often defines its own classes, functions, and variables which are intended to be used in other files that import it.
  • Package - A collection of related modules that work together to provide certain functionality. These modules are contained within a folder and can be imported just like any other modules. This folder will often contain a special __init__.py file that tells Python it’s a package that might contain more modules nested within subfolders.
  • Library - A collection of code that could incorporate tens or even hundreds of individual modules which perform different functions.
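
A common way to let one file act as both a script and a module is the __name__ check sketched below. The file name greetings.py and the function names are hypothetical:

```python
# greetings.py (hypothetical file name)

def greet(name):
    """Reusable function: available to any file that imports this module."""
    return f"Hello, {name}."


def main():
    """Code that should only run when the file is used as a script."""
    print(greet("world"))


# True only when the file is run directly (python greetings.py),
# not when it is imported (import greetings)
if __name__ == "__main__":
    main()
```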

Source: Scripts, Modules, Packages, and Libraries

String Formatting with f-Strings

  • Python 3.6 introduced formatted string literals or 'f-strings'.
  • These are faster, less verbose and easier to read than earlier methods for formatting strings:
    • %-formatting
    • Plus sign '+' to concatenate strings
    • str.format() method
  • Example:
name = "Virtual Architectures"
age = 28
f"Hello, {name}. You are {age} years old."
  • Output:
'Hello, Virtual Architectures. You are 28 years old.'
  • You can put any valid Python expressions in an f-string which means you can also use them to call functions and their methods.
  • Warning! You can use different types of quotation marks inside the expressions. Before Python 3.12, just make sure you are not using the same type of quotation mark on the outside of the f-string as you are using in the expression.
  • Working with dictionaries - If you are going to use single quotation marks for the keys of the dictionary, then remember to make sure you’re using double quotation marks for the f-strings containing the keys.
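
The points about expressions and dictionary keys can be sketched as follows. The helper function shout and the person dictionary are made up for illustration:

```python
def shout(text):
    """Hypothetical helper used to show a function call inside an f-string."""
    return text.upper()


person = {'name': 'Virtual Architectures', 'age': 28}

# Double quotes outside, single quotes for the dictionary keys inside
greeting = f"Hello, {person['name']}. You are {person['age']} years old."
print(greeting)                       # Hello, Virtual Architectures. You are 28 years old.

# Expressions, function calls and method calls are evaluated inside the braces
print(f"{2 * 21}")                    # 42
print(f"{shout('hello')}")            # HELLO
print(f"{person['name'].lower()}")    # virtual architectures
```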

Source: Python 3's f-Strings: An Improved String Formatting Syntax (Guide)

Timer for Code Execution

import time

# Get start time
st = time.time()

# ENTER YOUR CODE HERE

# Calculate elapsed time
elapsed_time = time.time() - st
print('Execution time:', time.strftime("%H:%M:%S", time.gmtime(elapsed_time)))
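
The %H:%M:%S format above only shows whole seconds, so short snippets will print 00:00:00. As an alternative sketch, not part of the source, time.perf_counter() can be used to report fractions of a second; the summation is a stand-in for the code being timed:

```python
import time

# Get start time from a high-resolution clock
start = time.perf_counter()

# Placeholder workload: replace with your own code
total = sum(i * i for i in range(100_000))

# Calculate elapsed time in seconds
elapsed = time.perf_counter() - start
print(f"Execution time: {elapsed:.4f} seconds")
```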

Source: Python Measure the Execution Time of a Program