Python string


Python 2 has two built-in string types: str holds bytes, and unicode holds Unicode characters. If we only deal with 7-bit ASCII characters (characters in the range 0-127), we can save some memory by using str. However, we should be careful with 8-bit character sets: in general, it is not possible to determine which 8-bit encoding a particular string uses simply by examining its bytes. The safest approach is to use str for 7-bit ASCII and for raw binary 8-bit bytes, and unicode otherwise.

Note: The good news is that Python 3.x doesn't have a special Unicode string type/class. Every str is a Unicode string, and raw 8-bit data is held by the separate bytes type.
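As a minimal sketch of the Python 3 split between text and raw bytes, the snippet below encodes a short string and decodes it again. The variable names are only for illustration, and UTF-8 and Latin-1 are chosen here just as two example encodings.

```python
# Python 3: str holds Unicode text, bytes holds raw 8-bit values.
text = 'A\u00c4B'                    # three characters
data = text.encode('utf-8')          # text -> bytes

print(len(text))                     # number of characters
print(len(data))                     # number of bytes (one character needs two bytes here)
print(data)

# The bytes alone do not say which encoding produced them.
round_trip = data.decode('utf-8')    # the encoding that was actually used
as_latin1 = data.decode('latin-1')   # same bytes, different interpretation
print(round_trip == text)
print(ascii(as_latin1))
```

Decoding with the wrong 8-bit encoding does not fail here; it silently yields different text, which is why the encoding has to be known in advance.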





String Literals

Python strings are fairly easy to use, but there are many ways to write them in our code:

>>> # Single quotes
>>> print('P"casso')
P"casso

>>> # Double quotes
>>> print("P'casso")
P'casso

>>> # Triple quotes
>>> print('''...Picasso...''')
...Picasso...

>>> # Escape sequences
>>> print("P\ti\nca\Osso")
P	i
ca\Osso

>>> # Raw strings
>>> print(r"C:\myscript.py")
C:\myscript.py

>>> # Byte strings
>>> print(b'Picas\x01so')
b'Picas\x01so'
>>> type(b'Picas\x01so')
<class 'bytes'>
>>> type('normal_string')
<class 'str'>

>>> # Unicode strings
>>> S = 'A\u00c4B\U000000e8C'
>>> S
'AÄBèC'
>>> len(S)
5
>>> 


Quoted Strings

Strings written with single quotes and with double quotes are the same.

>>> 'Picasso', "Picasso"
('Picasso', 'Picasso')

The reason for supporting both is that it allows us to embed a quote character of the other variety inside a string without escaping it with a backslash.

>>> 'Mozart"s', "Mozart's"
('Mozart"s', "Mozart's")

Python concatenates adjacent string literals in any expression.

>>> masters = "Mozart " 'and' " Picasso"
>>> masters
'Mozart and Picasso'

If we add commas between these strings, we get a tuple, not a string. We can also embed quote characters by escaping them with backslashes:

>>> "Mozart\"s", 'Picasso\'s'
('Mozart"s', "Picasso's")
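To make the difference between adjacent literals and comma-separated literals concrete, here is a small sketch. The names joined, separate, and long_text are illustrative only.

```python
joined = "Mozart " 'and' " Picasso"        # adjacent literals: one string
separate = "Mozart ", 'and', " Picasso"    # commas: a tuple of three strings

# Parentheses let adjacent literals span several source lines.
long_text = ("Mozart "
             "and "
             "Picasso")

print(type(joined).__name__, repr(joined))
print(type(separate).__name__, len(separate))
print(long_text == joined)
```

The concatenation happens when the code is compiled, so it only works for literals; to join string variables we still need + or str.join().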



Escape Sequences

The escaped quotes above are one instance of a general pattern in strings: backslashes introduce special character codings known as escape sequences.

Escape sequences let us embed characters in strings that cannot easily be typed on a keyboard. The character \, and one or more characters following it in the string literal, are replaced with a single character in the resulting string object. That character has the binary value specified by the escape sequence. For instance, here is a five-character string that embeds a newline and a tab:

>>> s = 'a\nb\tc'

The two characters \n stand for a single character: the newline character, which has ASCII code 10. Likewise, the sequence \t is replaced with the tab character. The way this string looks when printed depends on how we print it. The interactive echo shows the special characters as escapes, while print interprets them:

>>> s
'a\nb\tc'
>>> print(s)
a
b	c

We can check how many characters are in the string.

>>> len(s)
5

So, the string is five characters long. It contains an ASCII a, a newline, an ASCII b, and so on. The backslash characters are not really stored with the string in memory. They are just used to tell Python to store special character values in the string. The following table lists the string escape sequences:

Escape        Meaning
\newline      Ignored (continuation line)
\\            Backslash (stores one \)
\'            Single quote (stores ')
\"            Double quote (stores ")
\a            Bell
\b            Backspace
\f            Formfeed
\n            Newline (linefeed)
\r            Carriage return
\t            Horizontal tab
\v            Vertical tab
\xhh          Character with hex value hh (exactly 2 digits)
\ooo          Character with octal value ooo (up to 3 digits)
\0            Null: binary 0 character (doesn't end string)
\N{id}        Unicode database ID
\uhhhh        Unicode 16-bit hex
\Uhhhhhhhh    Unicode 32-bit hex
\other        Not an escape (keeps both \ and other)
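As a quick check of a few rows of the table, the sketch below pairs each escape as written (kept literal with a raw string) with the single character Python stores for it. The dictionary name samples is just for illustration.

```python
# Key: the escape as typed (raw string). Value: the character actually stored.
samples = {
    r'\x41': '\x41',        # hex value 41
    r'\101': '\101',        # octal value 101
    r'\u00e8': '\u00e8',    # 16-bit Unicode escape
    r'\N{LATIN SMALL LETTER E WITH GRAVE}': '\N{LATIN SMALL LETTER E WITH GRAVE}',
    r'\\': '\\',            # one stored backslash
    r'\0': '\0',            # null character, does not end the string
}

for written, stored in samples.items():
    # Each escape collapses to exactly one character.
    print(written, len(stored), ord(stored))
```

The hex and octal forms both produce the letter A, and the \u and \N forms both produce the same accented character, so the choice between them is a matter of readability.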

Some escape sequences allow us to embed binary values into the characters of a string. Here we have a five-character string with two binary zeros:

>>> s = 'A\0B\0C'
>>> s
'A\x00B\x00C'

The zero (null) character does not terminate a string. Instead, Python keeps the string's length and text in memory. Here we have a string with a binary 1 and 2 (in octal) and 3 (in hex):

>>> s = '\001\002\x03'
>>> s
'\x01\x02\x03'
>>> len(s)
3

Here, Python displays nonprintable characters in hex, regardless of how they were specified. Next we have a string with a tab, a newline, and a zero value coded in hex:

>>> s = "Pi\tcc\nas\x00so"
>>> s
'Pi\tcc\nas\x00so'
>>> print(s)
Pi	cc
as

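If Python does not recognize the character after a backslash (\) as an escape code, it simply keeps the backslash in the string. Recent Python 3 versions also warn about such unrecognized escapes, so it is safer to double the backslash or use a raw string: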

>>> x = "Picas\so"
>>> x
'Picas\\so'
>>> len(x)
8

As mentioned before, Python 3.x doesn't have a special Unicode string type/class, and every string is a Unicode string. So, we no longer need unichr(); we can just use chr(), as in the example below.

>>> euro = chr(8364)
>>> euro
'€'
>>> ord(euro)
8364
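As a small sketch of how chr() and ord() mirror each other, the lines below convert in both directions. The code point lists are arbitrary examples.

```python
euro = chr(8364)                 # code point -> one-character string
print(hex(ord(euro)))            # character -> code point, shown in hex
print(euro == '\u20ac')          # same character written as an escape

# ord() and chr() applied item by item
codes = [ord(c) for c in 'abc']
word = ''.join(chr(n) for n in [80, 105, 99])
print(codes)
print(word)
```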
>>> 


Raw String with Escape Sequences

Let's look at the following code for opening a file:

>>> myfile = open('C:\new\text.txt', 'w')
Traceback (most recent call last):
  File ...
    myfile = open('C:\new\text.txt', 'w')
IOError: [Errno 22] Invalid argument: 'C:\new\text.txt'

The problem is that \n is taken as a newline character, and \t as a tab. This is where raw strings help. If the letter r (uppercase or lowercase) appears before the opening quote of a string, it suppresses the escape mechanism. The result is that Python keeps our backslashes literally. In other words, backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Regular expression patterns are usually written in Python code using this raw string notation.

So, to fix the filename problem, we can just add the letter r:

>>> myfile = open(r'C:\new\text.txt', 'w')

Or, since two backslashes are really an escape sequence for one backslash, we can keep our backslashes by doubling them:

>>> myfile = open('C:\\new\\text.txt', 'w')

Python itself sometimes uses this doubling scheme when it displays strings with embedded backslashes:

>>> path = r'C:\new\text.txt'
>>> path
'C:\\new\\text.txt'
>>> print(path)
C:\new\text.txt
>>> len(path)
15

As we've seen in numeric representation, the default format at the interactive prompt prints results as they were coded. So, escaped backslashes appear in the output. The print function provides a more user-friendly format that shows that there is actually only one backslash in each spot.
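The three spellings of the path can be compared directly. This is only a sketch; the names raw, doubled, and broken are illustrative, and nothing is opened on disk.

```python
raw = r'C:\new\text.txt'         # raw string: backslashes kept literally
doubled = 'C:\\new\\text.txt'    # each \\ stores one backslash
broken = 'C:\new\text.txt'       # \n and \t are interpreted as escapes

print(raw == doubled)            # the two safe spellings give the same string
print(len(raw), len(broken))     # the broken one is shorter
print(repr(raw))                 # echo format: backslashes doubled
print(str(raw))                  # print format: one backslash per spot
```

The broken version is two characters shorter because each of \n and \t collapsed into a single character.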

Triple Quotes for Multiline Block Strings

A block string is a string literal delimited by triple quotes (three single quotes or three double quotes). It is for coding multiline text data.

>>> Python = """Python aims to combine
"remarkable power
with very clear syntax", and ..."""
>>> Python
'Python aims to combine\n"remarkable power\nwith very clear syntax", and ...'

Though the string spans three lines, Python collects all the triple-quoted text into a single multiline string with embedded newline characters (\n) at the places where our code has line breaks.

If we print it instead of echoing:

>>> print(Python)
Python aims to combine
"remarkable power
with very clear syntax", and ...
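As a small sketch, we can confirm where the embedded newlines sit by splitting the block string back into lines. The variable names text and lines are illustrative.

```python
text = """Python aims to combine
"remarkable power
with very clear syntax", and ..."""

lines = text.splitlines()        # split at the embedded newline characters
print(text.count('\n'))          # two line breaks in the source
print(len(lines))                # so three lines of text
for number, line in enumerate(lines, start=1):
    print(number, line)
```

Note that the double quotes inside the block need no escaping, because only a run of three quotes ends the literal.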
>>> 


Indexing and Slicing

We can access string components by position because strings are ordered collections of characters.



Python offsets start at 0 and end at one less than the length of the string. Python also lets us fetch items from sequences such as strings using negative offsets. A negative offset is added to the length of a string to derive a positive offset. We can also think of negative offsets as counting backward from the end.

>>> S = 'Picasso'
>>> # Indexing from front and end
>>> S[0], S[-1], S[-4]
('P', 'o', 'a')

>>> # Slicing: extract a section
>>> S[1:3], S[2:], S[:-1]
('ic', 'casso', 'Picass')

The basics of slicing are straightforward. When we index a sequence object such as a string on a pair of offsets separated by a colon, Python returns a new object containing the contiguous section. The left offset is taken to be the lower bound (inclusive) and the right is the upper bound (noninclusive). In other words, Python fetches all items from the lower bound up to but not including the upper bound. Then, it returns a new object containing the fetched items. If omitted, the left and right bounds default to 0 and the length of the object, respectively.

  1. Indexing
    S[i] fetches components at offsets:
    1. The first item is at offset 0.
    2. Negative indexes mean to count backward from the end or right.
    3. S[0] fetches the first item.
    4. S[-2] fetches the second item from the end (same as S[len(S)-2]).
  2. Slicing
    S[i:j] extracts contiguous sections of sequences:
    1. The upper bound is noninclusive.
    2. Slice boundaries default to 0 and the sequence length, if omitted.
    3. S[1:3] fetches items at offsets 1 up to but not including 3.
    4. S[1:] fetches items at offset 1 through the end (the sequence length).
    5. S[:3] fetches items at offset 0 up to but not including 3.
    6. S[:-1] fetches items at offset 0 up to but not including the last item.
    7. S[:] fetches items at offset 0 through the end - this effectively performs a top-level copy of S.

The last item is a very common trick. It makes a full top-level copy of a sequence object, that is, an object with the same value but a distinct piece of memory. This isn't very useful for immutable objects like strings, but it is very useful for objects that may be changed in-place, such as lists.
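The rules above can be sketched with a list, where the copy made by [:] really matters. The names masters, alias, and duplicate are illustrative only.

```python
S = 'Picasso'
print(S[-2] == S[len(S) - 2])    # a negative offset is added to the length

masters = ['Mozart', 'Picasso']
alias = masters                  # a second name for the same list
duplicate = masters[:]           # a new list holding the same items

masters.append('Dijkstra')       # in-place change
print(alias)                     # sees the change
print(duplicate)                 # unaffected
print(alias is masters, duplicate is masters)
```

The copy is top-level only: the new list is a distinct object, but its items are still shared references to the same objects.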



The Third Limit and Slice Objects

Slice expressions have an optional third index as a step or stride:

X[i:j:k]

That means "extract all the items in X, from offset i through j-1 by k."

>>> S = 'Edsger Dijkstra'
>>> S[1:10:2]
'dgrDj'
>>> S[::2]
'Ese ikta'

A stride of -1 indicates that the slice should go from right to left. The net effect is to reverse the sequence:

>>> S[::-1]
'artskjiD regsdE'

With a negative stride, the meanings of the first two bounds are reversed. In other words, the slice S[5:1:-1] fetches the items from 2 to 5, in reverse order:

>>> S = '01234567'
>>> S[5:1:-1]
'5432'
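The same three limits can also be packaged as a slice object with the built-in slice(), which is what the X[i:j:k] syntax builds behind the scenes. The sketch below uses illustrative names (every_other, reverse) and reuses the strings from the examples above.

```python
S = 'Edsger Dijkstra'

every_other = slice(1, 10, 2)        # same as 1:10:2
reverse = slice(None, None, -1)      # same as ::-1 (None means omitted)

print(S[every_other] == S[1:10:2])
print(S[reverse] == S[::-1])

# indices() resolves the slice against a concrete length.
print(every_other.indices(len(S)))

digits = '01234567'
print(digits[slice(5, 1, -1)])       # same as digits[5:1:-1]
```

A slice object is handy when the same extraction has to be applied to many sequences, since it can be stored in a variable and given a descriptive name.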





- K Hong







Copyright © 2024, bogotobogo