Python Programming FAQ

Taken from the Programming FAQ in the Python documentation. This is a straightforward translation; some parts still need a deeper look.

Core Language

Why am I getting an `UnboundLocalError` when the variable has a value?

In [1]: x = 10

In [2]: def bar():
   ...:     print(x)
   ...:

In [3]: bar()
10

In [4]: def foo():
   ...:     print(x)
   ...:     x+=1
   ...:

In [5]: foo()
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-5-c19b6d9633cf> in <module>()
----> 1 foo()

<ipython-input-4-baf15b7aced1> in foo()
      1 def foo():
----> 2     print(x)
      3     x+=1
      4

UnboundLocalError: local variable 'x' referenced before assignment

In [6]: def foo():
   ...:     print(x)
   ...:     x= x+1
   ...:
   ...:

In [7]: foo()
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-7-c19b6d9633cf> in <module>()
----> 1 foo()

<ipython-input-6-1fe44c88aa4d> in foo()
      1 def foo():
----> 2     print(x)
      3     x= x+1
      4
      5

UnboundLocalError: local variable 'x' referenced before assignment

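You can think of it this way: when you assign to a variable inside a scope, that variable becomes local to that scope and shadows any variable with the same name in the outer scope.
In the last two failing foo() functions, x is assigned to, so the compiler treats x as a local variable for the whole function. print(x) then runs before the local x has been bound to any value, hence the error.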
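The compile-time decision described above can be inspected directly. This is a minimal sketch, assuming CPython, where a function's code object lists its local names in `co_varnames`; the function names `reads_only` and `assigns_later` are made up for illustration.

```python
x = 10

def reads_only():
    # x is never assigned in this function, so it is looked up as a global
    return x

def assigns_later():
    try:
        seen = x          # x is local here, but not bound yet
    except UnboundLocalError:
        seen = "unbound"
    x = 1                 # this assignment is what makes x local
    return seen

print("x" in reads_only.__code__.co_varnames)     # False
print("x" in assigns_later.__code__.co_varnames)  # True
print(reads_only())                               # 10
print(assigns_later())                            # unbound
```

The position of the assignment does not matter: one assignment anywhere in the body is enough to make the name local everywhere in that body.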


In [1]: x = 1

In [2]: def foo():
   ...:     print(x)
   ...:     n = x + 1
   ...:     print(x)

In [3]: foo()
1
1

In [4]: def foo():
    print(x)
    n = x + 1
    print(n)


In [5]: foo()
1
2

In [6]: def foo():
    print(x)
    n = x + 1
    print(n)
    x -= 1

In [7]: foo()
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-7-624891b0d01a> in <module>()
----> 1 foo()

<ipython-input-6-2374c4171c35> in foo()
      1 def foo():
----> 2     print(x)
      3     n = x + 1
      4     print(n)
      5     x -= 1

UnboundLocalError: local variable 'x' referenced before assignment

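TODO:

In Python, every object has an identity, a type and a value from the moment it is created. A variable name is just a reference bound to that object.

Can x = x + 1 be understood like this: a new local variable x is created, which already shadows the outer x but has no value yet; then the right-hand side reads x, which at that point is declared but unassigned, and so the error is raised? Roughly, yes: because the function assigns to x, the compiler treats x as local throughout the function body, so reading it before the first assignment fails.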

So what should I do if I want to modify a global variable inside a function?

Use the global keyword:

In [15]: x
Out[15]: 1

In [16]: def fooa():
   ....:     global x
   ....:     print(x)
   ....:     x += 1
   ....:     print(x)
   ....:     

In [17]: fooa()
1
2

In [18]: x
Out[18]: 2

Use the nonlocal keyword:

In [10]: def foobar():
   ....:     x = 10
   ....:     def bar():
   ....:         nonlocal x
   ....:         print(x)
   ....:         x += 1
   ....:     bar()
   ....:     print(x)
   ....:     

In [11]: foobar()
10
11

In [13]: def foobar():
    x = 10
    def bar():
        #nonlocal x
        print(x)
        x += 1
    bar()
    print(x) 

In [14]: foobar()

UnboundLocalError: local variable 'x' referenced before assignment

In [22]: x
Out[22]: 2

In [23]: def foobar():
    global x
    def bar():
        nonlocal x
        print(x)
        x = 3
    bar()
    print(x)

SyntaxError: no binding for nonlocal 'x' found
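# nonlocal can only bind to a name in an enclosing function scope; here x is declared global in foobar, so there is no such binding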

What are the rules for local and global variables in Python?

In [27]: def fooa():
    global p    
    print(p)
    p += 1
    print(p)
   ....:     

In [28]: fooa()
NameError: name 'p' is not defined

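In Python, variables that are only referenced inside a function are implicitly global. If a variable is assigned a value anywhere within the function's body, it is assumed to be local unless explicitly declared as global.

Though a bit surprising at first, a moment's consideration explains this. On one hand, requiring global for assigned variables provides a bar against unintended side effects. On the other hand, if global were required for all global references, you would be using global all the time: you would have to declare as global every reference to a built-in function or to a component of an imported module. This clutter would defeat the usefulness of the global declaration for identifying side effects.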

Why do lambdas defined in a loop with different values all return the same result?

In [29]: squares = []

In [30]: for x in range(4):
   ....:     squares.append(lambda: x**2)

In [31]: squares
Out[31]: 
[<function __main__.<lambda>>,
 <function __main__.<lambda>>,
 <function __main__.<lambda>>,
 <function __main__.<lambda>>]

In [32]: squares[2]()
Out[32]: 9

In [34]: squares[3]()
Out[34]: 9

The loop above implicitly creates a variable x:

In [1]: squares = []

In [2]: for x in range(4):
    squares.append(lambda: x**2)
   ...:     

In [3]: x
Out[3]: 3

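x is not local to the lambdas; it is defined in the outer scope and looked up when a lambda is called, not when it is defined. At the end of the loop x is 3, so all the functions return 3**2. Changing x afterwards changes the result too: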

In [35]: x = 10

In [36]: squares[3]()
Out[36]: 100

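To get a different value from each function, declare a new parameter n local to the lambda and give it the current value of x as its default: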


>>> squares = []
>>> for x in range(5):
...     squares.append(lambda n=x: n**2)
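To see both behaviours side by side, here is a small sketch. The default-argument version is the one from the text; `functools.partial` is shown as an alternative that is not part of the original notes, and the helper `square` is a made-up name.

```python
from functools import partial

late = []
for x in range(5):
    late.append(lambda: x**2)        # x is looked up when the lambda is called

early = []
for x in range(5):
    early.append(lambda n=x: n**2)   # the default is evaluated when the lambda is defined

def square(n):
    return n**2

# partial() also freezes the current value of x
bound = [partial(square, x) for x in range(5)]

print([f() for f in late])   # [16, 16, 16, 16, 16]
print([f() for f in early])  # [0, 1, 4, 9, 16]
print([f() for f in bound])  # [0, 1, 4, 9, 16]
```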

How do I share global variables across modules?

# config.py
x = 0

# mod.py
import config
config.x = 1

# main.py
import config
import mod
print(config.x) # 1

# main2.py
import config
print(config.x)  # 0, because mod (which sets config.x = 1) is never imported here

Why are default values shared between objects?

Consider a function written by someone who doesn't read the docs, or by a guru:

In [9]: def foo(a, b=[]):
   ...:     for i in range(a):
   ...:         b.append(i**2)
   ...:     print(b)

In [10]: foo(3)
[0, 1, 4]

In [11]: foo(3, [1,2,3])
[1, 2, 3, 0, 1, 4]

In [12]: foo(4)
[0, 1, 4, 0, 1, 4, 9]

help(foo)

Help on function foo in module __main__:

foo(a, b=[0, 1, 4, 0, 1, 4, 9])

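See that? After two calls that relied on the default, b's default value has grown into a whole pile of items.

This is because in Python default values are created exactly once, when the function is defined. If the default object is changed, later calls to the function will refer to this changed object.

By definition, immutable objects such as numbers, strings, tuples, and None are safe from change. Changes to mutable objects such as dictionaries, lists, and class instances can lead to confusion.

So it is good practice to use immutable objects as default values: use None as the default, and create the new list inside the function when the parameter is None. The example above can be written as:

def foo(a=0, b=None):
    if b is None:
        b = []  # a fresh list on every call that omits b
    ...
    print(b)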
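A runnable version of the None-default pattern, to compare with the session above. This is only a sketch; `append_squares` is a made-up name standing in for foo.

```python
def append_squares(a, b=None):
    # None is immutable, so the default itself can never accumulate state
    if b is None:
        b = []
    for i in range(a):
        b.append(i**2)
    return b

print(append_squares(3))             # [0, 1, 4]
print(append_squares(3, [1, 2, 3]))  # [1, 2, 3, 0, 1, 4]
print(append_squares(4))             # [0, 1, 4, 9]
print(append_squares.__defaults__)   # (None,)
```

The third call no longer sees the items left behind by the first one.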

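That's not to say the first version is necessarily wrong. The behavior exists for a reason, and you can do some fun things with it, such as writing a cache for a function: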

def foo(arg1, arg2, _cache={}):
    # _cache is created once, at definition time, and shared by every call
    cache_key = (arg1, arg2)
    try:
        return _cache[cache_key]
    except KeyError:
        # func stands in for the expensive computation being cached
        response = func(arg1, arg2)
        _cache[cache_key] = response
        return response

Of course, you could also use a global variable for this. Neither way is right or wrong; use whichever you like.
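If the goal is simply to memoize a function, the standard library already provides `functools.lru_cache`, which avoids both the mutable default and the global. This is an alternative not mentioned in the original notes; `slow_square` and `calls` are made-up names used to show when the real computation runs.

```python
from functools import lru_cache

calls = []  # records every time the real computation runs

@lru_cache(maxsize=None)
def slow_square(n):
    calls.append(n)
    return n * n

print(slow_square(4))                 # 16
print(slow_square(4))                 # 16 (served from the cache)
print(calls)                          # [4]
print(slow_square.cache_info().hits)  # 1
```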

How can I pass optional or keyword parameters from one function to another?

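You can collect the arguments with the * and ** specifiers in the function's parameter list:

*args receives the positional arguments as a tuple
**kwargs receives the keyword arguments as a dictionary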

def f(x, *args, **kwargs):
    kwargs["width"] = "23.1"
    ...
    g(x, *args, **kwargs)
    ...

Looks a lot like a decorator, doesn't it? Forwarding arguments this way is exactly what a decorator's wrapper function does.
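As a concrete instance of that forwarding, here is a small decorator sketch. The names `logged` and `area` are invented for the example; `functools.wraps` is the standard helper that keeps the wrapped function's name and docstring.

```python
import functools

def logged(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # args is a tuple, kwargs is a dict; both are passed through untouched
        print("calling", func.__name__, args, kwargs)
        return func(*args, **kwargs)
    return wrapper

@logged
def area(width, height=1, *, scale=1):
    return width * height * scale

print(area(2, 3, scale=10))  # 60
```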

What is the difference between arguments and parameters?

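Parameters are the names that appear in a function definition; they define what kinds of arguments the function can accept.
Arguments are the values actually passed to a function when calling it.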


def func(foo, bar=None, **kwargs):
    pass

func(32, bar=33, extra=somevar)

Above, foo, bar and kwargs are parameters of func, while 32, 33 and somevar are arguments.

Why did changing list ‘y’ also change list ‘x’?

In [18]: x = []

In [19]: y= x

In [20]: x.append(2)

In [21]: y
Out[21]: [2]

In [22]: x
Out[22]: [2]

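There are two factors that produce this result:

1. Variables are simply names that refer to objects. Doing y = x doesn't create a copy of the list; it creates a new name y that refers to the same object x refers to. In other words, x and y point to the same object and have the same id.
2. Lists are mutable, which means that you can change their content.

Here is another example: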

In [28]: x = 5

In [29]: y = x

In [30]: x = 6

In [31]: y
Out[31]: 5

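This is because integers are immutable. When we do x = 6, we are not changing the integer 5 into 6; we are creating a new integer object with the value 6 and assigning it to x. So there are now two objects in total, and the names x and y refer to different objects: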

In [32]: id(x)
Out[32]: 10919584

In [33]: id(y)
Out[33]: 10919552

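Some operations (for example y.append(10) and y.sort()) mutate the object in place, whereas superficially similar operations (for example y = y + [1, 2, 3] and sorted(y)) create a new object. A simple way to tell the two kinds apart is the return value: in general, a method that mutates the object in place returns None.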
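The difference between mutating in place and creating a new object can be checked with a second name for the same list. A minimal sketch; `alias` and `new_list` are names invented for the example.

```python
y = [3, 1, 2]
alias = y                            # a second name for the same list

print(y.sort())                      # None: sort() mutates in place
print(alias)                         # [1, 2, 3]

new_list = sorted(y, reverse=True)   # sorted() returns a new list
print(new_list is y)                 # False

y = y + [4]                          # builds a new list and rebinds y
print(alias)                         # [1, 2, 3]
print(y)                             # [1, 2, 3, 4]
```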

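In other words:

If we have a mutable object (list, dict, set, etc.), we can use some specific operations to mutate it, and all the variables that refer to it will see the change.
If we have an immutable object (int, str, tuple, etc.), all the variables that refer to it will always see the same value, but operations that transform that value always return a new object.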

How do I write a function with output parameters (call by reference)?

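Remember that arguments are passed by assignment in Python; what the function receives is a reference to the object.
Since assignment just creates references to objects, there is no alias between an argument name in the caller and in the callee, and so no call-by-reference per se. You can achieve the desired effect in a number of ways.

By returning a tuple of the results: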

def func(a, b):
    a = "new_value"
    b = b + 1
    return a, b

x, y = "old-value", 99
x, y = func(x, y)
print(x, y)  # new_value 100

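This is the clearest and most common approach.

By using global variables. This isn't thread-safe, and is not recommended.
By passing a mutable object (changeable in place):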

In [39]: def func(a):
   ....:     a.append(2)
   ....:     

In [40]: a
Out[40]: []

In [41]: func(a)

In [42]: a
Out[42]: [2]

By passing in a dictionary that gets mutated:

def func3(args):
    args['a'] = 'new-value'     # args is a mutable dictionary
    args['b'] = args['b'] + 1   # change it in-place

args = {'a': 'old-value', 'b': 99}
func3(args)
print(args['a'], args['b'])

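Or bundle up the values as attributes of a class instance:

# I think I've seen this approach in Django's middleware, or somewhere like it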
class callByRef:
    def __init__(self, **args):
        for (key, value) in args.items():
            setattr(self, key, value)

def func4(args):
    args.a = 'new-value'        # args is a mutable callByRef
    args.b = args.b + 1         # change object in-place

args = callByRef(a='old-value', b=99)
func4(args)
print(args.a, args.b)

How do you make a higher order function in Python?

You can use nested scopes or callable objects (the __call__(self) method):

# nested scopes; decorators are built the same way
def linear(a, b):
    def result(x):
        return a*x + b
    return result

# callable object, using the __call__ method
class Linear:
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __call__(self, x):
        return self.a*x + self.b

dd = Linear(2, 3)
dd(3)  # 9

TODO: "an object can encapsulate state for several methods"; I don't quite understand what this sentence means.

class Counter:
    
    value = 0

    def up(self):
        self.value += 1

    def sets(self, x):
        self.value = x

    def down(self):
        self.value -= 1
count = Counter()
sets, up, down = count.sets, count.up, count.down  # scrapy's download middleware does it this way
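One way to read the sentence about an object encapsulating state for several methods: the bound methods taken from one instance all act on that instance's state, so they behave like a group of functions sharing one hidden variable. A small sketch, repeating the Counter class so it runs on its own; the second instance `other` is added only for contrast.

```python
class Counter:
    value = 0

    def up(self):
        self.value += 1

    def sets(self, x):
        self.value = x

    def down(self):
        self.value -= 1

count = Counter()
sets, up, down = count.sets, count.up, count.down  # three bound methods, one shared state

up(); up(); down()
print(count.value)  # 1

sets(10)
up()
print(count.value)  # 11

other = Counter()
print(other.value)  # 0, a different instance has its own state
```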

How do I copy an object in Python?

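In general, try copy.copy() or copy.deepcopy() from the copy module. Some objects can be copied more easily; dictionaries, for example, have a copy() method: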


newdict = olddict.copy()

# sequences can also be copied by slicing
new_list = oldlist[:]
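The difference between a shallow and a deep copy only shows up with nested objects. A minimal sketch with a made-up dictionary:

```python
import copy

old = {"nums": [1, 2], "name": "a"}

shallow = copy.copy(old)     # new dict, but the inner list is shared
deep = copy.deepcopy(old)    # new dict and a new inner list

old["nums"].append(3)

print(shallow["nums"])  # [1, 2, 3]
print(deep["nums"])     # [1, 2]
```

dict.copy() and slicing with [:] are shallow copies as well.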

Is there an equivalent of C’s “?:” ternary operator?

Yes, there is. The syntax is on_true if expression else on_false:

x, y = 10, 20
small = x if x < y else y

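Before this syntax existed, a common idiom was expression and on_true or on_false. That idiom is unsafe, because it gives the wrong result whenever on_true is falsy, so it is always better to use the ... if ... else ... form.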
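The pitfall of emulating a ternary with `and`/`or` is easy to reproduce when the "true" value is itself falsy. The variable names below are made up for the demonstration.

```python
x, y = 0, 5
cond = True

old_style = cond and x or y    # (True and 0) is 0, which is falsy, so y is picked
new_style = x if cond else y   # the conditional expression picks x as intended

print(old_style)  # 5
print(new_style)  # 0
```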

How do I modify a string in place?

You can't, because strings are immutable. In most situations, you should simply construct a
new string from the various parts you want to assemble it from.
However, if you need to modify Unicode data in place, try an io.StringIO object or the array module:

In [1]: import io

In [2]: s = "hellow world"

In [3]: sio = io.StringIO(s)

In [4]: sio
Out[4]: <_io.StringIO at 0x1dbd77b6f78>

In [5]: sio.getvalue()
Out[5]: 'hellow world'

In [6]: sio.seek()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-0007ce4a479a> in <module>()
----> 1 sio.seek()

TypeError: seek() takes at least 1 argument (0 given)

In [7]: sio.seek(3)
Out[7]: 3

In [8]: sio.write("there")
Out[8]: 5

In [9]: sio.getvalue()
Out[9]: 'helthereorld'

In [10]: import array

In [11]: a = array.array("u", s)

In [12]: print(a)
array('u', 'hellow world')

In [13]: a[0] = "y"

In [14]: print(a)
array('u', 'yellow world')

In [15]: a.tounicode()
Out[15]: 'yellow world'

How do I use strings to call functions/methods?

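There are several ways to do this:

The best is to use a dictionary that maps strings to functions. The main advantage of this technique is that the strings do not need to match the names of the functions: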

def a():
    pass

def b():
    pass

dispatch = {"go": a, "stop": b}
dispatch[get_input()]()  # call the function that the input string maps to

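Use the built-in function getattr():

import foo
getattr(foo, "bar")

Note that getattr() works on any object, including classes, class instances, modules, and so on.

This is used in several places in the standard library, like this: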

class Foo:
    def do_foo(self):
        ...

    def do_bar(self):
        ...

f = getattr(foo_instance, "do_"+opname)
f()
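A self-contained version of the getattr() pattern above. The class `Robot` and the helper `run` are invented for the example; passing a default to getattr() lets unknown names be rejected explicitly instead of surfacing as an AttributeError.

```python
class Robot:
    def do_go(self):
        return "going"

    def do_stop(self):
        return "stopped"

def run(robot, opname):
    # look up the method by name; None means no such operation
    method = getattr(robot, "do_" + opname, None)
    if method is None:
        raise ValueError(f"unknown operation: {opname!r}")
    return method()

print(run(Robot(), "go"))    # going
print(run(Robot(), "stop"))  # stopped
```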

Use locals() or eval() to resolve the function name:

def myfunc():
    print("hello")

fname = "myfunc"
f = locals()[fname]
'''
In [21]: help(locals)
Help on built-in function locals in module builtins:

locals()
    Return a dictionary containing the current scope's local variables.
    NOTE: Whether or not updates to this dictionary will affect name lookups in
    the local scope and vice-versa is *implementation dependent* and not
    covered by any backwards compatibility guarantees.
'''
In [22]: f = eval(fname)

In [23]: f
Out[23]: <function __main__.myfunc>

In [24]: f()
hello

In [25]: help(eval)
Help on built-in function eval in module builtins:

eval(source, globals=None, locals=None, /)
    Evaluate the given source in the context of globals and locals.

    The source may be a string representing a Python expression
    or a code object as returned by compile().
    The globals must be a dictionary and locals can be any mapping,
    defaulting to the current globals and locals.
    If only globals is given, locals defaults to it.

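eval() exists for a reason, but I don't know where it would really be needed. It is not recommended: it is slow, and dangerous if you don't fully control the contents of the string.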

What is the most efficient way to concatenate many strings together?

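str and bytes objects are immutable, so concatenating many strings together is inefficient because each concatenation creates a new object. In the general case, the total runtime cost is quadratic in the total string length.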

To accumulate many str objects, the recommended idiom is to place them into a list and call str.join() at the end:

chunks = []
for s in my_strings:
    chunks.append(s)
result = ''.join(chunks)
# and of course, don't forget the io.StringIO module

To accumulate many bytes objects, the recommended idiom is to extend a bytearray object using in-place concatenation (the += operator):


result = bytearray()
for b in my_bytes_objects:
    result += b

How do I iterate over a sequence in reverse order?

Use the reversed() built-in function:

for x in reversed(sequence):
    ...  # do something with x

Or use extended slice syntax (this creates a reversed copy of the sequence):

for x in sequence[::-1]:
    ...  # do something with x

How do you remove duplicates from a list?

If you don't mind reordering the list, sort it and then scan from the end, deleting duplicates as you go:

if mylist:
    mylist.sort()
    last = mylist[-1]
    for i in range(len(mylist)-2, -1, -1):
        if last == mylist[i]:
            del mylist[i]
        else:
            last = mylist[i]

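If all elements of the list are hashable, you can simply use mylist = list(set(mylist)). This converts the list into a set, which drops the duplicates, and then back into a list; the original order is lost.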
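If the original order matters and the items are hashable, `dict.fromkeys()` is another option. This goes beyond the original notes and assumes Python 3.7 or later, where dictionaries are guaranteed to keep insertion order.

```python
mylist = [3, 1, 3, 2, 1]

# dict keys are unique and keep first-seen order (Python 3.7+)
deduped = list(dict.fromkeys(mylist))

print(deduped)              # [3, 1, 2]
print(sorted(set(mylist)))  # [1, 2, 3]
```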

How do you make an array in Python?

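Use a list. Unlike arrays in C, the items of a Python list can be of different types.

You can also use the array module from the standard library to create arrays with a compact representation, but they are slower to index than lists.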

How do I create a multidimensional list?

You probably thought of this slick trick:

In [26]: a = [[None] * 2] * 3

In [27]: a
Out[27]: [[None, None], [None, None], [None, None]]

In [28]: a[0][0] = 2

In [29]: a
Out[29]: [[2, None], [2, None], [2, None]]

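Frustrating, right? Do you know why?
The reason is that replicating a list with * doesn't create copies; it only creates references to the existing objects.
The * 3 creates a list containing 3 references to the same list of length two.
Changes to one row will show in all rows, which is almost certainly not what you want.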
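A common way around the shared-row problem is to build each row separately, for example with a list comprehension. A minimal sketch; `rows`, `cols`, `grid` and `shared` are names chosen for the example.

```python
rows, cols = 3, 2

# the inner [None] * cols is evaluated once per row, so every row is a distinct list
grid = [[None] * cols for _ in range(rows)]
grid[0][0] = 2
print(grid)                    # [[2, None], [None, None], [None, None]]

shared = [[None] * cols] * rows
print(shared[0] is shared[1])  # True: the same row object repeated
print(grid[0] is grid[1])      # False
```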

Have you heard of a library called NumPy? If not, go check it out!

Why does a_tuple[i] += ['item'] raise an exception when the addition works?

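See the difference between mutable and immutable objects described above. += is an in-place operation followed by an assignment: extending the list succeeds, but assigning the result back to a_tuple[i] fails because tuples are immutable. If you need to reassign elements, use a list instead.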
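Both halves of that behaviour can be observed in a few lines: the exception is raised, yet the list inside the tuple has already been extended. A small sketch with a made-up tuple:

```python
a_tuple = (["foo"], "bar")

try:
    a_tuple[0] += ["item"]   # list.__iadd__ succeeds, then the tuple item assignment fails
except TypeError as exc:
    print("TypeError:", exc)

print(a_tuple)  # (['foo', 'item'], 'bar')
```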

How can I sort one list by values from another list?

Merge them into an iterator of tuples, sort the resulting list, and then pick out the element you want:

In [30]: list1 = ["what", "I'm", "sorting", "by"]

In [31]: list2 = ["something", "else", "to", "sort"]

In [32]: pairs = zip(list1, list2)

In [33]: pairs = sorted(pairs)

In [34]: pairs
Out[34]: [("I'm", 'else'), ('by', 'sort'), ('sorting', 'to'), ('what', 'something')]

In [35]: result = [x[1] for x in pairs]

In [36]: result
Out[36]: ['else', 'sort', 'to', 'something']

An alternative for the last step is:

>>> result = []
>>> for p in pairs: result.append(p[1])

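If you find this more legible, you might prefer it to the final list comprehension.
However, it is almost twice as slow for long lists. Why?
First, the append() operation has to reallocate memory,
and while it uses some tricks to avoid doing that every time, it still has to do it occasionally,
and that costs quite a bit. Second, the expression result.append requires an extra attribute lookup.
Third, there is a speed reduction from having to make all those function calls.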

My class defines __del__ but it is not called when I delete the object.

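There are several possible reasons for this.

The del statement does not necessarily call __del__(); it simply decrements the object's reference count, and if this reaches zero, __del__() is called.

If your data structures contain circular links (for example, a tree where each child has a parent reference and each parent has a list of children), the reference counts will never go back to zero. Once in a while Python runs an algorithm to detect such cycles, but the garbage collector might run some time after the last reference to your data structure vanishes, so your __del__() method may be called at an inconvenient and random time. This is inconvenient if you are trying to reproduce a problem. Worse, the order in which objects' __del__() methods are executed is arbitrary. You can run gc.collect() to force a collection, but there are pathological cases where objects will never be collected.

Despite the cycle collector, it is still a good idea to define an explicit close() method on objects, to be called whenever you are done with them. The close() method can then remove attributes that refer to subobjects. Don't call __del__() directly: __del__() should call close(), and close() should make sure that it can be called more than once for the same object.

Another way to avoid cyclical references is to use the weakref module, which allows you to point to objects without incrementing their reference count. Tree data structures, for instance, should use weak references for their parent and sibling references (if they need them!).

Finally, if your __del__() method raises an exception, a warning message is printed to sys.stderr.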

Why does the result of id() appear to be not unique?

How do I find the current module name?

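A module can find out its own module name by looking at the predefined global variable __name__. If this has the value '__main__',
the program is running as a script. Many modules that are usually used by importing them also provide a command-line interface or a self-test,
and only execute this code after checking __name__.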
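The usual shape of that check looks like the sketch below. The function name `main` and its return value are conventions chosen for the example, not requirements.

```python
def main():
    print("Running self-test...")
    return 0

# True only when this file is run as a script, not when it is imported
if __name__ == "__main__":
    main()
```

When the file is imported from another module, __name__ is the module's own name, so main() is not called automatically.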

How can I have modules that mutually import each other?

Not translated (I wasn't in the mood); the original FAQ text follows.

Suppose you have the following modules:

# foo.py:
from bar import bar_var
foo_var = 1

# bar.py:
from foo import foo_var
bar_var = 2

The problem is that the interpreter will perform the following steps:

main imports foo
Empty globals for foo are created
foo is compiled and starts executing
foo imports bar
Empty globals for bar are created
bar is compiled and starts executing
bar imports foo (which is a no-op since there already is a module named foo)
bar.foo_var = foo.foo_var

The last step fails, because Python isn’t done with interpreting foo yet and the global symbol dictionary for foo is still empty.

The same thing happens when you use import foo, and then try to access foo.foo_var in global code.

There are (at least) three possible workarounds for this problem.

Guido van Rossum recommends avoiding all uses of from <module> import ..., and placing all code inside functions. Initializations of global variables and class variables should use constants or built-in functions only. This means everything from an imported module is referenced as <module>.<name>.

Jim Roskind suggests performing steps in the following order in each module:

exports (globals, functions, and classes that don’t need imported base classes)
import statements
active code (including globals that are initialized from imported values).
van Rossum doesn’t like this approach much because the imports appear in a strange place, but it does work.

Matthias Urlichs recommends restructuring your code so that the recursive import is not necessary in the first place.

These solutions are not mutually exclusive.

`__import__('x.y.z')` returns `<module 'x'>`; how do I get z?

Consider using the convenience function import_module() from importlib instead:

z = importlib.import_module('x.y.z')
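The difference can be tried with a dotted module from the standard library. `xml.etree.ElementTree` is used here only because it is always available; the names `top` and `leaf` are made up.

```python
import importlib

top = __import__("xml.etree.ElementTree")                 # returns the top-level package
leaf = importlib.import_module("xml.etree.ElementTree")   # returns the submodule itself

print(top.__name__)                   # xml
print(leaf.__name__)                  # xml.etree.ElementTree
print(top.etree.ElementTree is leaf)  # True
```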

When I edit an imported module and reimport it, the changes don’t show up. Why does this happen?

For reasons of efficiency as well as consistency, Python only reads the module file on the first time a module is imported. If it didn’t, in a program consisting of many modules where each one imports the same basic module, the basic module would be parsed and re-parsed many times. To force re-reading of a changed module, do this:


import importlib
import modname
importlib.reload(modname)

Warning: this technique is not 100% fool-proof. In particular, modules containing statements like

from modname import some_objects

will continue to work with the old version of the imported objects. If the module contains class definitions, existing class instances will not be updated to use the new class definition. This can result in the following paradoxical behaviour:

>>> import importlib
>>> import cls
>>> c = cls.C()                # Create an instance of C
>>> importlib.reload(cls)
<module 'cls' from 'cls.py'>
>>> isinstance(c, cls.C)       # isinstance is false?!?
False
# The nature of the problem is made clear if you print out the "identity" of the class objects:
>>> hex(id(c.__class__))
'0x7352a0'
>>> hex(id(cls.C))
'0x4198d0'