@Luca I think the issue here is that you're writing your filter as
mktcap_filter = (mktcap > 1e8 and mktcap <= 1e10)
The correct way to write this expression is:
((mktcap > 1e8) & (mktcap <= 1e10))
Note: The parentheses around the subexpressions are necessary because `&` binds more tightly than comparison operators like `>` and `<=`.
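If you want to see the precedence issue for yourself, here's a small sketch using plain integers instead of factors (the numbers are arbitrary, picked only so the two forms give different answers). It uses the standard `ast` module to show how Python groups the unparenthesized expression:

```python
import ast

# Without parentheses, & is applied first: this parses as 1 > (0 & 3) <= 2.
tree = ast.parse("1 > 0 & 3 <= 2", mode="eval")
first_comparator = tree.body.comparators[0]
print(type(tree.body).__name__)         # the whole thing is one chained comparison
print(type(first_comparator).__name__)  # the `0 & 3` piece is grouped on its own

unparenthesized = 1 > 0 & 3 <= 2        # chained: (1 > 0) and (0 <= 2)
parenthesized = (1 > 0) & (3 <= 2)      # True & False
print(unparenthesized, parenthesized)
```

The two results differ, which is why the parentheses aren't optional.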
You almost never want to use the word `and` in Python when operating with values that aren't booleans (i.e. `True` and `False`). The `and` operator in Python means "use the thing on the left if it's 'falsey', otherwise use the thing on the right".
This happens to do what you expect/want when the values are booleans:
In [26]: True and False
Out[26]: False
In [27]: False and False
Out[27]: False
In [28]: False and True
Out[28]: False
In [29]: True and True
Out[29]: True
but it has somewhat confusing behavior if you're working with objects other than booleans:
In [24]: [] and 3
Out[24]: []
In [25]: [1] and 3
Out[25]: 3
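One way to think about this is as a plain function. This is only a rough sketch: `and_like` is a name I made up, and unlike the real operator it always evaluates both arguments before choosing one.

```python
def and_like(left, right):
    """Mimic the value of `left and right` (without short-circuiting)."""
    # If the left value is falsey, it is the result; otherwise the right value is.
    return right if left else left

# Compare against the real operator on a few pairs of values.
pairs = [(True, False), (False, True), ([], 3), ([1], 3), (0, "x"), ("x", 0)]
for left, right in pairs:
    print(repr(left), "and", repr(right), "->", repr(and_like(left, right)))
```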
Filters, like most Python objects, are truthy, which means that the expression `(mktcap > 1e8) and (mktcap <= 1e10)` evaluates to just `(mktcap <= 1e10)`, throwing away your first condition.
Some Examples:
In [43]: f = SomeFactor()
In [44]: f
Out[44]: SomeFactor((USEquityPricing.close::float64,), window_length=1)
In [45]: f > 10 # The comparison operators create new "NumExprFilter" objects.
Out[45]: NumExprFilter(expr='x_0 > (10)', bindings={'x_0': SomeFactor((USEquityPricing.close::float64,), window_length=1)})
In [46]: ((f > 10) and (f < 11)) # This is just (f < 11); the (f > 10) condition is thrown away.
Out[46]: NumExprFilter(expr='x_0 < (11)', bindings={'x_0': SomeFactor((USEquityPricing.close::float64,), window_length=1)})
In [47]: ((f > 10) and (f < 11)) is (f < 11) # In fact, it's actually the same object, because the result of `(f < 11)` is cached.
Out[47]: True
In [48]: ((f > 10) & (f < 11)) # Using the & operator yields the expression we actually want
Out[48]: NumExprFilter(expr='(x_0 > (10)) & (x_0 < (11))', bindings={'x_0': SomeFactor((USEquityPricing.close::float64,), window_length=1)})
In [49]: ((f > 10) & (f > 10)) is (f > 10)
Out[49]: False
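If you don't have a pipeline environment handy, you can reproduce the same behavior with a toy class. `FakeFilter` here is purely hypothetical (it is not a Quantopian or zipline class); it just overloads `&` the way I'd expect a filter-like object to:

```python
class FakeFilter:
    """Hypothetical stand-in for a filter; only records an expression string."""

    def __init__(self, expr):
        self.expr = expr

    def __and__(self, other):
        # Called for `self & other`: build a new object combining both sides.
        return FakeFilter(f"({self.expr}) & ({other.expr})")

    def __repr__(self):
        return f"FakeFilter({self.expr!r})"


low = FakeFilter("mktcap > 1e8")
high = FakeFilter("mktcap <= 1e10")

with_and = low and high  # `low` is truthy, so `and` just hands back `high`
with_amp = low & high    # calls FakeFilter.__and__ and keeps both conditions

print(with_and)
print(with_amp)
```

Python lets a class define what `&` does via `__and__`, but there's no equivalent hook for `and`, which is why it can't be made to combine the two conditions.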
Advanced Section:
Another interesting way to see the difference between `&` and `and` is to look at the bytecode that the Python interpreter generates for these expressions:
In [58]: from dis import dis # dis is the disassembly module. It's not available on Quantopian, but you can try this out locally in a shell.
In [59]: dis(compile("(f > 10) and (f < 11) ", "<str>", "eval"))
1 0 LOAD_NAME 0 (f)
3 LOAD_CONST 0 (10)
6 COMPARE_OP 4 (>)
9 JUMP_IF_FALSE_OR_POP 21
12 LOAD_NAME 0 (f)
15 LOAD_CONST 1 (11)
18 COMPARE_OP 0 (<)
>> 21 RETURN_VALUE
This representation of the bytecode is a little confusing at first, but it's actually pretty readable once you get used to it. The first three instructions say "load the name 'f', then load the constant 10, then compare them with the `>` operator and push the result onto the stack". The compiler then emits a conditional jump instruction, `JUMP_IF_FALSE_OR_POP`, which says "jump to the return instruction if the left-hand value is falsey, otherwise throw away the left-hand value and compute the right-hand value". We then either go straight to the return, or we run the instructions for `f < 11`, which then fall straight through to the return.
We can compare this to the instructions generated when using `&`:
In [60]: dis(compile("(f > 10) & (f < 11) ", "<str>", "eval"))
1 0 LOAD_NAME 0 (f)
3 LOAD_CONST 0 (10)
6 COMPARE_OP 4 (>)
9 LOAD_NAME 0 (f)
12 LOAD_CONST 1 (11)
15 COMPARE_OP 0 (<)
18 BINARY_AND
19 RETURN_VALUE
You'll immediately notice that there are no `JUMP` instructions here. We unconditionally evaluate both the left and right hand sides and return the value produced by the `BINARY_AND` instruction, which is the opcode corresponding to the `&` operator.
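The exact opcode names depend on which Python version you're running, so your output may not match the listings above (newer interpreters use different names for both the jump and the binary operation). Here's a rough sketch that checks only for the presence of a jump instruction; `opnames` is a helper name I made up:

```python
import dis


def opnames(source):
    """Return the opcode names the compiler emits for an expression."""
    code = compile(source, "<str>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


and_ops = opnames("(f > 10) and (f < 11)")
amp_ops = opnames("(f > 10) & (f < 11)")

# `and` needs a conditional jump to skip the right-hand side; `&` does not.
has_jump_and = any("JUMP" in name for name in and_ops)
has_jump_amp = any("JUMP" in name for name in amp_ops)

print(and_ops)
print(amp_ops)
print(has_jump_and, has_jump_amp)
```

Compiling doesn't look up `f`, so this runs fine even though `f` is never defined.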
End Advanced Section
Hope that helps!
-Scott