Hi all,
Just a quick question about something I ran into while starting to explore Quantopian; I was wondering where I'm going wrong.
I am trying to return the top 500 companies by market cap, after excluding all non-primary shares.
Now, I am aware there might be a better way to do this using a more advanced screen filter than the one I've used. However, I'd still like to know where the problem I'm facing comes from.
In essence, I'm running a very simple pipeline, screened to include only primary shares.
After that, I sort by my market cap column and then use iloc[:500, :] to slice the DataFrame down to the top 500 assets by market cap.
The odd thing is that when I then run df.index.levels[1] (to get the list of SIDs, which I use to create a name column in my frame), the index returned is that of the original DataFrame, not the new sliced one. I've also tried working on a copy (thinking the DataFrame might still be tied to the function that returned it), but that didn't help either.
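For anyone who wants to reproduce this outside Quantopian, here is a minimal sketch in plain pandas. It assumes the pipeline output behaves like an ordinary (date, asset) MultiIndex DataFrame; the tickers and market-cap values are made up, and plain strings stand in for Quantopian's Equity objects. It appears to show the same thing: slicing changes the rows but not the stored levels.

```python
import pandas as pd

# Toy (date, asset) MultiIndex frame standing in for the run_pipeline output.
# Tickers and market caps are made-up illustration values.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2019-01-04"]), ["AAA", "BBB", "CCC", "DDD", "EEE"]],
    names=["date", "asset"],
)
df = pd.DataFrame({"market_Cap": [50.0, 300.0, 120.0, 10.0, 75.0]}, index=index)

# Same sort-then-slice as in the post, keeping 3 rows instead of 500.
top = df.sort_values(by="market_Cap", ascending=False).iloc[:3, :]

print(len(top.index))                        # 3 rows after slicing
print(len(top.index.levels[1]))              # still 5: levels keep every original asset
print(list(top.index.get_level_values(1)))   # only the assets actually present in the rows
```

The `levels` attribute holds the set of possible labels for each level, and pandas does not prune it when rows are selected, whereas `get_level_values` returns one entry per remaining row.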
For reference, here are my imports:
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
# Pipeline is the screening engine
from quantopian.pipeline import Pipeline
# Importing datasets
from quantopian.pipeline.data import USEquityPricing as P_us
from quantopian.pipeline.domain import US_EQUITIES
from quantopian.pipeline.filters import fundamentals as Q_FF
# PARAMETERS TO
from quantopian.pipeline.domain import BE_EQUITIES
# from quantopian.pipeline.data import EquityPricing as P_int
from quantopian.pipeline.filters import QTradableStocksUS
# Import Quantopian factors
import quantopian.pipeline.factors as QT_factors
# run_pipeline runs the pipeline (and its screen) over a date range in Research
from quantopian.research import run_pipeline
# Import Alphalens, Quantopian's factor analysis and plotting module
import alphalens as al
Final = (
    run_pipeline(
        Pipeline(
            columns={
                "market_Cap": QT_factors.MarketCap(),
                "return": QT_factors.DailyReturns(),
            },
            # Keep only primary shares
            screen=Q_FF.IsPrimaryShare(),
        ),
        start_date='2019-01-04',
        end_date='2019-01-04',
    )
    # Largest market caps first, then keep the top 500 rows
    .sort_values(by='market_Cap', ascending=False)
    .iloc[:500, :]
)
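The sort-then-slice step on its own can be sketched in plain pandas with a toy frame (hypothetical tickers and values, 3 rows kept instead of 500). `DataFrame.nlargest` is shown as a one-step equivalent for a single column; it is a pandas alternative, not something Pipeline provides.

```python
import pandas as pd

# Hypothetical stand-in for the pipeline output: (date, asset) MultiIndex.
dates = pd.to_datetime(["2019-01-04"] * 5)
assets = ["AAA", "BBB", "CCC", "DDD", "EEE"]
index = pd.MultiIndex.from_arrays([dates, assets], names=["date", "asset"])
df = pd.DataFrame({"market_Cap": [50.0, 300.0, 120.0, 10.0, 75.0]}, index=index)

# Sort descending by market cap and keep the first N rows.
top = df.sort_values(by="market_Cap", ascending=False).iloc[:3, :]

# nlargest returns the same N rows, already ordered largest first.
top_alt = df.nlargest(3, "market_Cap")

print(top)
print(top.equals(top_alt))
```

Either way the result keeps the original MultiIndex levels, so the `levels` behaviour described in the post applies to both.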
From there, len(Final.index) gives 500, which is correct, but len(Final.index.levels[1]) gives 4200+:
len(Final.index)  # 500
len(Final.index.levels[1])  # 4200+
# Trying a copy gives the same problem
df = Final.copy()
len(df.index.levels[1])  # still 4200+
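A sketch of what seems to be going on with the copy, again in plain pandas with a toy frame rather than real pipeline output: `copy()` duplicates the index as-is, unused levels included, while `MultiIndex.remove_unused_levels()` (pandas 0.20+) prunes them. For building a per-row name column, `get_level_values` lines up one entry with each row. The "name" here is just the ticker string; on Quantopian the level holds Equity objects, so something like `asset.symbol` would presumably be needed instead (an assumption, not tested on the platform).

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2019-01-04"]), ["AAA", "BBB", "CCC", "DDD", "EEE"]],
    names=["date", "asset"],
)
df = pd.DataFrame({"market_Cap": [50.0, 300.0, 120.0, 10.0, 75.0]}, index=index)
top = df.sort_values(by="market_Cap", ascending=False).iloc[:3, :]

# copy() keeps the index unchanged, so the unused levels come along.
copied = top.copy()
print(len(copied.index.levels[1]))  # 5

# remove_unused_levels() returns a new MultiIndex holding only labels in use.
pruned_index = top.index.remove_unused_levels()
print(len(pruned_index.levels[1]))  # 3

# One value per row, aligned positionally with the sliced frame.
top = top.assign(name=top.index.get_level_values("asset"))
print(top)
```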
Any idea whether this is a bug or a mistake in how I'm approaching this?
Also, if there is a version of Pipeline's screen parameter that lets me do this directly, I'd be very grateful to be pointed to the docs. I haven't looked into it much myself, so I'm mainly asking for help with the specific issue above; anything further is a bonus!
Many thanks, and let me know if you need any further information to replicate this.
Best,