grayson, posted August 27, 2023

I don't know if anyone knows this, but I have a little bit of knowledge of Python and HTML. I was wondering if I could find a way to access a gallery of pictures, each described by just one word, along with their links. I need something like the Python "requests" library, but one that also puts the name into its database. I will then use NumPy to store the pictures from their links and convert them with this tool.

Okay, now to the part where you get to know what I am actually doing. I am creating reverse Stable Diffusion software that turns images (whether they are AI-generated or not) into prompts. Now, I am going through that process, and then I will use Stable Diffusion to merge the picture with a randomly generated image of every single word in the dictionary (other than the inappropriate stuff). Then, based on how well they match as a percentage, it will add that word to the library of words it will use. After it finds enough matches, it will give you those words, and with a grammar API it will turn them into something that makes sense, not something like "boy, pig, fish". You get what I am saying. I just need some help to finish this.
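The "keep a word if its match percentage is high enough" step can be sketched as a cosine-similarity filter between an image embedding and one text embedding per word, which is roughly how CLIP-style tools score words against an image. This is a minimal sketch, assuming you already get those embeddings as NumPy vectors from some model; the function name `select_words`, the 0.25 threshold, and the toy 3-D vectors are all hypothetical.

```python
import numpy as np


def select_words(image_vec, word_vecs, threshold=0.25, top_k=5):
    """Return (word, score) pairs whose cosine similarity to the image
    embedding is at least `threshold`, best match first.

    image_vec: 1-D array (hypothetical image embedding)
    word_vecs: dict mapping word -> 1-D array of the same length
    """
    img = image_vec / np.linalg.norm(image_vec)
    scored = []
    for word, vec in word_vecs.items():
        # Cosine similarity lies in [-1, 1]; it is not literally a percentage.
        score = float(np.dot(img, vec / np.linalg.norm(vec)))
        if score >= threshold:
            scored.append((word, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (hypothetical values).
image = np.array([1.0, 0.2, 0.0])
words = {
    "boy": np.array([0.9, 0.1, 0.1]),
    "pig": np.array([0.0, 1.0, 0.0]),
    "fish": np.array([0.0, 0.0, 1.0]),
}
print(select_words(image, words))
```

Comparing embeddings avoids generating an image per dictionary word; the threshold that works depends entirely on the model that produces the vectors.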
Sensei, posted August 27, 2023

What kind of help do you need?

How to retrieve HTML from a specific URL? Try this: https://www.tutorialspoint.com/downloading-files-from-web-using-python

How to analyze HTML and extract links from it? You need to find the src attributes of the img tags. string.index(), string.find(), or a regex can be used to do it: https://www.w3schools.com/python/python_regex.asp

> 5 hours ago, grayson said:
> I was wondering if I could find a way to access a gallery of pictures with just one word describing them and with links.

A Python 'dictionary' holds key-value pairs: https://docs.python.org/3/tutorial/datastructures.html#dictionaries

You can make an abstract data type as a custom class with 'keyword', 'url' and 'path' (on local storage) fields, so each key can hold more 'values'.
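The custom-class suggestion above can be sketched with a dataclass holding 'keyword', 'url' and 'path', stored in a dictionary keyed by the keyword. This is a minimal sketch; the names `ImageRecord`, `gallery` and `add_record`, and the example URLs and paths, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRecord:
    keyword: str                # the one-word description, e.g. "acorn"
    url: str                    # absolute URL the image was found at
    path: Optional[str] = None  # local file path once downloaded (None until then)


# Dictionary: keyword -> list of records, since one word can match many images.
gallery: dict[str, list[ImageRecord]] = {}


def add_record(record: ImageRecord) -> None:
    """Append a record under its keyword, creating the list on first use."""
    gallery.setdefault(record.keyword, []).append(record)


add_record(ImageRecord("acorn", "https://example.com/img/acorn1.jpg"))
add_record(ImageRecord("acorn", "https://example.com/img/acorn2.jpg", "images/acorn2.jpg"))
print(len(gallery["acorn"]))
```

Using a list per keyword means a second image for the same word does not overwrite the first.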
grayson (Author), posted August 27, 2023

> 2 hours ago, Sensei said:
> What kind of help do you need? ...

I just need a Stable Diffusion tutorial.
grayson (Author), posted August 27, 2023

And how do I fix this:

```
Traceback (most recent call last):
  File "my directory", line 18, in <module>
    image = Image.open(img_url)
            ^^^^^^^^^^^^^^^^^^^
  File "my directory", line 3218, in open
    fp = builtins.open(filename, "rb")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 22] Invalid argument: my directory
```

Of course, I can't show my actual directories.
Sensei, posted August 27, 2023

> 20 minutes ago, grayson said:
> and how to fix this:
> Traceback (most recent call last): File "my directory", line 18, in <module> image = Image.open(img_url)

Try this:

```python
from io import BytesIO

import requests
from PIL import Image

# r.content is raw bytes, so wrap it in io.BytesIO. (Python 2's StringIO.StringIO
# accepted bytes, but Python 3's io.StringIO only accepts text and would fail here.)
r = requests.get('https://example.com/image.jpg')
i = Image.open(BytesIO(r.content))
```

or

```python
import requests
from PIL import Image

# stream=True leaves the body unread, so PIL can read it from the raw response stream.
img = Image.open(requests.get('http://example.com/image.jpg', stream=True).raw)
img.save('image.jpg')
```

Edited August 27, 2023 by Sensei
grayson (Author), posted August 27, 2023

> 5 minutes ago, Sensei said:
> Try this: ...

Well, I am also using Beautiful Soup. Here is the code so far:

```python
import requests
from bs4 import BeautifulSoup
import numpy as np
import os
import openai
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from PIL import Image

print('type your img url in caps!')
img_url = input()
library = requests.get("https://pixabay.com/")
soup = BeautifulSoup(library.content, 'html.parser')
the_good_stuff = soup.content
Junk = soup.find_all('img')
image = Image.open(img_url)
image_metadata = Junk.info
print(image_metadata)
```
Sensei, posted August 27, 2023

Image.open() does not take a URL (i.e. an Internet location); it takes a path to a local file or a file object (stream): https://pillow.readthedocs.io/en/latest/reference/Image.html

Use requests.get() to download the image from the URL first, then give Image.open() either a local path (after saving the file) or a stream: https://www.geeksforgeeks.org/how-to-open-an-image-from-the-url-in-pil/

Edited August 27, 2023 by Sensei
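Because Image.open() wants a path or a stream, one way to avoid the OSError above is to turn whatever the user typed into one of those first. This is a minimal sketch; the helper name `image_source` is hypothetical, `requests` is imported only when an http(s) URL is given, and `file://` URLs are converted to local paths here (requests itself does not handle them).

```python
import io
from urllib.parse import urlparse
from urllib.request import url2pathname


def image_source(src):
    """Return something PIL's Image.open() accepts for `src` (hypothetical helper):
    - http(s) URL -> downloaded into an in-memory BytesIO stream
    - file:// URL -> converted to a local path
    - anything else -> assumed to already be a local path
    """
    parsed = urlparse(src)
    if parsed.scheme in ("http", "https"):
        import requests  # imported here so local files work without requests installed
        response = requests.get(src, timeout=30)
        response.raise_for_status()  # fail on 404 etc. instead of handing HTML to PIL
        return io.BytesIO(response.content)
    if parsed.scheme == "file":
        return url2pathname(parsed.path)
    return src


# Usage (requires Pillow):
# from PIL import Image
# img = Image.open(image_source("https://example.com/image.jpg"))
```

A Windows path such as `C:\images\cat.jpg` parses with scheme `c`, so it falls through to the "local path" branch unchanged.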
grayson (Author), posted August 27, 2023

Okay, well, I need a way to simultaneously research every word in the dictionary. And if I am using PyTorch, can I just put the URL in? How does it train on the images? (I will be using Pixabay.)
Sensei, posted August 27, 2023

In the code you showed above, img_url is taken from user input, but you must use the content returned by BeautifulSoup: https://www.google.com/search?q=scraping+images+python+beautifulsoup

Tutorial from the first link:

```python
import requests
from bs4 import BeautifulSoup


def getdata(url):
    r = requests.get(url)
    return r.text


htmldata = getdata("https://www.geeksforgeeks.org/")
soup = BeautifulSoup(htmldata, 'html.parser')

# src=True skips <img> tags without a src attribute (item['src'] would raise KeyError).
for item in soup.find_all('img', src=True):
    print(item['src'])
```

The src in the above is a relative or absolute URL. You need to convert it to an absolute URL and pass it to requests.get() (or an alternative), then pass the downloaded content to Image.open() to get the image. Then use the image object wherever you need it.
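To see what "find every img src and make it absolute" boils down to, here is a sketch that swaps in the standard library's html.parser.HTMLParser for BeautifulSoup and uses urllib.parse.urljoin for the conversion, so it runs without extra packages. The class `ImgSrcCollector`, the function `absolute_image_urls`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags without a src
                self.srcs.append(src)


def absolute_image_urls(html, page_url):
    """Return absolute URLs for all <img src> values found in `html`."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return [urljoin(page_url, src) for src in collector.srcs]


sample = '<p><img src="/static/a.jpg"><img src="b.png"><img alt="no src"></p>'
print(absolute_image_urls(sample, "https://example.com/gallery/index.html"))
```

This only sees images present in the raw HTML; pages that load images with JavaScript or put the address in other attributes (such as srcset) need more handling.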
Ghideon, posted August 27, 2023

> 11 hours ago, grayson said:
> I am creating a reverse stable diffusion software that turns images (whether they are ai generated or not) into prompts.

Just curious, are you creating something like CLIP Interrogator? (The CLIP Interrogator is a tool to optimize text prompts to match a given image.)

> 4 hours ago, grayson said:
> I just need a stable diffusion tutorial

With some more understanding of your goals, I may be able to share some tips on this.

Edited August 27, 2023 by Ghideon
Sensei, posted August 27, 2023

> 1 hour ago, Sensei said:
> Tutorial from the first link: ...

I tried this code on my website, which I knew had relative URLs, and confirmed it: item['src'] is relative. In fact, the conversion from relative to absolute URLs can be done using the requests module itself:

```python
# url is the page address that was passed to getdata() in the tutorial code.
for item in soup.find_all('img', src=True):
    # Resolve a relative src against the page URL; absolute URLs pass through unchanged.
    src = requests.compat.urljoin(url, item['src'])
    print(src)
```

... but there is a possible problem: a web page may have a <base> tag that replaces the page's own URL as the base for relative links. It is rarely used these days: https://www.w3schools.com/tags/tag_base.asp

(I found a serious limitation in the Python requests module: it does not accept a URL to a local file, neither a path in a local directory nor a file:// URL, so it will be harder to debug the code.)
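In Python 3, requests.compat.urljoin is the standard library's urllib.parse.urljoin, so the <base> problem can be handled by resolving the base href first and then resolving each src against it. A minimal sketch; the function `resolve_src` and its `base_href` parameter are hypothetical (with BeautifulSoup, `soup.find('base')` would give you the tag to read the href from).

```python
from urllib.parse import urljoin


def resolve_src(page_url, src, base_href=None):
    """Resolve an <img src> value to an absolute URL.

    If the page has a <base href="..."> tag, relative URLs are resolved against
    that href (itself resolved against the page URL) instead of the page URL.
    """
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, src)


page = "https://example.com/gallery/page.html"
print(resolve_src(page, "img/a.jpg"))   # relative to the page's directory
print(resolve_src(page, "/img/a.jpg"))  # relative to the site root
print(resolve_src(page, "img/a.jpg", base_href="https://cdn.example.com/assets/"))
```

An src that is already absolute is returned unchanged whether or not a base tag is present.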
grayson (Author), posted August 27, 2023

> 41 minutes ago, Ghideon said:
> Just curious, are you creating something like CLIP interrogator? ...

Yeah, kind of like that. But it is designed to put conjunctions and other words after the tags. It finds the best keywords to get similar results with.
grayson (Author), posted August 27, 2023

```python
from requests
from bs4 import BeautifulSoup
```

Goofy ahh invalid syntax 😑
Sensei, posted August 27, 2023

> 11 minutes ago, grayson said:
> from requests
> from bs4 import BeautifulSoup
> Goofy ahh invalid syntax 😑

It should be:

```python
import requests
```
grayson (Author), posted August 27, 2023

Oh, okay. Anyway, all I need to know is how to use Beautiful Soup and TensorFlow together to sort out words. I also need a database with every (non-inappropriate) word in the dictionary. All it needs to do is let you define every word at once. If you need to know why, just read the main post.
grayson (Author), posted August 27, 2023

```python
import requests
import bs4 as BeautifulSoup
import tensorflow as ts
import numpy as np
import matplotlib as plt
from keras.models import Sequential
from keras.layers import Dense

webiste = requests.get('https://pixabay.com/')
soup = BeautifulSoup('img', "html.parser")
percentage =(75)

def data_dictionary (
    aardvark = soup.find_all('aardvark'),
    abacus = soup.find_all('abacus'),
    abalone = soup.find_all('abalone'),
    ablaze = soup.find_all('ablaze'),
    a_bomb = soup.find_all('atomic bomb'),
    abomination = soup.find_all('abomination'),
    abstract = soup.find_all('abstract'),
    acid = soup.find_all('acid'),
    acorn = soup.find_all('acorn'),
    acoustic_guitar = soup.find_all('acoustic guitar'),
    acrobat = soup.find_all('acrobat'),
    actor = soup.find_all('actor')
):
    model = Sequential()
    model.add(Dense(units=64, activation='relu', input_dim=8))  # Input layer with 8 input features
    model.add(Dense(units=32, activation='relu'))  # Hidden layer
    model.add(Dense(units=1, activation='sigmoid'))  # Output layer
    model.compile(optimizer=data_dictionary, loss='binary_crossentropy', metrics=['accuracy'])

print("IGNORE THIS MESSAGE")
print(percentage)
```

I haven't seen anything but syntax errors in who knows how long. Also, yes, I am manually typing every word in the dictionary.
Ghideon, posted August 27, 2023

> 56 minutes ago, grayson said:
> I also need a database with every (non-innapopriate) word in the dictionary. All it needs to do is let you be able to define every word at once. If you need to know why, just read the main post

Before digging into technical aspects: don't you need some context to tell what's appropriate and what the definition is? Quick example:

nut: usually large hard-shelled seed
nut: a small usually square or hexagonal metal block with internal screw thread

(Yes, there are more homonyms, some of which may be inappropriate depending on context.)
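The homonym point above shows why "every word in one variable" needs more than a flat list: a list of words cannot tell the two meanings of "nut" apart, while a dictionary mapping each word to a list of senses can. A minimal sketch; the `senses` and `ambiguous` names are hypothetical, the two "nut" definitions are the ones quoted above, and the "acorn" entry is only filler for the example.

```python
# One variable holding words, with a list of senses per word,
# so homonyms like "nut" keep all of their meanings.
senses = {
    "nut": [
        "usually large hard-shelled seed",
        "a small usually square or hexagonal metal block with internal screw thread",
    ],
    "acorn": ["the nut of the oak tree"],
}

# Words with more than one sense are the ones that need context to interpret.
ambiguous = [word for word, meanings in senses.items() if len(meanings) > 1]

for meaning in senses["nut"]:
    print("nut:", meaning)
print(ambiguous)
```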
grayson (Author), posted August 27, 2023

> 14 minutes ago, Ghideon said:
> Before digging into technical aspects; don't you need some context to tell what's appropriate and what is the definition? ...

I just need every English word put into one variable. Then with Beautiful Soup I can look it up. Also, can you find why I am having syntax errors? I am relatively new to coding. Not saying I can't pull this off, though.
Sensei, posted August 27, 2023

You have syntax errors because you have no idea what you are doing. The interpreter/compiler gives you the line number with the error; use this knowledge to fix the errors. You need to experiment with less demanding projects to learn how to use all these libraries and features before you move on to an advanced project like this one.

> 32 minutes ago, grayson said:
> soup = BeautifulSoup('img', "html.parser")

For example, here you must have an error, because what on earth is an 'img'...? (The first argument to BeautifulSoup() should be the HTML markup to parse, such as the text of the downloaded page.)
grayson (Author), posted August 27, 2023

> 1 minute ago, Sensei said:
> You have syntax errors because you have no idea what you are doing. ...

No, I know what I am using. I called a module and it came out with a module error. I don't understand.
Ghideon, posted August 27, 2023

> 1 minute ago, grayson said:
> I just need every English word put into one variable

Google suggests:

A list with 10,000 words, maybe useful as a starting point: https://www.mit.edu/~ecprice/wordlist.10000
A larger list (466k words): https://github.com/dwyl/english-words

Notes:
- Verify licensing before using.
- "Inappropriate" is for you to define and handle.
- You need a lot more than just English words (see my note above) to get going with your project.
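Instead of typing every word by hand, a downloaded list can be loaded into one variable. This is a minimal sketch assuming a plain-text file with one word per line; the function names `clean_words` and `load_words` are hypothetical, and the blocklist is yours to define, as noted above.

```python
from pathlib import Path


def clean_words(lines, blocklist=()):
    """Lower-case and strip each line, then drop blanks, duplicates, and
    anything in `blocklist`; returns a sorted list."""
    blocked = {word.lower() for word in blocklist}
    words = {line.strip().lower() for line in lines}
    return sorted(w for w in words if w and w not in blocked)


def load_words(path, blocklist=()):
    """Read a plain-text word list (assumed: one word per line) and clean it."""
    text = Path(path).read_text(encoding="utf-8")
    return clean_words(text.splitlines(), blocklist)


print(clean_words(["Acorn", "abacus", "", "acorn", "badword"], blocklist={"badword"}))
```

Usage would look like `words = load_words("wordlist.10000", blocklist=my_blocklist)`, where the filename and `my_blocklist` are placeholders for whatever you download and define.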
Sensei, posted August 27, 2023

> Just now, grayson said:
> no, I know what I am using. I called a module and It came out with a module error. I dont understand

You should start by reading the documentation provided by the original author of the library. Then you should find a tutorial on how to use the library in question, with examples. Then you should write your own experimental code to test it in a well-defined and constrained environment. Once you have mastered it, use it in a real project.

Instead, you are using functions from libraries you just heard about in a pretty advanced project: "go for broke".
grayson (Author), posted August 27, 2023

> 14 minutes ago, Sensei said:
> You should start by reading the documentation provided by the original author of the library. ...

Here is what it says: "module object is not callable". I never wrote a sequence of letters or anything that has to do with 'module':

```python
BeautifulSoup("img", "html.parser")
```
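The "'module' object is not callable" error comes from the import line, not the call: `import bs4 as BeautifulSoup` binds the name BeautifulSoup to the bs4 module itself, so `BeautifulSoup(...)` tries to call a module. The sketch below reproduces the same mistake with the standard library's json module standing in for bs4, so it runs without Beautiful Soup installed; in the original code the fix would be `from bs4 import BeautifulSoup`, which imports the class.

```python
import json as loads  # like `import bs4 as BeautifulSoup`: the name is bound to the module

try:
    loads('{"a": 1}')  # calling the module itself, not a function inside it
except TypeError as err:
    message = str(err)
    print(message)  # the TypeError message mentions that a module is not callable

from json import loads  # like `from bs4 import BeautifulSoup`: binds the callable itself

print(loads('{"a": 1}'))  # {'a': 1}
```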
Ghideon, posted August 27, 2023

Another note, @grayson: large-scale processing of someone else's content may be prohibited unless you have explicit permission.
grayson (Author), posted August 27, 2023

> 2 minutes ago, Ghideon said:
> Another note @grayson: large scale processing of someone else's content may be prohibited unless you have an explicit permission.

Well, I guess I will ask MIT then (though nobody responds to my emails).